MVD chaintest report 3-Dec-99
This message (now translated to html) was sent Fri Dec 3 17:36:07 MST 1999. It was sent to phenix-mvd-l@bnl.gov, haggerty@bnl.gov (John Haggerty), bobrek@icsun1.ic.ornl.gov (Miljko Bobrek), young@orph01.phy.ornl.gov (Glenn Young), nagle@nevis1.nevis.columbia.edu (Jamie Nagle), chi@nevis.nevis.columbia.edu (Cheng-Yi Chi)
Hi,
This message goes to the MVD listserver and a few other possibly interested parties. I do not know whether the "interested parties" are on the MVD listserver or not. If you get it twice, sorry.
Abstract: The MVD chaintest seems to work (mainly) for at least 100K events. One problem is that the number of triggers sent to the system is a few percent more than the number of events seen in the output file. Perhaps this difference is caused by the limited speed of writing the files to disk. There are parity errors about once per 50K data packets.
relevant webpage: Notes about MVD chaintest
The chaintest consists of
1) 6 MCMs (but I only read out two of them right now)
2) 1 power/communication board
3) 1 motherboard
4) 1 phenix low voltage crate
5) 1 MVD prototype Data collection interface module (DCIM)
6) 1 MVD prototype Timing and Control Interface module (TCIM)
7) 1 homemade (by SangYeol Kim) replacement for the arcnet interface module
8) 1 9u VME "Interface crate"
9) 1 miniDAQ used as a substitute for a granule timing module
10) 1 6u VME crate
11) 1 Phenix Data collection interface module
12) 1 VME crate controller
13) 1 Phenix Partition module
14) 2 PCs and 1 HP/unix system
15) a variety of computer programs
16) a variety of cables, copper and optical
17) a few NIM modules
18) part of the MVD cooling system
In short, a lot of stuff. There is a sketch of this on the
web page I mentioned at the top of this message.
The data output packets go through a "real" Phenix DAQ chain:
MCM --> power/comm board --> motherboard --> DCIM --> DCM --> crate controller
The crate controller writes the data to a Phenix Raw Data Format (PRDF) file on the disk of the HP/unix system. This disk is what limits the number of events -- the disk gets full after a few 100K events.
The timing information uses only part of the "real" chain. A minidaq system is set up via LabVIEW code on the PC. It sends the timing information (level-1's, clocks, ...) out on an optical fiber. The minidaq replaces a granule timing module in the "real" system. The rest of the chain is like the real system:
minidaq --> TCIM --> motherboard --> power/comm board --> MCM
I should mention that the minidaq + TCIM part of the setup is extremely stable. I started it Monday morning and it was still running fine Friday morning without my ever touching it.
The arcnet setup uses less of the real system, since the arcnet interface module does not yet exist. The arcnet information (programs for the FPGAs in the DCIM and MCM; serial control bits for the MCMs, TCIM, and DCIM) is sent from a PC running LabWindows to a homemade "arcnet" board in the interface crate. It sends the data out on the "real" DAQ system path:
homemade arcnet <--> motherboard <--> power/comm board <--> MCM
The arrows go both ways since some of the data can be read back through the chain.
The tests so far have all used the "test" mode in the MCMs. In this mode each MCM sends out ADC data consisting of 1, 2, 3, ... 256 for the 256 ADCs in the MCM. This allows us to check the validity of the data received. The tests used "duplex mode" into the DCM -- each fiber into the DCM carries data from two MCMs, controlled via the ENDDAT0 and ENDDAT1 signals. The trigger (into the minidaq) is a pulser (NIM module) running at rates from a few Hz to a few tens of Hz. If the rate is too high, the system seems to fall behind the trigger rate -- I assume the limit is writing to the hp/unix disk.
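To give an idea of what the test-mode validity check amounts to, here is a rough C sketch. It assumes the 256 ADC values from one MCM packet have already been unpacked into an array of ints; the unpacking itself (word widths, bit layout) is not shown, and nothing here should be read as the actual packet format.

#include <stdio.h>

/* Sketch only: in "test" mode the expected ADC value for channel i
   (counting from 0) is simply i+1, so a packet is good if the whole
   ramp 1..256 is reproduced. */
int check_test_pattern(const int adc[256])
{
    int i, nbad = 0;

    for (i = 0; i < 256; i++) {
        if (adc[i] != i + 1) {
            printf("channel %d: got %d, expected %d\n", i, adc[i], i + 1);
            nbad++;
        }
    }
    return nbad;    /* 0 means the ramp came through intact */
}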
The data is analyzed with a simple program I wrote. It checks the parity bit on each data word, the "vertical parity" word, and about 6 other details of the data packet format.
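For concreteness, here is a rough C sketch of the two parity checks. The conventions used here -- each data word carrying a parity bit chosen to make the whole word odd parity, and the "vertical parity" word being the bitwise XOR of all the data words in the packet -- are assumptions made for illustration, not a statement of the actual MVD packet format.

#include <stdio.h>

/* return 1 if the word has an odd number of set bits */
static int has_odd_parity(unsigned long w)
{
    int nbits = 0;
    while (w != 0) {
        nbits += (int)(w & 1UL);
        w >>= 1;
    }
    return nbits & 1;
}

/* check one packet: per-word parity plus the vertical parity word */
int check_parity(const unsigned long *word, int nwords,
                 unsigned long vertical_parity)
{
    unsigned long accum = 0;
    int i, nerrors = 0;

    for (i = 0; i < nwords; i++) {
        if (!has_odd_parity(word[i])) {       /* "horizontal" parity of this word */
            printf("parity error in word %d\n", i);
            nerrors++;
        }
        accum ^= word[i];                     /* accumulate the vertical parity */
    }
    if (accum != vertical_parity) {           /* compare with the packet's own word */
        printf("vertical parity mismatch\n");
        nerrors++;
    }
    return nerrors;
}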
The longest test so far ran for 202 minutes. The "run" was stopped by me -- it did not crash. About 100K events (each consisting of 2 MCM data packets) were collected. The scaler attached to the pulser said there should be 105520 triggers. The program on the crate controller (which I got from Mickey and Jamie) said it had seen 211040 packets (=2*105520). The PRDF file contained 205190 data packets -- 102710 packets from the first MCM and 102467 packets from the 2nd MCM. I do not know if the small (0.2%) difference in packets between the two MCMs is a problem in the DAQ chain or in my code. The code also reported 3 events with parity errors (in the "vertical parity") and one event with an incorrect detector ID (it should be 2 for the MVD) in the data packet. All other details of the data packets (including the ADC values) were correct for all packets. Only 97.3% of the first MCM's events and 97.1% of the second MCM's events made it into the output file. I do not yet know the cause of these discrepancies. In shorter runs of a few thousand events, I did not see such differences. I hypothesize that the difference between the number of triggers and the number of events in the file is related to the speed at which the events can be stored on disk -- but I have not tested this. However, if you look at the table below, there seems to be an anti-correlation between the fraction of the events which get into the PRDF file and the event rate.
Here are a few similar statistics from the test described above and from a few other tests (I divided the number of "events" reported by the crate controller by two to convert packets to events):
quantity                                             |  Test1 | Test2 | Test3 | Test4
-----------------------------------------------------+--------+-------+-------+------
triggers (from NIM scaler)                           | 105520 | 12125 | 45816 |   535
events reported by "dcm" program in crate controller | 105520 | 12125 | 45816 |   535
MCM1 packets in PRDF file                            | 102719 | 12097 | 44163 |   535
MCM2 packets in PRDF file                            | 102467 | 12097 | 44047 |   535
packets with parity errors, or other format problems |      4 |     0 |     2 |     0
length of run (minutes)                              |    202 |    31 |    89 |   2.5
rate (events/sec)                                    |    8.7 |   6.5 |   8.6 |   3.5
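As a sanity check on the last two rows and on the 97.3%/97.1% fractions quoted above: the rate and the delivered fraction are just ratios of numbers already in the table. A trivial C sketch of the arithmetic, using the Test1 column:

#include <stdio.h>

int main(void)
{
    /* numbers taken from the Test1 column of the table above */
    double triggers     = 105520.0;   /* NIM scaler count          */
    double run_minutes  = 202.0;      /* length of run             */
    double mcm1_packets = 102719.0;   /* MCM1 packets in PRDF file */
    double mcm2_packets = 102467.0;   /* MCM2 packets in PRDF file */

    double rate  = triggers / (run_minutes * 60.0);   /* ~8.7 events/sec */
    double frac1 = 100.0 * mcm1_packets / triggers;   /* ~97.3% */
    double frac2 = 100.0 * mcm2_packets / triggers;   /* ~97.1% */

    printf("event rate            : %.1f events/sec\n", rate);
    printf("MCM1 fraction in PRDF : %.1f %%\n", frac1);
    printf("MCM2 fraction in PRDF : %.1f %%\n", frac2);
    return 0;
}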