We create knowledge – today for tomorrow
Paul Scherrer Institut
Timo Korhonen
Improvements to Indexing Tool (Channel Archiver)
EPICS Meeting, BNL 2010
10/11/10EPICS Meeting, BNL,
Channel Archiver at PSI
• Currently four different archive servers are in use.
  – SLS Accelerator data: slsmcarch (machine archive server; HP, Xeon quadcore 2.66 GHz, 32 GB RAM)
    • Long Term: since January 2001; 10314 channels; 70 GB
    • Medium Term: 6 months; 66883 channels; 120 GB
    • Short Term Archiver: 14 days; 70381 channels; 114 GB
    • Post Mortem Archiver: stores the last famous words
    • Total available disk space for data: 500 GB
  – SLS Beamline data: slsblarch (beamline archive server; HP, AMD Opteron dualcore 1.8 GHz; 6 GB RAM)
    • Long and short term archivers for every beamline (29 engines in total)
    • Short term archivers store data for up to 12 months
    • Total amount of data: 163 GB / 384 GB
• Archive servers (continued)
  – PSI (office) data: gfaofarch
    • Long Term Archiver: stores data since January 2006
    • Medium and Short Term Archivers
    • ZHE Cyclotron High Energy
      – Long term (since April 2008)
      – Medium and short term
  – SwissFEL: felarch1 (HP, quadcore 2.66 GHz, 10 GB RAM)
    • Small test stand OBLA
      – 638 channels, 2.1 terabytes!
        » Waveforms, images
    • FIN250 test injector
      – LT, MT and ST (0.6, 7.9 and 464 GB)
– The archive engines are running stably
– The problems we have had are on the retrieval side
– Indexing is used to speed up retrieval
  • Indexes on daily files
  • Master index over all archived data
– We need the performance
  • The SwissFEL test machine is going to produce a lot of data
    – Waveforms, images
    – We need to archive more than in a production machine
– For us, there is no need for (immediate) change
  • We would like to keep the Channel Archiver going
    – Updates, bugfixes
    – Retrieval tools
      » A waveform viewer etc. have been developed
      » Matlab export would be welcome
  • Indexing tools need work
Index Tool improvements
• Background
  – The ArchiveIndexTool is run at PSI every week, in the night from Saturday to Sunday, to create master indexes for the medium-term archive.
  – Indexing is essential for good retrieval performance.
  – The tool produces many errors when it is run on the EPICS archive indices to create or update the master index.
• Disclaimer: I know very little about this; I am only relaying what the people who work on it have reported.
  – Involved people:
    • Gaudenz Jud (archiver maintenance, operation and development)
    • Hans-Christian Stadler (PSI IT, Scientific Computing) is investigating the issue together with Gaudenz
Findings so far:
– After investigating an error log:
  • From the code it is clear that the ArchiveEngine and the ArchiveIndexTool are not meant to be used concurrently on the same indices.
  • Running them concurrently does produce errors, but not the ones we see in production.
– The errors seem to occur only on the production machine, under high load and heavy disk activity.
– Quick fix tried: a retry mechanism at the highest level. All index files are closed and reopened after a delay. This quick fix seems to work so far.
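The retry quick fix described above can be illustrated with a small sketch. This is not the ArchiveIndexTool code: `IndexCorruption`, `open_indexes` and `build_master` are hypothetical names, and the number of attempts and the delay are assumed values. It only shows the idea of closing every index file and starting over after a pause.

```python
import time


class IndexCorruption(Exception):
    """Hypothetical error raised when an index read looks inconsistent."""


def run_with_retry(open_indexes, build_master, attempts=3, delay_s=5.0,
                   sleep=time.sleep):
    """Run build_master(handles), retrying from scratch on IndexCorruption.

    open_indexes() returns a list of objects with a close() method.
    All handles are closed after every attempt, so each retry starts
    from freshly opened files.
    """
    for attempt in range(1, attempts + 1):
        handles = open_indexes()
        try:
            return build_master(handles)
        except IndexCorruption:
            if attempt == attempts:
                raise  # give up after the last attempt
        finally:
            for handle in handles:
                handle.close()
        sleep(delay_s)  # wait before reopening the index files
```

Retrying at the top level treats the symptom only; it does not explain why the reads fail under load.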
Observations:
– The RTree implementation does not allow concurrent read/write access. It might be possible to arrange the file operations in a way that allows concurrent access when the index is stored on a strictly POSIX-compliant file system.
– The RTree implementation has an RTree node "cache" that only grows. Nodes are never evicted from the cache. I'm implementing a new LRU node cache with a fixed number of entries to see if this reduces system load.
– The RTree implementation uses many small disk operations (see example code above). A reimplementation should use large disk transfers.
– The RTree implementation is like a B-Tree, but does not adjust the node size to the disk sector size for improved I/O performance.
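A fixed-size LRU node cache of the kind mentioned in the observations could look roughly like the sketch below. It is an illustration in Python, not the code under development at PSI: the class name, the `load_node` callback and the use of file offsets as keys are assumptions.

```python
from collections import OrderedDict


class LRUNodeCache:
    """Fixed-capacity cache mapping a node's file offset to the node."""

    def __init__(self, capacity, load_node):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_node = load_node     # hypothetical: reads one node from disk
        self._nodes = OrderedDict()    # least recently used entry first

    def get(self, offset):
        if offset in self._nodes:
            self._nodes.move_to_end(offset)    # mark as most recently used
            return self._nodes[offset]
        node = self.load_node(offset)          # cache miss: go to disk
        self._nodes[offset] = node
        if len(self._nodes) > self.capacity:
            self._nodes.popitem(last=False)    # evict least recently used
        return node

    def __len__(self):
        return len(self._nodes)
```

This sketch covers reads only. A cache that also holds modified nodes would have to write them back before eviction.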
• Observations (continued):
  – The RTree implementation is not optimal for the use case seen at SLS, where data is inserted at the end only. This leads to a reduced fill level of the nodes. The RTree maintains the invariant that only the root node may be less than half full. In addition, data is moved between nodes too often, leading to many random accesses on disk. A reimplementation should feature a data structure that is optimal for appends at the end.
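One possible shape for an append-optimized structure is sketched below. It is an assumption about what such a reimplementation might look like, not a design taken from the talk: entries are non-overlapping time intervals appended in time order, each leaf is filled completely before the next one is started, and no entry is ever moved between leaves. The class name, the leaf capacity and the `(start, end, data_offset)` layout are all hypothetical.

```python
from bisect import bisect_right


class AppendOnlyIndex:
    """Time-ordered index whose leaves are filled completely, left to right."""

    def __init__(self, leaf_capacity=4):
        self.leaf_capacity = leaf_capacity
        self.leaves = []        # each leaf: list of (start, end, data_offset)
        self.leaf_starts = []   # start time of the first entry in each leaf

    def append(self, start, end, data_offset):
        # Assumed precondition: intervals arrive in time order, no overlap.
        if self.leaves and start < self.leaves[-1][-1][1]:
            raise ValueError("entries must be appended in time order")
        if not self.leaves or len(self.leaves[-1]) == self.leaf_capacity:
            self.leaves.append([])          # last leaf is full: start a new one
            self.leaf_starts.append(start)
        self.leaves[-1].append((start, end, data_offset))

    def find(self, t):
        """Return the data offset of the entry with start <= t < end, else None."""
        i = bisect_right(self.leaf_starts, t) - 1   # last leaf starting at or before t
        if i < 0:
            return None
        for start, end, data_offset in self.leaves[i]:
            if start <= t < end:
                return data_offset
        return None
```

With this layout every leaf except the last is full, and an append touches only the last leaf.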
• Conclusions so far:
  – Finding the real reason for the errors is a time-consuming process, and it has not yet been identified.
  – The offsets to data structures in the index get corrupted. However, it is not clear where.
  – Because the corruption happens only when the load on the production system is high, logical errors in the normal execution path can almost certainly be excluded.
  – The experience so far suggests that a new implementation of the RTree code could solve a number of problems.
Thank you for your attention!