Preparing for DECTRIS Eiger X 16M Data at NSLS II
Herbert J. Bernstein
Discussion Document Presented at NSLS-II at Brookhaven National Lab on 19 November 2014
• What is the DECTRIS Eiger X 16M?
– Like the Pilatus 6M, but more data, faster
• Compression Issues
– Special problems for NSLS-II and MX data
• Data Format and Test Data Issues
– Uses HDF5 rather than TIFF or CBF
– Data has to be simulated for now
For more, see Foils of Presentation at NSLS-II 19 November 2014
Coping with BIG DATA Image Formats: Integration of CBF, NeXus and HDF5
The BIG DATA demands of the new generation of X-ray pixel array detectors necessitate the use of new storage technologies as we meet the limitations of existing file systems. In addition, the modular nature of these detectors provides the possibility of more complex detector arrays which in turn requires a complex description of the detector geometry. Taken together these give an opportunity to combine the best of CBF/imgCIF (the Crystallographic Binary File), NeXus (a common data format for neutron, x-ray and muon science) and HDF5 (Hierarchical Data Format, version 5) for the management of such data at synchrotrons. Discussions are in progress between COMCIFS (the IUCr Committee for the Maintenance of the CIF Standard) and NIAC (the NeXus International Advisory Committee) on an integrated ontology. A proof-of-concept API based on CBFlib and the HDF5 API is being developed in a collaboration among Dowling College, Brookhaven National Laboratory and Diamond Light Source. A preliminary mapping and a combined API are under development.
- The new generation of high-performance X-ray detectors requires integration of HDF5, NeXus and CBF.
- The DECTRIS workshop in Baden, Switzerland in January 2013 established the parameters of the integration.
- A collaboration has been working on specifications and code.
- CBFlib 0.9.2.12:
  - Can store arbitrary CBF files in HDF5 and recover them.
  - Supports use of all CBFlib compressions in HDF5 files.
  - Provides minicbf2nexus to convert sets of minicbf files to a single NeXus file.
- A draft concordance between MX CBF and NeXus has been prepared.
- An updated CBF dictionary has been prepared.
- There is much work still to be done -- collaborators welcome.
White Paper: The Need for a New Data Management Framework for NSLS-II MX Data to Deal with New High Data Rate X-ray Detectors
This is a discussion of the scientific case for a new forward-looking, flexible, sustainable, robust, data-management framework for NSLS-II Macromolecular Crystallography (MX) beamline data to cope with expected data rates in the range of 60 megapixels to 2.5 gigapixels per beamline per second and beyond from the new generation of high data rate X-ray detectors. For NSLS, the data rates and volumes from existing beamlines have been low to moderate, allowing effective response to data management needs with periodic expansions using commercial off-the-shelf (COTS) hardware. New pixel-array detectors (PADs) already in use at NSLS and expected to be increasingly used at NSLS-II will strain existing data-management systems. We need a system that will accept data at very high rates, provide mechanisms to allow individual frames to be culled and sorted for short-term, intermediate-term and long-term use and manage appropriate portions at changing locations (synchrotron, home institution, publications, archives). It should allow for reliable data reduction and analysis in heterogeneous workflows. It should allow for reliable retention and transfer of forensic quality raw data where needed for fraud detection. It should allow for on-the-fly processing to reduced data when appropriate and allow for disposal of data no longer
For more, see BIGDATA Whitepaper 12 February 2013
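To give a feel for the pixel rates quoted in the white paper abstract above, the short Python sketch below converts pixels per second into bytes per second and terabytes per day. The pixel rates (60 megapixels and 2.5 gigapixels per second) come from the abstract; the bytes-per-pixel values and the duty-cycle parameter are illustrative assumptions, not figures from the white paper, and the results describe uncompressed data only.

```python
# Rough, uncompressed data-rate arithmetic for the pixel rates in the abstract.
# Assumption: 2 or 4 bytes per pixel (illustrative only; not from the white paper).

SECONDS_PER_DAY = 86_400

LOW_PIXELS_PER_S = 60e6    # 60 megapixels per second (from the abstract)
HIGH_PIXELS_PER_S = 2.5e9  # 2.5 gigapixels per second (from the abstract)


def data_rate_bytes_per_s(pixels_per_s: float, bytes_per_pixel: int) -> float:
    """Uncompressed byte rate for a given pixel rate and pixel depth."""
    return pixels_per_s * bytes_per_pixel


def daily_volume_tb(pixels_per_s: float, bytes_per_pixel: int,
                    duty_cycle: float = 1.0) -> float:
    """Uncompressed volume in terabytes (10**12 bytes) per day.

    duty_cycle is a hypothetical fraction of the day spent collecting data.
    """
    rate = data_rate_bytes_per_s(pixels_per_s, bytes_per_pixel)
    return rate * SECONDS_PER_DAY * duty_cycle / 1e12


if __name__ == "__main__":
    for label, pixels, depth in [("low", LOW_PIXELS_PER_S, 2),
                                 ("high", HIGH_PIXELS_PER_S, 4)]:
        gb_per_s = data_rate_bytes_per_s(pixels, depth) / 1e9
        tb_per_day = daily_volume_tb(pixels, depth)
        print(f"{label}: {gb_per_s:.2f} GB/s, {tb_per_day:.1f} TB/day")
```

Under these assumptions the range spans roughly two orders of magnitude per beamline, which is why culling, compression and staged retention figure so prominently in the requirements listed above.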
Talk at DECTRIS Eiger Workshop, Baden, Switzerland, 24-25 January 2013
• For MX, data reduction and structure solution at beamlines are of increasing importance
• New sources and new detectors bring new challenges
• NSLS-II MX will be a BIGDATA problem and will need a well-designed data management plan both for beamline data processing and subsequent processing
• The issue of fraud in crystallography complicates the problem
• Some or all of the raw data will need to be retained
• Handles, digital signatures and compression (lossless and lossy) will be needed. Data reduction to spots, structure factors or structures are examples of lossy compression
• HDF5, NeXus, CBFlib and database access will need to be integrated to achieve these goals
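The retention and compression points above can be illustrated with a minimal Python sketch: a frame is compressed losslessly and a digest of the raw bytes is recorded so that a later restore can be checked bit for bit. This is only an illustration using the standard library (`zlib` and `hashlib`), not the scheme proposed in the talk. The function names and the simulated frame are hypothetical, and a SHA-256 digest is not a digital signature: a real signature would additionally sign the digest with a private key.

```python
import hashlib
import zlib


def make_simulated_frame(width: int, height: int) -> bytes:
    """Hypothetical stand-in for a detector frame: 16-bit pixels,
    mostly zero background with a few nonzero 'spots'."""
    frame = bytearray(width * height * 2)
    for i in range(0, width * height, 997):
        frame[2 * i] = (i % 251) + 1  # low byte of pixel i
    return bytes(frame)


def archive_frame(raw: bytes) -> tuple[bytes, str]:
    """Losslessly compress a frame and record a SHA-256 digest of the raw bytes."""
    digest = hashlib.sha256(raw).hexdigest()
    compressed = zlib.compress(raw, 6)
    return compressed, digest


def restore_frame(compressed: bytes, expected_digest: str) -> bytes:
    """Decompress a frame and verify it against the recorded digest."""
    raw = zlib.decompress(compressed)
    if hashlib.sha256(raw).hexdigest() != expected_digest:
        raise ValueError("digest mismatch: frame is not bit-identical to the original")
    return raw


if __name__ == "__main__":
    frame = make_simulated_frame(64, 64)
    compressed, digest = archive_frame(frame)
    restored = restore_frame(compressed, digest)
    print(f"raw {len(frame)} bytes, compressed {len(compressed)} bytes, "
          f"round trip ok: {restored == frame}")
```

Lossy reduction (to spots, structure factors or structures) cannot be verified this way, since the original bytes are not recoverable; that is one reason the retention of some or all raw data matters for fraud detection.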