Presentation Foils

Preparing for DECTRIS Eiger X 16M Data at NSLS II

Herbert J. Bernstein

Discussion Document Presented at NSLS-II at Brookhaven National Lab on 19 November 2014

Overview

• What is the DECTRIS Eiger X 16M?

– Like the Pilatus 6M, but more data, faster

• Compression Issues

– Special problems for NSLS-II and MX data

• Data Format and Test Data Issues

– Uses HDF5 rather than TIFF or CBF

– Data has to be simulated for now

For more, see Foils of Presentation at NSLS-II 19 November 2014

Coping with BIG DATA Image Formats: Integration of CBF, NeXus and HDF5

The BIG DATA demands of the new generation of X-ray pixel array detectors necessitate the use of new storage technologies as we meet the limitations of existing file systems. In addition, the modular nature of these detectors provides the possibility of more complex detector arrays, which in turn requires a complex description of the detector geometry. Taken together, these give an opportunity to combine the best of CBF/imgCIF (the Crystallographic Binary File), NeXus (a common data format for neutron, X-ray and muon science) and HDF5 (Hierarchical Data Format, version 5) for the management of such data at synchrotrons. Discussions are in progress between COMCIFS (the IUCr Committee for the Maintenance of the CIF Standard) and NIAC (the NeXus International Advisory Committee) on an integrated ontology. A proof-of-concept API based on CBFlib and the HDF5 API is being developed in a collaboration among Dowling College, Brookhaven National Laboratory and Diamond Light Source. A preliminary mapping and a combined API are under development.

    • The new generation of high performance x-ray detectors requires integration of HDF5, NeXus and CBF.
    • The DECTRIS workshop in Baden, Switzerland in January 2013 established the parameters of the integration.
    • A collaboration has been working on specifications and code.
    • CBFlib 0.9.2.12
      • Can store arbitrary CBF files in HDF5 and recover them.
      • Supports use of all CBFlib compressions in HDF5 files.
      • Provides minicbf2nexus to convert sets of minicbf files to a single NeXus file.
    • A draft concordance between MX CBF and NeXus has been prepared.
    • Updated CBF dictionary has been prepared.
    • There is much work still to be done -- collaborators welcome.

For more, see Poster T-16 American Crystallographic Association Meeting 20-24 July 2013

White Paper: The Need for a New Data Management Framework for NSLS-II MX Data to Deal with New High Data Rate X-ray Detectors

This is a discussion of the scientific case for a new forward-looking, flexible, sustainable, robust, data-management framework for NSLS-II Macromolecular Crystallography (MX) beamline data to cope with expected data rates in the range of 60 megapixels to 2.5 gigapixels per beamline per second and beyond from the new generation of high data rate X-ray detectors. For NSLS, the data rates and volumes from existing beamlines have been low to moderate, allowing effective response to data management needs with periodic expansions using commercial off-the-shelf (COTS) hardware. New pixel-array detectors (PADs) already in use at NSLS and expected to be increasingly used at NSLS-II will strain existing data-management systems. We need a system that will accept data at very high rates, provide mechanisms to allow individual frames to be culled and sorted for short-term, intermediate-term and long-term use and manage appropriate portions at changing locations (synchrotron, home institution, publications, archives). It should allow for reliable data reduction and analysis in heterogeneous workflows. It should allow for reliable retention and transfer of forensic quality raw data where needed for fraud detection. It should allow for on-the-fly processing to reduced data when appropriate and allow for disposal of data no longer

For more, see BIGDATA Whitepaper 12 February 2013
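
The pixel rates quoted in the white paper abstract above (60 megapixels to 2.5 gigapixels per beamline per second) can be turned into rough storage figures. The sketch below is only illustrative: the 4 bytes per pixel, the absence of compression and the 100% duty cycle are assumptions made here, not figures from the white paper, and the function names are hypothetical.

```python
# Rough, uncompressed data-rate estimate for one beamline.
# ASSUMPTIONS (not from the white paper): 4 bytes per pixel, no compression,
# and a duty cycle of 1.0 (detector running continuously).

SECONDS_PER_DAY = 86_400

LOW_PIXEL_RATE = 60e6    # 60 megapixels per second (from the abstract)
HIGH_PIXEL_RATE = 2.5e9  # 2.5 gigapixels per second (from the abstract)


def bytes_per_second(pixels_per_second, bytes_per_pixel=4):
    """Raw byte rate for a given pixel rate and assumed pixel depth."""
    return pixels_per_second * bytes_per_pixel


def terabytes_per_day(pixels_per_second, bytes_per_pixel=4, duty_cycle=1.0):
    """Raw volume per day in decimal terabytes (1 TB = 1e12 bytes)."""
    rate = bytes_per_second(pixels_per_second, bytes_per_pixel)
    return rate * SECONDS_PER_DAY * duty_cycle / 1e12


if __name__ == "__main__":
    for label, rate in (("low", LOW_PIXEL_RATE), ("high", HIGH_PIXEL_RATE)):
        gb_s = bytes_per_second(rate) / 1e9
        tb_day = terabytes_per_day(rate)
        print(f"{label}: {gb_s:.2f} GB/s, {tb_day:.1f} TB/day")
```

Under these assumptions the low end works out to 0.24 GB/s (about 20.7 TB per day) and the high end to 10 GB/s (864 TB per day). A realistic duty cycle and compression would lower both figures considerably.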

Talk at Dectris Eiger Workshop, Baden, Switzerland, 24 -- 25 January 2013

CBF/HDF5/NeXus Integration

• For MX, data reduction and structure solution at beamlines are of increasing importance

• New sources and new detectors bring new challenges

• NSLS-II MX will be a BIGDATA problem and will need a well-designed data management plan both for beamline data processing and subsequent processing

• The issue of fraud in crystallography complicates the problem

• Some or all of raw data will need to be retained

• Handles, digital signatures and compression (lossless and lossy) will be needed. Data reduction to spots, structure factors or structures is an example of lossy compression

• HDF5, NeXus, CBFlib and database access will need to be integrated to achieve these goals
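
As a minimal sketch of how lossless compression and an integrity check can sit together for a retained raw frame, the example below uses only the Python standard library. Everything in it is an assumption for illustration: zlib stands in for whatever compression a real pipeline would use (it is not one of the CBFlib compressions), the frame is simulated, and the function names are hypothetical. A SHA-256 digest is also not a digital signature by itself; a real signature would sign this digest with a private key.

```python
import hashlib
import random
import zlib


def make_simulated_frame(width=64, height=64, n_spots=20, seed=1):
    """Build a small, mostly-zero 16-bit frame as raw bytes (simulated data)."""
    rng = random.Random(seed)
    pixels = [0] * (width * height)
    for _ in range(n_spots):
        # Scatter a few non-zero "spots" on an empty background.
        pixels[rng.randrange(width * height)] = rng.randrange(1, 1000)
    return b"".join(p.to_bytes(2, "little") for p in pixels)


def pack_frame(raw):
    """Return (compressed bytes, SHA-256 hex digest of the uncompressed frame)."""
    digest = hashlib.sha256(raw).hexdigest()
    return zlib.compress(raw, 6), digest


def unpack_frame(compressed, expected_digest):
    """Decompress a frame and verify it against the stored digest."""
    raw = zlib.decompress(compressed)
    if hashlib.sha256(raw).hexdigest() != expected_digest:
        raise ValueError("frame digest mismatch")
    return raw


if __name__ == "__main__":
    frame = make_simulated_frame()
    compressed, digest = pack_frame(frame)
    restored = unpack_frame(compressed, digest)
    print(len(frame), "bytes raw ->", len(compressed), "bytes compressed")
    print("bit-for-bit identical after round trip:", restored == frame)
```

The round trip restores every byte, which is what distinguishes lossless compression from the reductions to spots, structure factors or structures listed above, where the original frame cannot be recovered.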

For more, see Dectris Eiger Workshop presentation foils, Herbert Bernstein, 24 -- 25 January 2013