
Commit f8a0ea7

committed
Rewritten "Design Principles" section
The existing version did not, IMHO, give enough of an overview for a new reader.
1 parent a9cebe5 commit f8a0ea7

File tree

1 file changed

+69
-62
lines changed

2014/csipaper/nexus14aip.tex

Lines changed: 69 additions & 62 deletions
@@ -130,7 +130,7 @@ \section{Introduction}
130130
home-grown data formats. This scheme has a number of drawbacks addressed by NeXus:
131131
\begin{itemize}
132132
\item It makes the life of traveling scientists unnecessarily difficult as they must deal with multiple files
133-
in different formats, file converters, etc., in order to extract scientific information from the data.
133+
in different formats, file converters, \textit{etc}., in order to extract scientific information from the data.
134134
\item An unnecessary burden is imposed on data analysis software producers to accommodate many different formats.
135135
\item The whole idea of open access to data is sabotaged if the data is in a format that cannot be easily understood.
136136
\item Scientific integrity is jeopardized if the data cannot be understood or important elements are missing.
@@ -147,7 +147,7 @@ \section{Introduction}
147147
NeXus adds to HDF5:
148148
\begin{itemize}
149149
\item Rules for organizing domain-specific data within an HDF5 file
150-
\item A link structure to enable quick default visualization
150+
\item Features to enable rapid data visualization
151151
\item A dictionary of documented domain-specific field names
152152
\item Definitions of standards that can be validated
153153
\end{itemize}
@@ -156,36 +156,38 @@ \section{Introduction}
156156

157157
\section{Design Principles}
158158

159-
The authors of data-acquisition and instrument-control software are encouraged to generate exactly \emph{one} NeXus container file per measurement
160-
(a measurement is either a data accumulation under fixed conditions,
161-
or a scan).
162-
This file includes not only the detector and monitor data,
163-
but also metadata, information on the state of the beamline, parameter logs, and more.
164-
Authors of data-reduction and data-analysis software can use NeXus to
165-
store processed data along with metadata and a processing log.
166-
167-
NeXus data files are built using basic HDF5 storage elements:
168-
data groups (like file system folders),
169-
data fields (such as strings, floats, integers, and arrays),
170-
attributes (additional descriptors of groups and fields),
171-
and links (like file system links). These basic storage elements are used to
172-
build the \emph{base classes}, \emph{application definitions},
173-
and \emph{contributed definitions} that elaborate the NeXus standard.
174-
As a container format, NeXus allows files to be extended at any moment by
175-
additional content, including NeXus base classes, HDF5 groups, and HDF5 datasets.
176-
177-
NeXus can be used for many different experimental techniques,
178-
and at different levels of data processing.
179-
For each of these different applications,
180-
a specific subset of the standardized NeXus entities
181-
(data groups and fields) is needed.
182-
These subsets, and their hierarchical structure, are standardized
183-
in the NeXus application definitions (Sect.~\ref{sect_appdef}).
159+
NeXus utilizes certain design principles to make it easy to navigate even the most complex of HDF5 files. Data and associated
160+
metadata are stored as fields within groups that have a logical (and often physical) association with the experiment (see FIG.~\ref{rawfile}).
161+
HDF5 attributes are used to define the types, or classes, of these groups. For example, sample information is stored in a group of class \texttt{NXsample},
162+
instrumental information in a group of class \texttt{NXinstrument}, \textit{etc}. The beamline components that form the instrument,
163+
such as monochromators, collimators, and detectors, are stored as sub-groups within the \texttt{NXinstrument} group. This
164+
hierarchical structure makes NeXus extremely flexible, capable of accommodating new types of instrument as they are developed,
165+
and extremely scalable, capable of storing data from single point detectors to complex multi-detector configurations. It can also,
166+
just as easily, contain processed data or even theoretical simulations alongside the experimental results.
167+
168+
These groups are contained within a root-level group with class \texttt{NXentry}. The \texttt{NXentry} group contains all the data from a single measurement,
169+
which could represent data collected in a certain configuration or in a scan, so multiple measurements can be stored in separate \texttt{NXentry}
170+
groups within a single file if needed. Each NeXus file is required to contain at least one \texttt{NXentry} group.
171+
172+
Each \texttt{NXentry} group should
173+
contain at least one \texttt{NXdata} group, which contains the measured (or processed or simulated) data along with the other information required to plot it,
174+
\textit{e.g.}, the plotting axis or axes. The NeXus design allows default plots of \texttt{NXdata} groups to be generated without any prior knowledge of the
175+
type of measurement. This feature was implemented in NeXus before HDF5 introduced dimension scales, which provide similar functionality.
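
The default-plot idea in the paragraph above can be sketched with a toy in-memory model. This is a minimal illustration, not the NeXus or HDF5 API: groups are plain Python dicts, and the helper names (`group`, `find_groups`, `default_plot`) are hypothetical. It assumes the group-level `signal` and `axes` attributes on `NXdata`, which may differ from the attribute scheme in use when this manuscript was written.

```python
# Toy model of a NeXus file. Each group is a dict with "attrs" (standing in
# for HDF5 attributes) and "children" (sub-groups or fields).
# Assumption: NXdata carries "signal" and "axes" attributes naming its fields.

def group(nx_class, children=None, **attrs):
    """Build a group whose NX_class attribute records its NeXus class."""
    return {"attrs": {"NX_class": nx_class, **attrs}, "children": children or {}}

nexus_file = {
    "attrs": {},
    "children": {
        "entry": group("NXentry", {
            "sample": group("NXsample", {"temperature": [290.0, 295.0, 300.0]}),
            "instrument": group("NXinstrument", {
                "detector": group("NXdetector"),
            }),
            # In a real file the axis would normally be a link, not a copy.
            "data": group(
                "NXdata",
                {"counts": [120, 340, 95], "temperature": [290.0, 295.0, 300.0]},
                signal="counts",
                axes=["temperature"],
            ),
        }),
    },
}

def is_group(node):
    """Fields are plain values; groups are dicts with a "children" entry."""
    return isinstance(node, dict) and "children" in node

def find_groups(node, nx_class, path=""):
    """Yield (path, group) for every group of the given class, depth first."""
    for name, child in node["children"].items():
        if is_group(child):
            child_path = f"{path}/{name}"
            if child["attrs"].get("NX_class") == nx_class:
                yield child_path, child
            yield from find_groups(child, nx_class, child_path)

def default_plot(root):
    """Return (path, signal values, axis values) for the first NXdata group."""
    for path, data in find_groups(root, "NXdata"):
        signal = data["children"][data["attrs"]["signal"]]
        axes = [data["children"][name] for name in data["attrs"]["axes"]]
        return path, signal, axes
    return None

path, y, (x,) = default_plot(nexus_file)
print(path, x, y)
```

The lookup never inspects the type of measurement: it relies only on the class attribute and on the names stored in the `NXdata` group, which is the property the text describes.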
176+
177+
As well as defining a logical group structure, NeXus provides a dictionary of names that can be used to define specific fields within each class of
178+
groups. For example, if the sample temperature is stored, the NeXus standard specifies that it should be called \texttt{temperature} and stored in
179+
the \texttt{NXsample} group. These names are documented in the NeXus base class definitions (Sect.~\ref{sect_baseclasses}). It should be stressed that
180+
it is not necessary for a particular NeXus file to contain every item defined for each base class; the base classes just define the names that should be
181+
used when they are present. However, certain applications may require particular
182+
items to be present for specific types of data analysis. For each of these different applications, a specific subset of the standardized NeXus entities
183+
(data groups and fields) is specified in the NeXus application definitions (Sect.~\ref{sect_appdef}).
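
The distinction between an optional dictionary (base classes) and a required minimum (application definitions) can be sketched as follows. This is an illustration under stated assumptions: the `REQUIRED` table and the `missing_items` helper are hypothetical, and the listed requirements are not copied from any real application definition.

```python
# Hypothetical minimum content, keyed by NeXus class. Illustrative only.
REQUIRED = {
    "NXsample": {"temperature"},
    "NXdata": {"data"},
}

def missing_items(group, required=REQUIRED, path=""):
    """Return the paths of required fields that are absent from a group tree.

    A group is a dict with an "NX_class" string entry; every other entry is
    either a field value or a sub-group (another such dict).
    """
    missing = []
    needed = required.get(group.get("NX_class"), set())
    for name in sorted(needed):
        if name not in group:
            missing.append(f"{path}/{name}")
    for name, child in group.items():
        if isinstance(child, dict):
            missing.extend(missing_items(child, required, f"{path}/{name}"))
    return missing

entry = {
    "NX_class": "NXentry",
    # "name" is allowed but not required; "temperature" is required but absent.
    "sample": {"NX_class": "NXsample", "name": "LaB6"},
    "data": {"NX_class": "NXdata", "data": [1, 2, 3]},
}

print(missing_items(entry))
```

Fields that are present but not listed in `REQUIRED` are simply ignored, mirroring the statement that a file need not contain every item a base class defines.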
184+
185+
The combination of a well-defined hierarchy of groups with a comprehensive and well-documented dictionary of data and metadata names ensures
186+
that NeXus files are self-describing. It should be possible for another scientist to understand the contents of a NeXus file without consulting
187+
documentation specific to any one facility or beamline. By enabling the storage of comprehensive metadata, the NeXus format facilitates the
188+
sharing of data between collaborators and long-term data curation.
184189

185190
\section{File Hierarchies}
186-
NeXus data files are organized into a hierarchy of groups which, in turn, can contain further groups or fields,
187-
very much like an internal file system. The possible contents of each NeXus group are defined by a base class, while an application definition,
188-
or a contributed definition, is used to specify which of these fields and groups are required for a particular type of analysis.
189191

190192
\subsection{Raw Data File Hierarchy}
191193

@@ -195,14 +197,14 @@ \subsection{Raw Data File Hierarchy}
195197
}
196198
\end{figure}
197199

198-
A major focus of NeXus has been the recording of \emph{raw} experimental data, i.e. information taken directly from the experimental
200+
A major focus of NeXus has been the recording of \emph{raw} experimental data, \textit{i.e.}, information taken directly from the experimental
199201
equipment or processed only as required to provide physically meaningful values.
200202
The NeXus raw data file hierarchy is the consequence of some practical considerations.
201203
An overview of the NeXus data file structure for raw experimental data is shown in FIG.~\ref{rawfile}.
202204

203205

204206
When looking at a beamline, it is easy to
205-
discern different components: beam optic components, sample position, detectors, etc. It is quite natural to replicate this physical
207+
discern different components: beam optic components, sample position, detectors, \textit{etc}. It is quite natural to replicate this physical
206208
separation with a logical arrangement, in which metadata from each component are stored in a separate group. This approach explains the
207209
list of beamline components in the \texttt{NXinstrument} group presented in FIG.~\ref{rawfile}.
208210
As there can be multiple instances of the same kind of equipment, like slits or detectors, in a given beamline, it becomes necessary
@@ -226,22 +228,26 @@ \subsection{Raw Data File Hierarchy}
226228
also contain plottable data, it uses the same attribute scheme to associate the monitor data with its plotting axes. Its location in the
227229
\texttt{NXentry} group facilitates quick inspection for beamline diagnostics.
228230

231+
Most NeXus files will also contain a \texttt{NXsample} group containing information about the sample being measured in the experiment, \textit{e.g.},
232+
its chemical composition, mass, unit cell parameters, \textit{etc}. It may also contain information about the sample environment, such as
233+
temperature or pressure. If one or more of these parameters is varied in an experiment, these could be used as scanned variables (see
234+
Section III.A).
235+
229236
A special base class, \texttt{NXcollection}, exempts its contents from validation
230237
and thereby allows the inclusion of arbitrary data in non-NeXus formats.
231238

232239
\subsubsection{Multiple Method Instruments}
233240

234-
Particularly at X-ray sources,
235-
some instruments offer multiple techniques that can be used in parallel.
241+
Some instruments, particularly at X-ray sources, offer multiple techniques that can be used in parallel.
236242
For example, small-angle scattering and powder diffraction
237243
can be measured simultaneously at a SAXS/WAXS beamline.
238244
We recommend storing the data from all methods in \emph{one} file,
239245
in a \emph{single} \texttt{NXentry} hierarchy
240-
(FIG.~\ref{multimethod}). All information from all detectors, logs and
241-
such are collected in this one \texttt{NXentry} group to keep the data together.
242-
Information that is particular for one experimental technique
243-
is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of
244-
\texttt{NXentry}. But it will typically only link to the data required by the
246+
(FIG.~\ref{multimethod}). All information from detectors, logs, \textit{etc}.,
247+
is collected in this one \texttt{NXentry} group to keep the data together.
248+
Information that is peculiar to one experimental technique
249+
is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of
250+
\texttt{NXentry}, but it will typically only link to the data required by the
245251
application definition for the specific experimental technique. The point of this scheme
246252
is that both humans and computerized users can easily locate method-specific data while
247253
maintaining the full view of the experiment.
@@ -282,7 +288,8 @@ \subsubsection{Scans}
282288
\end{itemize}
283289

284290
NeXus allows multi-dimensional scans too. This makes it very simple to produce meaningful slices through data
285-
volumes even with NeXus-agnostic software ({\it e.g.} HDFView\cite{hdfview}).
291+
volumes, whether the software is designed for NeXus (\textit{e.g.}, NeXpy\cite{nexpy}) or NeXus-agnostic
292+
(\textit{e.g.}, HDFView\cite{hdfview}).
286293
% FIXME: this pathology is not necessary to describe, not unique to NeXus, too much detail for this manuscript
287294
%Interrupting a multi-dimensional scan may, depending
288295
%on the software used, leave some of the data in an uninitialised state (usually the HDF5 fill value).
@@ -306,7 +313,7 @@ \subsection{Processed Data}
306313

307314
The hierarchy is much reduced as it is not important to carry all experimental information in the data
308315
reduction. In contrast to the raw data file structure, \texttt{NXdata} in the processed file structure is the place
309-
to store the results of the processing, together with its associated axes if the result is a multi-dimensional array.
316+
to store the results of the processing, together with its associated axis or axes.
310317

311318
In addition to the \texttt{NXdata} and \texttt{NXsample} groups,
312319
the \texttt{NXprocess} group provides structure to store details
@@ -319,10 +326,10 @@ \section{Coordinate Systems, Positioning of Components and Further Rules}
319326

320327
For data reduction, it is often necessary to know the exact position and orientation of beamline components.
321328
The first thing needed is a reference coordinate system. NeXus chose to use the same coordinate system as the
322-
neutron beamline simulation software McStas\cite{mcstas}.
329+
neutron beamline simulation software, McStas\cite{mcstas}.
323330

324-
For describing the placement and orientation of components, NeXus stores the same information as is used for the
325-
same purpose in the Crystallographic Interchange Format (CIF)\cite{ITCVG}. CIF (and NeXus) stores the details
331+
For describing the placement and orientation of components, NeXus stores the same information as the
332+
Crystallographic Interchange Format (CIF)\cite{ITCVG}. CIF (and NeXus) stores the details
326333
of the translations and rotations necessary to move a given component from the zero point of the coordinate
327334
system to its actual position. As coordinate transformations are not commutative, the order of transformations
328335
must also be stored.
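
Why the order must be stored can be seen from a small worked example. The sketch below uses plain Python and hypothetical helper names (`rotate_y`, `translate`); the axis and distances are arbitrary choices for illustration and are not taken from the NeXus or CIF conventions.

```python
import math

def rotate_y(point, degrees):
    """Rotate a point (x, y, z) about the y axis by the given angle."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))

def translate(point, offset):
    """Shift a point by an (dx, dy, dz) offset."""
    return tuple(p + o for p, o in zip(point, offset))

origin = (0.0, 0.0, 0.0)
shift = (0.0, 0.0, 2.0)

# Same two operations, opposite order, different final positions.
translate_then_rotate = rotate_y(translate(origin, shift), 90)
rotate_then_translate = translate(rotate_y(origin, 90), shift)

print([round(v, 6) for v in translate_then_rotate])
print([round(v, 6) for v in rotate_then_translate])
```

Translating first and then rotating carries the component off the z axis, whereas rotating the origin first leaves it in place before the shift, so the two sequences end at different points.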
@@ -342,6 +349,7 @@ \section{Coordinate Systems, Positioning of Components and Further Rules}
342349

343350

344351
\section{Base Classes}
352+
\label{sect_baseclasses}
345353

346354
As can be seen from the discussion of the NeXus file hierarchy,
347355
NeXus arranges data in groups which have a
@@ -350,7 +358,7 @@ \section{Base Classes}
350358
The term \emph{base class} is not used in the same sense as in
351359
object-oriented programming languages; in particular, there is no inheritance.
352360
The NeXus base classes provide a comprehensive dictionary of terms
353-
that can be used for each class.
361+
that can be used in each class.
354362
The terms in the dictionary comprise concepts and names common to the topic of the base class.
355363
The expected spelling and definition of each term is specified in the base classes.
356364
It is neither expected nor required to provide all the terms specified in a base class.
@@ -371,11 +379,10 @@ \section{Base Classes}
371379
These decisions can be standardized in the form of
372380
application definitions (see below, Sect.~\ref{sect_appdef}).
373381

374-
The NeXus base classes are encoded in NeXus Description Language (NXDL)\cite{nxman}. NXDL is
375-
just another form of an XML file that specifies the content of a NeXus base class.
376-
NXDL files may be parsed either by humans or by software and
377-
may be validated for syntax and content. The NXDL files are used to validate the structure of
378-
NeXus data files. Java source code of a GUI tool has been prepared\cite{nxvalidate} to perform such validation.%
382+
The NeXus base classes are defined in XML files using the NeXus Description Language (NXDL)\cite{nxman}.
383+
NXDL files may be parsed either by people or by software and
384+
may be validated for syntax and content. The NXDL files may be used to validate the structure of
385+
NeXus data files. GUI tools have been prepared\cite{nxvalidate} to perform such validation.%
379386
% The JAR file available, but it needs maintenance and vastly improved documentation how to use it
380387
% before it is ready for general release.
381388
% TODO: *** good HIGH-PRIORITY item for 2014 Code Camp ***
@@ -390,15 +397,15 @@ \section{Application Definitions}
390397
For each group, a \emph{minimum} content is specified.
391398
Application definitions are therefore different from
392399
base class definitions, which specify a comprehensive
393-
dictionary of terms that can be used.
400+
dictionary of terms that can be used but do not specify which are required.
394401

395402
Historically, an application definition addressed one type of instrument,
396-
like X-ray reflectometer, or direct-geometry neutron time-of-flight spectrometer.
403+
like an X-ray reflectometer or direct-geometry neutron time-of-flight spectrometer.
397404
Thus, application definitions were originally named \emph{instrument definitions}.
398-
However, as NeXus can also be used for processed data
399-
like a tomography reconstruction or a dynamic scattering law $S(Q,\omega)$,
400-
the more generic term \emph{application definition} has been adopted.
401-
405+
However, the same instrument can be used for different types of analysis that require different
406+
experimental variables; \textit{e.g.}, a powder diffractometer could be used for Rietveld
407+
refinements or pair-distribution-function analysis. The more generic term \emph{application definition} has
408+
been adopted to signify what data are required for each type of data analysis.
402409

403410
\section{Contributed Definitions}
404411
\label{sect_contribdef}
@@ -417,7 +424,7 @@ \section{Contributed Definitions}
417424
All such proposals from the scientific community to extend NeXus
418425
with new application definitions and base classes are added to
419426
NeXus, initially, as contributed definitions either in incubation
420-
or a special case not for general use. The NIAC is charged to
427+
or as a special case not for general use. The NIAC is charged to
421428
review any new contributed definitions and provide feedback to the
422429
authors before ratification and acceptance.
423430

@@ -440,17 +447,17 @@ \section{Governance}
440447
\section{Uptake of NeXus}
441448

442449
NeXus is already in use as the main data format at many facilities including Soleil, Diamond, SINQ, SNS, Lujan/LANL
443-
and KEK. Other facilities including ISIS, DESY and the $\mu$SR community are in the process of moving towards
444-
NeXus as their data format. At LBNL, NeXus is currently being adapted for XFEL serial crystallographic data.
445-
APS is storing some of its data collection using NeXus.
450+
and KEK. Other facilities including ISIS, DESY, and the $\mu$SR community are in the process of moving towards
451+
NeXus as their data format. At LBNL, NeXus is currently being adapted for XFEL
452+
serial crystallographic data. The APS is using it for some techniques.
446453
The EPICS\cite{epicsad} area detector software has a plug-in to write acquired images into NeXus data files.
447454
Also, some commercial manufacturers of area detectors now write acquired images into NeXus data files.
448455
% NOTE: do NOT name the companies or else we must add disclaimers to the bottom of the manuscript
449456

450-
The adoption of NeXus has taken some time. The reason is that NeXus is often chosen whenever
457+
The adoption of NeXus has taken some time. The reason is partly that NeXus is often chosen whenever
451458
a facility starts operation or undergoes major refurbishments. For those facilities where there is an existing and working
452459
pipeline from data acquisition to data analysis, the resources are usually lacking to move
453-
towards NeXus as the only data file format.
460+
towards NeXus as the only data file format.
454461

455462
This is reflected in the experience of the muon community. For the ISIS source, the move to a Windows PC-based data acquisition
456463
system in 2002 required a new data format, providing an ideal opportunity to exploit the emerging NeXus standard\cite{muon1}. In
