2014/csipaper/nexus14aip.tex
Lines changed: 69 additions & 62 deletions
Original file line number
Diff line number
Diff line change
@@ -130,7 +130,7 @@ \section{Introduction}
130
130
home-grown data formats. This scheme has a number of drawbacks addressed by NeXus:
131
131
\begin{itemize}
132
132
\item It makes the life of traveling scientists unnecessarily difficult as they must deal with multiple files
133
-
in different formats, file converters, etc., in order to extract scientific information from the data.
133
+
in different formats, file converters, \textit{etc}., in order to extract scientific information from the data.
134
134
\item An unnecessary burden is imposed on data analysis software producers to accommodate many different formats.
135
135
\item The whole idea of open access to data is sabotaged if the data is in a format that cannot be easily understood.
136
136
\item Scientific integrity is jeopardized if the data cannot be understood or important elements are missing.
@@ -147,7 +147,7 @@ \section{Introduction}
147
147
NeXus adds to HDF5:
148
148
\begin{itemize}
149
149
\item Rules for organizing domain-specific data within an HDF5 file
150
-
\item A link structure to enable quick default visualization
150
+
\item Features to enable rapid data visualization
151
151
\item A dictionary of documented domain-specific field names
152
152
\item Definitions of standards that can be validated
153
153
\end{itemize}
@@ -156,36 +156,38 @@ \section{Introduction}
156
156
157
157
\section{Design Principles}
158
158
159
-
The authors of data-acquisition and instrument-control software are encouraged to generate exactly \emph{one} NeXus container file per measurement
160
-
(a measurement is either a data accumulation under fixed conditions,
161
-
or a scan).
162
-
This file includes not only the detector and monitor data,
163
-
but also metadata, information on the state of the beamline, parameter logs, and more.
164
-
Authors of data-reduction and data-analysis software can use NeXus to
165
-
store processed data along with metadata and a processing log.
166
-
167
-
NeXus data files are built using basic HDF5 storage elements:
168
-
data groups (like file system folders),
169
-
data fields (such as strings, floats, integers, and arrays),
170
-
attributes (additional descriptors of groups and fields),
171
-
and links (like file system links). These basic storage elements are used to
172
-
build the \emph{base classes}, \emph{application definitions},
173
-
and \emph{contributed definitions} that elaborate the NeXus standard.
174
-
As a container format, NeXus allows files to be extended at any moment by
175
-
additional content, including NeXus base classes, HDF5 groups, and HDF5 datasets.
176
-
177
-
NeXus can be used for many different experimental techniques,
178
-
and at different levels of data processing.
179
-
For each of these different applications,
180
-
a specific subset of the standardized NeXus entities
181
-
(data groups and fields) is needed.
182
-
These subsets, and their hierarchical structure, are standardized
183
-
in the NeXus application definitions (Sect.~\ref{sect_appdef}).
159
+
NeXus follows design principles that make even the most complex HDF5 files easy to navigate. Data and associated
160
+
metadata are stored as fields within groups that have a logical (and often physical) association with the experiment (see FIG.~\ref{rawfile}).
161
+
HDF5 attributes are used to define the types, or classes, of these groups. For example, sample information is stored in a group of class \texttt{NXsample},
162
+
instrumental information in a group of class \texttt{NXinstrument}, \textit{etc}. The beamline components that form the instrument,
163
+
such as monochromators, collimators, and detectors, are stored as sub-groups within the \texttt{NXinstrument} group. This
164
+
hierarchical structure makes NeXus extremely flexible, capable of accommodating new types of instrument as they are developed,
165
+
and extremely scalable, capable of storing data from single point detectors to complex multi-detector configurations. It can also,
166
+
just as easily, store processed data or even theoretical simulations alongside the experimental results.
167
+
168
+
These groups are contained within a root-level group with class \texttt{NXentry}. The \texttt{NXentry} group contains all the data from a single measurement,
169
+
which could represent data collected in a certain configuration or in a scan, so multiple measurements can be stored in separate \texttt{NXentry}
170
+
groups within a single file if needed. Each NeXus file is required to contain at least one \texttt{NXentry} group.
171
+
172
+
Each \texttt{NXentry} group should
173
+
contain at least one \texttt{NXdata} group, which contains the measured (or processed or simulated) data along with the other information required to plot it,
174
+
\textit{e.g.}, the plotting axis or axes. The NeXus design allows default plots of \texttt{NXdata} groups to be generated without any prior knowledge of the
175
+
type of measurement. This feature was implemented in NeXus before HDF5 introduced dimension scales, which provide similar functionality.
176
+
177
+
As well as defining a logical group structure, NeXus provides a dictionary of names that can be used to define specific fields within each class of
178
+
groups. For example, if the sample temperature is stored, the NeXus standard specifies that it should be called \texttt{temperature} and stored in
179
+
the \texttt{NXsample} group. These names are documented in the NeXus base class definitions (Sect.~\ref{sect_baseclasses}). It should be stressed that
180
+
it is not necessary for a particular NeXus file to contain every item defined for each base class; the base classes just define the names that should be
181
+
used when they are present. However, certain applications may require particular
182
+
items to be present for specific types of data analysis. For each of these different applications, a specific subset of the standardized NeXus entities
183
+
(data groups and fields) is specified in the NeXus application definitions (Sect.~\ref{sect_appdef}).
184
+
185
+
The combination of a well-defined hierarchy of groups with a comprehensive and well-documented dictionary of data and metadata names ensures
186
+
that NeXus files are self-describing. It should be possible for another scientist to understand the contents of a NeXus file without consulting
187
+
documentation specific to any one facility or beamline. By enabling the storage of comprehensive metadata, the NeXus format facilitates the
188
+
sharing of data between collaborators and long-term data curation.
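
To illustrate the class-based hierarchy described in the added Design Principles text, the following Python sketch models groups as plain objects instead of calling any HDF5 library. The `Group` type, the `walk` and `find_by_class` helpers, the group names (`entry`, `sample`, `instrument`, `data`, and so on), and the `two_theta` field are assumptions made for this example. Only the NeXus class names and the `temperature` field name are taken from the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Group:
    """Toy stand-in for an HDF5 group (hypothetical, not a real API).

    nx_class mimics the attribute that NeXus uses to tag a group's class;
    children holds both sub-groups and plain field values.
    """
    nx_class: str
    children: Dict[str, object] = field(default_factory=dict)


def walk(group: Group, path: str = "") -> Iterator[Tuple[str, Group]]:
    """Yield (path, group) for every sub-group, depth first."""
    for name, child in group.children.items():
        child_path = f"{path}/{name}"
        if isinstance(child, Group):
            yield child_path, child
            yield from walk(child, child_path)


def find_by_class(root: Group, nx_class: str) -> List[str]:
    """Return the paths of all groups tagged with the given class."""
    return [path for path, group in walk(root) if group.nx_class == nx_class]


# One measurement: sample and instrument metadata plus plottable data.
root = Group("NXroot", {
    "entry": Group("NXentry", {
        "sample": Group("NXsample", {"temperature": 295.0}),
        "instrument": Group("NXinstrument", {
            "monochromator": Group("NXmonochromator"),
            "detector": Group("NXdetector", {"data": [10, 12, 9]}),
        }),
        "data": Group("NXdata", {
            "data": [10, 12, 9],
            "two_theta": [1.0, 1.5, 2.0],  # assumed axis name
        }),
    }),
})

print(find_by_class(root, "NXdata"))
print(find_by_class(root, "NXdetector"))
```

A reader that knows only the class names can locate the plottable data and the detector without any facility-specific knowledge, which is the self-describing property the text claims.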
184
189
185
190
\section{File Hierarchies}
186
-
NeXus data files are organized into a hierarchy of groups which, in turn, can contain further groups or fields,
187
-
very much like an internal file system. The possible contents of each NeXus group are defined by a base class, while an application definition,
188
-
or a contributed definition, is used to specify which of these fields and groups are required for a particular type of analysis.
189
191
190
192
\subsection{Raw Data File Hierarchy}
191
193
@@ -195,14 +197,14 @@ \subsection{Raw Data File Hierarchy}
195
197
}
196
198
\end{figure}
197
199
198
-
A major focus of NeXus has been the recording of \emph{raw} experimental data, i.e. information taken directly from the experimental
200
+
A major focus of NeXus has been the recording of \emph{raw} experimental data, \textit{i.e.}, information taken directly from the experimental
199
201
equipment or processed only as required to provide physically meaningful values.
200
202
The NeXus raw data file hierarchy is the consequence of some practical considerations.
201
203
An overview of the NeXus data file structure for raw experimental data is shown in FIG.~\ref{rawfile}.
202
204
203
205
204
206
When looking at a beamline, it is easy to
205
-
discern different components: beam optic components, sample position, detectors, etc. It is quite natural to replicate this physical
207
+
discern different components: beam optic components, sample position, detectors, \textit{etc}. It is quite natural to replicate this physical
206
208
separation with a logical arrangement, in which metadata from each component are stored in a separate group. This approach explains the
207
209
list of beamline components in the \texttt{NXinstrument} group presented in FIG.~\ref{rawfile}.
208
210
As there can be multiple instances of the same kind of equipment, like slits or detectors, in a given beamline, it becomes necessary
@@ -226,22 +228,26 @@ \subsection{Raw Data File Hierarchy}
226
228
also contain plottable data, it uses the same attribute scheme to associate the monitor data with its plotting axes. Its location in the
227
229
\texttt{NXentry} group facilitates quick inspection for beamline diagnostics.
228
230
231
+
Most NeXus files will also contain a \texttt{NXsample} group containing information about the sample being measured in the experiment, \textit{e.g.},
232
+
its chemical composition, mass, unit cell parameters, \textit{etc}. It may also contain information about the sample environment, such as
233
+
temperature or pressure. If one or more of these parameters is varied in an experiment, these could be used as scanned variables (see
234
+
Section III.A).
235
+
229
236
A special base class, \texttt{NXcollection}, exempts its contents from validation
230
237
and thereby allows the inclusion of arbitrary data in non-NeXus formats.
231
238
232
239
\subsubsection{Multiple Method Instruments}
233
240
234
-
Particularly at X-ray sources,
235
-
some instruments offer multiple techniques that can be used in parallel.
241
+
Some instruments, particularly at X-ray sources, offer multiple techniques that can be used in parallel.
236
242
For example, small-angle scattering and powder diffraction
237
243
can be measured simultaneously at a SAXS/WAXS beamline.
238
244
We recommend storing the data from all methods in \emph{one} file,
239
245
in a \emph{single} \texttt{NXentry} hierarchy
240
-
(FIG.~\ref{multimethod}). All information from all detectors, logs and
241
-
such are collected in this one \texttt{NXentry} group to keep the data together.
242
-
Information that is particular for one experimental technique
243
-
is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of
244
-
\texttt{NXentry}. But it will typically only link to the data required by the
246
+
(FIG.~\ref{multimethod}). All information from detectors, logs, \textit{etc}.,
247
+
is collected in this one \texttt{NXentry} group to keep the data together.
248
+
Information that is peculiar to one experimental technique
249
+
is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of
250
+
\texttt{NXentry}, but it will typically only link to the data required by the
245
251
application definition for the specific experimental technique. The point of this scheme
246
252
is that both humans and computerized users can easily locate method-specific data while
247
253
maintaining the full view of the experiment.
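
The linking scheme for multiple-method instruments can be sketched with a toy Python model in which groups are dictionaries and a link is simply a second reference to the same object. This is not HDF5 code. The `make_subentry` helper and the names `saxs`, `waxs`, `saxs_detector`, and `waxs_detector` are hypothetical, and the subentry here is flattened for brevity instead of mirroring the full \texttt{NXentry} hierarchy.

```python
# Detector groups, stored once inside the single NXentry hierarchy.
detector_saxs = {"NX_class": "NXdetector", "data": [5, 7, 6]}
detector_waxs = {"NX_class": "NXdetector", "data": [40, 38, 41]}

entry = {
    "NX_class": "NXentry",
    "instrument": {
        "NX_class": "NXinstrument",
        "saxs_detector": detector_saxs,
        "waxs_detector": detector_waxs,
    },
}


def make_subentry(entry, name, linked):
    """Attach an NXsubentry that references (never copies) existing items."""
    subentry = {"NX_class": "NXsubentry"}
    subentry.update(linked)  # same objects, so this behaves like a link
    entry[name] = subentry
    return subentry


saxs = make_subentry(entry, "saxs", {"detector": detector_saxs})
waxs = make_subentry(entry, "waxs", {"detector": detector_waxs})

# The subentry sees exactly the object stored under the instrument group.
print(saxs["detector"] is entry["instrument"]["saxs_detector"])
```

Because each subentry only references existing items, method-specific software can read its own view while the complete record stays in one place.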
@@ -282,7 +288,8 @@ \subsubsection{Scans}
282
288
\end{itemize}
283
289
284
290
NeXus allows multi-dimensional scans too. This makes it very simple to produce meaningful slices through data
285
-
volumes even with NeXus-agnostic software ({\it e.g.} HDFView\cite{hdfview}).
291
+
volumes, whether the software is designed for NeXus (\textit{e.g.}, NeXpy\cite{nexpy}) or NeXus-agnostic
292
+
(\textit{e.g.}, HDFView\cite{hdfview}).
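
The kind of slicing meant here can be sketched in plain Python. The rules for storing scans are elided in this diff, so the layout below is an assumption: the scan index is taken to be the first array dimension, with one detector frame per scan point. The variable names and the helpers `frame_at` and `pixel_vs_scan` are hypothetical.

```python
# Scanned variable: one value per scan point (assumed to be temperature).
temperature = [10.0, 20.0, 30.0]

# counts[i][j]: detector pixel j recorded at scan point i (assumed layout).
counts = [
    [1, 4, 2, 0],
    [2, 9, 3, 1],
    [3, 16, 5, 1],
]


def frame_at(counts, i):
    """Slice at a fixed scan point: the full detector frame."""
    return list(counts[i])


def pixel_vs_scan(counts, j):
    """Slice at a fixed pixel: its value at every scan point."""
    return [row[j] for row in counts]


spectrum = frame_at(counts, 1)
curve = list(zip(temperature, pixel_vs_scan(counts, 1)))

print(spectrum)
print(curve)
```

With this layout, both slices need only array indexing, which is why generic HDF5 tools can produce them without knowing anything about NeXus.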
286
293
% FIXME: this pathology is not necessary to describe, not unique to NeXus, too much detail for this manuscript
287
294
%Interrupting a multi-dimensional scan may, depending
288
295
%on the software used, leave some of the data in an uninitialised state (usually the HDF5 fill value).
@@ -306,7 +313,7 @@ \subsection{Processed Data}
306
313
307
314
The hierarchy is much reduced as it is not important to carry all experimental information in the data
308
315
reduction. In contrast to the raw data file structure, \texttt{NXdata} in the processed file structure is the place
309
-
to store the results of the processing, together with its associated axes if the result is a multi-dimensional array.
316
+
to store the results of the processing, together with its associated axis or axes.
310
317
311
318
In addition to the \texttt{NXdata} and \texttt{NXsample} groups,
312
319
the \texttt{NXprocess} group provides structure to store details
@@ -319,10 +326,10 @@ \section{Coordinate Systems, Positioning of Components and Further Rules}
319
326
320
327
For data reduction, it is often necessary to know the exact position and orientation of beamline components.
321
328
The first thing needed is a reference coordinate system. NeXus chose to use the same coordinate system as the