Dictionary of metadata

bruceravel edited this page Jan 19, 2012 · 29 revisions

Any proposal for an interchange format involves two categories of content in the data file -- (1) a table of numbers representing the spectral data, and (2) a listing of metadata organized in a format that can be read by both computers and humans.

On this page, we list a dictionary of metadata. Each entry in the dictionary consists of three components:

  1. The name representing the datum
  2. The meaning of the datum
  3. The format of representing its value

Note that the syntax of the XDI proposal is used on this wiki page. This page is NOT intended for discussion of syntax. Should xasCIF or some other suggestion be adopted, the discussion on this page will remain valid, but the syntax of the various examples would change.

Also note that no attempt has yet been made to respond to the recent discussion on the topic of non-shallow metadata. See the thread beginning at http://millenia.cars.aps.anl.gov/pipermail/xasformat/2012-January/000083.html . That no content on this page yet addresses this important topic is not intended to suggest that the Working Group has come to a conclusion on that topic. It is merely the case that this is an early draft of the dictionary.

Overview

Overview of the syntax used in this document

We recommend that metadata be gathered into related categories, here called namespaces. Thus related bits of metadata will also be related syntactically. As an example, one namespace might be called Scan and denote metadata related to choice of absorbing atom and the choice of parameters in the data acquisition program related to effecting that scan.

Using the syntax of the XDI suggestion, the name of the metadatum consists of the word Scan, followed by a dot, followed by another word. The dot tells the reader that the second word is related to the Scan namespace.

 # Scan.element: Au
 # Scan.edge: L3
 # Scan.edge_energy: 11919 eV

Decisions have to be made about the allowed character set of the name, whether efforts at internationalization will be supported, and how deeply nested (i.e. whether one or more dots are allowed) names can be.
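
As a rough illustration of the naming scheme described above, here is a minimal Python sketch that splits one header line into namespace, tag, and value. The regular expression and the function name parse_tag are assumptions made for this example, not part of any proposal: the character set is limited to ASCII word characters, only one dot is allowed, and either a colon or an equals sign is accepted as the separator, since none of those choices has been settled.

```python
import re

# Hypothetical pattern for "# Namespace.tag: value".
# Assumes ASCII word characters, a single dot, and ":" or "=" as separator.
TAG_LINE = re.compile(r"^\s*#\s*([A-Za-z]\w*)\.(\w+)\s*[:=]\s*(.*?)\s*$")

def parse_tag(line):
    """Return (namespace, tag, value), or None if the line is not a metadata tag."""
    match = TAG_LINE.match(line)
    if match is None:
        return None
    return match.groups()

print(parse_tag(" # Scan.edge_energy = 11919 eV"))
# ('Scan', 'edge_energy', '11919 eV')
```

Note that the unit stays attached to the value here; whether units are required and how they are written is one of the open decisions listed below.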

The meaning of metadata

One of the charges of the Data Format Working Group is to identify a set of metadata to be encoded in the specification of a data interchange format and to assign names to each meaningful concept. This effort must take a broad view, capturing metadata concepts as broadly as they are used in the community. This effort must also be open ended in that there must be a mechanism for providing new forms of metadata not considered up front.

The format of the value

Again, decisions must be made about character sets and internationalization. Among other decisions:

  1. Identification of standard units and whether units must be specified in a compliant file.
  2. Representations of numerical values and special data types like timestamps.
  3. Standards for identifying facilities and beamlines
  4. Representations of deeply nested data

The dictionary

Namespaces

The purpose of namespaces is to provide sensible, widely understood, semantic groupings of defined metadata tags. All tags associated with conveying information about sample preparation and the measurement environment of the sample belong in the Sample namespace. Similarly, all tags associated with the configuration of the beamline optics belong in the Beamline namespace.

Here is a list of all such semantic groupings:

  1. Beamline: Tags related to the structure of the beamline and its optics
  2. Scan: Tags related to the parameters of the scan
  3. Mono: Tags related to the monochromator
  4. Facility: Tags related to the synchrotron or other facility at which the measurement was made
  5. Detector: Tags related to the details of the photon detection system
  6. Sample: Tags related to the details of sample preparation and measurement
  7. Column: Tags describing the contents of the columns in the data table

Required metadata

We identify three items that are essential to the interchange and successful interpretation of XAS data. These are required of an XDI file.

  • The d-spacing of the monochromator. If the energy axis of the measured data is miscalibrated because of inaccuracies in the translation from the angular position of the monochromator to energy, the d-spacing is needed to correct it. See Mono.d_spacing.

  • The element of the absorbing atom. The periodic table is replete with examples of atoms that have absorption edges with very similar edge energies. For example, the tabulated values of the Cr K edge and the Ba L1 edge are both 5989 eV. Without identification of the species of the absorbing atom and of the absorption edge measured, some data cannot be unambiguously interpreted. See Scan.element.

  • The absorption edge measured. See above. See Scan.edge.

All other metadata definitions that follow are optional, but recommended.
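
A reader of such a file might check for the three required items along the following lines. This is only a sketch: the function name missing_required is made up for the example, and it assumes the colon-separated form used in the examples on this page.

```python
# The three required items named above.
REQUIRED = ("Mono.d_spacing", "Scan.element", "Scan.edge")

def missing_required(header_lines):
    """List the required tag names that are absent (or empty) in the header."""
    present = set()
    for line in header_lines:
        # Assumes "# Name.tag: value"; strip the leading "#" and whitespace.
        name, sep, value = line.lstrip("# \t").partition(":")
        if sep and value.strip():
            present.add(name.strip())
    return [name for name in REQUIRED if name not in present]

header = ["# Scan.edge: K", "# Scan.element: Cu", "# Beamline.name: NSLS X11-A"]
print(missing_required(header))  # ['Mono.d_spacing']
```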


Defined items in the Beamline namespace

  • Beamline.name: The name by which the beamline is known
  • Beamline.collimation: A concise statement of how beam collimation is provided
  • Beamline.focusing: A concise statement about how beam focusing is provided
  • Beamline.harmonic_rejection: A concise statement about how harmonic rejection is accomplished

Examples of tags in the Beamline namespace

 # Beamline.name: NSLS X11-A
 # Beamline.collimation: none
 # Beamline.focusing: none
 # Beamline.harmonic_rejection: detuned mono by 50% on I0

Click here to see issues related to the Beamline namespace


Defined items in the Scan namespace

  • Scan.edge: The measured absorption edge. This is a required parameter.

  • Scan.element: The species of the measured element. This is a required parameter.

  • Scan.edge_energy: The value of the edge energy used in the data acquisition software.

  • Scan.start_time: The beginning time of the scan. The time must be denoted according to the ISO 8601 specification for combined dates and times.

  • Scan.end_time: The end time of the scan. The time must be denoted according to the ISO 8601 specification for combined dates and times.

Examples of tags in the Scan namespace

 # Scan.edge: K
 # Scan.element: Cu
 # Scan.edge_energy: 8980.0
 # Scan.start_time: 20110401T1202

In that example, the beginning time of the scan is 2 minutes after noon on April Fool's Day (1st April) in the year 2011.
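
For illustration, the timestamp in that example can be read with the Python standard library. The sketch below handles only the compact form shown here (YYYYMMDDThhmm); ISO 8601 allows other combined forms (separators, seconds, time zones) that a real reader would also have to accept. The function name parse_scan_time is hypothetical.

```python
from datetime import datetime

def parse_scan_time(value):
    """Parse the compact ISO 8601 form used in the example (YYYYMMDDThhmm)."""
    return datetime.strptime(value, "%Y%m%dT%H%M")

start = parse_scan_time("20110401T1202")
print(start.isoformat())  # 2011-04-01T12:02:00
```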

Click here to see issues related to the Scan namespace


Defined items in the Mono namespace

  • Mono.name: A string identifying the material and diffracting plane or grating spacing of the monochromator

  • Mono.d_spacing: The known d-spacing of the monochromator under operating conditions. This is a required parameter.

Examples of tags in the Mono namespace

 # Mono.name: Si 111
 # Mono.d_spacing: 3.13525
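
The d-spacing is what ties the angular position of the monochromator to energy through Bragg's law, E = hc / (2 d sin θ). The sketch below shows that conversion for a first-order reflection. It assumes the d-spacing is given in Ångstroms, which this page does not state, and the function names are made up for the example.

```python
import math

HC_EV_ANGSTROM = 12398.41984  # h*c in eV*Angstrom

def energy_from_angle(d_spacing, theta_deg):
    """Photon energy (eV) selected at Bragg angle theta, first-order reflection."""
    return HC_EV_ANGSTROM / (2.0 * d_spacing * math.sin(math.radians(theta_deg)))

def angle_from_energy(d_spacing, energy_ev):
    """Bragg angle (degrees) at which the given energy (eV) is selected."""
    return math.degrees(math.asin(HC_EV_ANGSTROM / (2.0 * d_spacing * energy_ev)))

# Assumed unit: Angstrom. Values taken from the Mono and Scan examples.
theta = angle_from_energy(3.13525, 8980.0)
print(theta, energy_from_angle(3.13525, theta))
```

With the d-spacing recorded in the file, a reader can recover the monochromator angle for each energy point and recompute the energy axis if the calibration is later found to be off.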

Click here to see issues related to the Mono namespace


Defined items in the Facility namespace

  • Facility.name: The name of the synchrotron or other X-ray facility.

  • Facility.xray_source: A string identifying the source of X-ray generation.

Examples of tags in the Facility namespace

 # Facility.name: NSLS
 # Facility.xray_source: bend magnet

Defined items in the Detector namespace

  • Detector.i0: A description of how the incident flux was measured
  • Detector.it: A description of how the transmitted flux was measured
  • Detector.if: A description of how the fluorescent flux was measured
  • Detector.ir: A description of how the reference flux was measured

Examples of tags in the Detector namespace

 # Detector.i0: 10cm  N2
 # Detector.it: 10cm  N2

Click here to see issues related to the Detector namespace


Defined items in the Sample namespace

  • Sample.name: A string identifying the measured sample
  • Sample.formula: The stoichiometric formula of the measured sample
  • Sample.prep: A string summarizing the method of sample preparation
  • Sample.temperature: The temperature at which the sample was measured

The Sample namespace is rather open-ended. It is probably impossible to anticipate all the kinds of sample-related metadata that may be useful to attach to data. That said, it would be useful to suggest tags for a number of common kinds of extrinsic parameters.

Here are some other possible tags denoting extrinsic parameters of the experiment along the lines of Sample.temperature.

  • Sample.pressure
  • Sample.ph
  • Sample.eh
  • Sample.volume
  • Sample.porosity
  • Sample.density
  • Sample.resistivity
  • Sample.viscosity
  • Sample.magnetic_field
  • Sample.magnetic_moment
  • Sample.crystal_structure
  • Sample.opacity
  • Sample.electrochemical_potential

Examples of tags in the Sample namespace

 # Sample.name: Hematite
 # Sample.formula: Fe2O3
 # Sample.prep: Powder spread on polyimide tape
 # Sample.temperature: room temperature

Click here to see issues related to the Sample namespace


Defined items in the Column namespace

All items in the Column namespace have integer tag names that refer to columns in the data portion of the file. The columns are numbered from left to right starting at 1. The first column must be the abscissa, either energy or wavenumber.

 # Column.1: energy

Subsequent columns are numbered sequentially and contain a description of the contents of the columns. For example

 # Column.1: energy
 # Column.2: mu
 # Column.3: i0

Entries in the Column namespace are optional, but it is recommended to include a Column entry for every column of data.
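
As a sketch of how software might use these entries, the following Python function collects the Column tags and returns their descriptions in column order. The function name is hypothetical, the colon-separated form is assumed, and the leading "#" is treated as optional.

```python
def column_descriptions(header_lines):
    """Return column descriptions ordered by their integer tag (Column.1 first)."""
    columns = {}
    for line in header_lines:
        name, sep, value = line.lstrip("# \t").partition(":")
        if sep and name.startswith("Column."):
            index = name[len("Column."):].strip()
            if index.isdigit():  # ignore anything that is not an integer tag
                columns[int(index)] = value.strip()
    return [columns[i] for i in sorted(columns)]

header = ["# Column.2: mu", "# Column.1: energy", "# Column.3: i0"]
print(column_descriptions(header))  # ['energy', 'mu', 'i0']
```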


Extension fields

Metadata tags carry syntax and may carry semantics. That is, it is possible to have syntactically correct tags that have no definition. Such tags could carry information considered useful by the user or the author of software that, at some point, touches the data.

Such a tag could be an extension within an existing namespace. This has already been discussed in the context of the Sample namespace.

Such a tag could be part of a new namespace. One application of a new namespace would be to tie a group of metadata tags to a particular application. For example, the data processing program Athena might attach tags associated with the parameters for normalizing the data. That might look something like this:

 # Athena.pre1: -150
 # Athena.pre2: -30
 # Athena.nor1: 150
 # Athena.nor2: 800

These define the boundaries of the pre- and post-edge lines used to define the edge step in the μ(E) spectrum.

Authors of controls, data acquisition, data analysis, and data archiving software are encouraged to use such extension tags.

If an extension tag is not understood due to its lack of defined semantics, the default behavior for software touching the data should be to silently preserve the metadata.
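
One way to get that behavior is for software to change only the tags it understands and copy every other header line through unchanged. The following is a minimal sketch under that assumption; the helper set_tag is hypothetical and assumes the colon-separated form used on this page.

```python
def set_tag(header_lines, name, value):
    """Replace (or append) one tag; every other line is copied through untouched."""
    new_line = f"# {name}: {value}"
    out, replaced = [], False
    for line in header_lines:
        if line.lstrip("# \t").partition(":")[0].strip() == name:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)  # unknown or extension tags are preserved as-is
    if not replaced:
        out.append(new_line)
    return out

header = ["# Scan.edge: K", "# Athena.pre1: -150"]
print(set_tag(header, "Scan.edge", "L3"))
# ['# Scan.edge: L3', '# Athena.pre1: -150']
```

The Athena tag is carried along even though this code attaches no meaning to it.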

User comments

At any point, but certainly at the time of data acquisition, it is useful to provide a mechanism for the user to associate arbitrary comments with the data. These comments are in the form of unstructured text. This text should be preserved as written, respecting white space, line breaks, orthography, and the use of non-ASCII characters.

In the syntax of the XDI proposal, the user comments follow the line of slashes and precede the line of dashes.

 # ////
 # This is a line of user comments.
 #    These are a couple other
 #   lines of user comments.
 # -----------------------------------------------------
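
A reader could pull out the comment block roughly as follows. This sketch assumes that each header line begins with "#" followed by one space, and that the delimiters are lines consisting only of slashes or only of dashes; the function name is made up for the example, and str.removeprefix needs Python 3.9 or later.

```python
def user_comments(header_lines):
    """Collect the text between the line of slashes and the line of dashes."""
    comments, inside = [], False
    for line in header_lines:
        body = line.lstrip().removeprefix("#").rstrip("\n")
        marker = body.strip()
        if not inside:
            # Start collecting after a line made only of slashes.
            inside = bool(marker) and set(marker) == {"/"}
        elif marker and set(marker) == {"-"}:
            break  # the line of dashes ends the comment block
        else:
            # Drop only the single space after "#"; keep other white space.
            comments.append(body[1:] if body.startswith(" ") else body)
    return comments

header = [
    " # ////",
    " # This is a line of user comments.",
    " #    These are a couple other",
    " #   lines of user comments.",
    " # -----------------------------------------------------",
]
for text in user_comments(header):
    print(repr(text))
```

The indentation inside the second and third comment lines survives, in keeping with the requirement to respect white space.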

Column labels

The line of column labels somewhat replicates the Column namespace. However, the Column namespace can be used for extended descriptions of the contents of each column, while the column labels must each be a single word with no whitespace. The column label line is required for the sake of the many existing plotting and other programs that recognize this last line of text before the table of numbers and use its contents.

The column labels must be preserved for any columns that are exported to another XDI (or xasCIF or whatever) compliant file.

  # -----------------------------------------------------
  #    Energy      mu     I0        It     Ifch1     Ifch2     Ifch3     Ifch4    Iref    IntTime
  # <table of numbers follows>
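
Because each label is a single word, the label line can be split on white space. The sketch below looks for the line of dashes and splits the header line that follows it; the function name is hypothetical, and str.removeprefix needs Python 3.9 or later.

```python
def column_labels(header_lines):
    """Return the single-word labels from the header line after the line of dashes."""
    bodies = [
        line.lstrip().removeprefix("#")
        for line in header_lines
        if line.lstrip().startswith("#")
    ]
    for previous, current in zip(bodies, bodies[1:]):
        marker = previous.strip()
        if marker and set(marker) == {"-"}:
            return current.split()
    return []

header = [
    "  # -----------------------------------------------------",
    "  #    Energy      mu     I0        It     Ifch1     Ifch2     Ifch3     Ifch4    Iref    IntTime",
]
print(column_labels(header))
# ['Energy', 'mu', 'I0', 'It', 'Ifch1', 'Ifch2', 'Ifch3', 'Ifch4', 'Iref', 'IntTime']
```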