Dictionary of metadata

bruceravel edited this page Jan 18, 2012 · 29 revisions

Any proposal for an interchange format involves two categories of content in the data file -- (1) a table of numbers representing the spectral data, and (2) a listing of metadata organized in a format that can be read by both computers and humans.

On this page, we list a dictionary of metadata. Each entry in the dictionary consists of three components:

  1. The name representing the datum
  2. The meaning of the datum
  3. The format for representing its value

Overview

The syntax of the name

We recommend that metadata be gathered into related categories, here called namespaces, so that related items of metadata are also related syntactically. For example, a namespace called Scan might hold metadata about the choice of absorbing atom and the data acquisition parameters used to perform the scan.

Using the syntax of the XDI suggestion, the name of the metadatum consists of the word Scan, followed by a dot, followed by another word. The dot tells the reader that the second word is related to the Scan namespace.

 Scan.element = Au
 Scan.edge = L3
 Scan.edge_energy = 11919 eV

Decisions have to be made about the allowed character set of the name, whether efforts at internationalization will be supported, and how deeply names can be nested (i.e. whether more than one dot is allowed).

The meaning of metadata

One of the charges of the Data Format Working Group is to identify a set of metadata to be encoded in the specification of a data interchange format and to assign names to each meaningful concept. This effort must take a broad view, capturing the full range of metadata concepts used in the community. It must also be open-ended: there must be a mechanism for providing new forms of metadata not considered up front.

The format of the value

Again, decisions must be made about character sets and internationalization. Among other decisions:

  1. Identification of standard units and whether units must be specified in a compliant file.
  2. Representations of numerical values and special data types like timestamps.
  3. Standards for identifying facilities and beamlines
  4. Representations of deeply nested data

The dictionary

Namespaces

The purpose of namespaces is to provide sensible, widely understood, semantic groupings of defined metadata tags. All tags associated with conveying information about sample preparation and the measurement environment of the sample belong in the Sample namespace. Similarly, all tags associated with the configuration of the beamline optics belong in the Beamline namespace.

Here is a list of all such semantic groupings:

  1. Beamline
  2. Scan
  3. Mono
  4. Facility
  5. Detector
  6. Sample
  7. Column

A tag is specified by inclusion in a namespace and identified by a name. An example is Mono.d_spacing which gives the d-spacing of the monochromator under operating conditions. In a tag, the namespace and tag name are separated by a dot (.). The tag and its value are separated by a colon and a single space. Here is an example:

 # Mono.d_spacing: 3.13525
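
The tag syntax described above can be sketched in a few lines of Python. This is only an illustration, not a reference parser: the function name `parse_tag` is hypothetical, the leading `#` is taken from the example line, and the `\w+` character class is a placeholder because the allowed character set for names is still undecided.

```python
import re

# Assumed line shape: "# Namespace.tag: value".
# \w+ is a placeholder; the allowed character set for names is an open question.
TAG_LINE = re.compile(r"^#\s*(\w+)\.(\w+):\s?(.*)$")


def parse_tag(line):
    """Return (namespace, tag, value) for a metadata line, or None if it does not match."""
    match = TAG_LINE.match(line.strip())
    if match is None:
        return None
    namespace, tag, value = match.groups()
    return namespace, tag, value.strip()


print(parse_tag(" # Mono.d_spacing: 3.13525"))
print(parse_tag(" # Beamline.name: NSLS X11-A"))
```

The value is returned as a string. Converting it to a number or a timestamp is left out because the format of values has not been settled.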

Required metadata

We identify three items that are essential to the interchange and successful interpretation of XAS data. These are required of an XDI file.

  • The d-spacing of the monochromator. If the energy axis of the measured data is miscalibrated because of inaccuracies in translating the angular position of the monochromator to energy, the d-spacing is needed to correct it. See Mono.d_spacing.

  • The element of the absorbing atom. The periodic table is replete with examples of atoms that have absorption edges with very similar edge energies. For example, the tabulated values of the Cr K edge and the Ba L1 edge are both 5989 eV. Without the species of the absorbing atom and the absorption edge measured, some data cannot be unambiguously identified. See Scan.element.

  • The absorption edge measured. The reasoning given for the absorbing element applies here as well. See Scan.edge.

All other metadata definitions that follow are optional, but recommended.
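
A check for the three required items might look like the following sketch. It assumes the header has already been read into a dictionary that maps full tag names to string values. The names `REQUIRED_TAGS` and `missing_required` are hypothetical and not part of any specification.

```python
# The three required tags named in this section.
REQUIRED_TAGS = ("Mono.d_spacing", "Scan.element", "Scan.edge")


def missing_required(metadata):
    """Return the required tags that are absent or empty in a {tag: value} dictionary."""
    return [tag for tag in REQUIRED_TAGS if not metadata.get(tag, "").strip()]


# Hypothetical header that lacks the monochromator d-spacing.
header = {"Scan.element": "Cu", "Scan.edge": "K"}
print(missing_required(header))  # ['Mono.d_spacing']
```

An empty list means all three required items are present. Whether their values are well formed is a separate question that depends on the value formats still to be defined.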


Defined items in the Beamline namespace

  • Beamline.name: The name by which the beamline is known
  • Beamline.collimation: A concise statement of how beam collimation is provided
  • Beamline.focusing: A concise statement about how beam focusing is provided
  • Beamline.harmonic_rejection: A concise statement about how harmonic rejection is accomplished

Examples of tags in the Beamline namespace

 # Beamline.name: NSLS X11-A
 # Beamline.collimation: none
 # Beamline.focusing: No
 # Beamline.harmonic_rejection: detuned

Open issues in the Beamline namespace

  1. Should the Mono namespace be rolled into this namespace? That is, should Mono be a separate namespace?

Defined items in the Scan namespace

  • Scan.edge: The measured absorption edge. This is a required parameter.

  • Scan.element: The species of the measured element. This is a required parameter.

  • Scan.edge_energy: The value of the edge energy used in the data acquisition software.

Examples of tags in the Scan namespace

 # Scan.edge: K
 # Scan.element: Cu
 # Scan.edge_energy: 8980.0

Open issues in the Scan namespace

  1. We must define the syntax for specifying the edge symbol.

  2. We must specify the syntax for identifying the absorbing element: symbol, Z number, or spelled-out name? If the latter is allowed, in which languages can it be specified?


Defined items in the Mono namespace

  • Mono.name: A string identifying the material and diffracting plane or grating spacing of the monochromator

  • Mono.d_spacing: The known d-spacing of the monochromator under operating conditions. This is a required parameter.

Examples of tags in the Mono namespace

 # Mono.name: Si 111
 # Mono.d_spacing: 3.13525
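
To illustrate why the d-spacing is needed for correcting an energy axis, here is a minimal sketch based on Bragg's law for a first-order reflection, E = hc / (2 d sin θ). It assumes that the d-spacing is given in Ångstroms and energies in eV; this page does not yet fix those units. The function names are hypothetical.

```python
import math

HC_EV_ANGSTROM = 12398.42  # h*c in eV*Angstrom, rounded


def bragg_energy(d_spacing, theta_deg):
    """Photon energy (eV) of the first-order reflection at Bragg angle theta."""
    return HC_EV_ANGSTROM / (2.0 * d_spacing * math.sin(math.radians(theta_deg)))


def bragg_angle(d_spacing, energy):
    """Bragg angle (degrees) at which the monochromator passes the given energy."""
    return math.degrees(math.asin(HC_EV_ANGSTROM / (2.0 * d_spacing * energy)))


def recalibrate(energy, d_recorded, d_corrected):
    """Recompute an energy for a corrected d-spacing, holding the angle fixed."""
    theta = bragg_angle(d_recorded, energy)
    return bragg_energy(d_corrected, theta)


# Assumed values: the recorded d-spacing from the example above and a
# made-up corrected value, used only to show the direction of the shift.
print(recalibrate(8980.0, 3.13525, 3.13555))
```

Because the angle is held fixed, the correction reduces to scaling each energy by d_recorded / d_corrected. This sketch covers only a d-spacing error; an angular offset of the monochromator would need a separate correction.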

Open issues in the Mono namespace

  1. How do we describe a polychromator?
  2. Should the Mono namespace be rolled into the Beamline namespace? That is, should Mono be a separate namespace?

Defined items in the Facility namespace

  • Facility.name: The name of the synchrotron or other X-ray facility.

  • Facility.xray_source: A string identifying the source of X-ray generation.

Examples of tags in the Facility namespace

 # Facility.name: NSLS
 # Facility.xray_source: bend magnet

Defined items in the Detector namespace

  • Detector.i0: A description of how the incident flux was measured
  • Detector.it: A description of how the transmitted flux was measured
  • Detector.if: A description of how the fluorescent flux was measured
  • Detector.ir: A description of how the reference flux was measured

Examples of tags in the Detector namespace

 # Detector.i0: 10cm  N2
 # Detector.it: 10cm  N2

Open issues in the Detector namespace

  1. What goes into the description? Fill gases? Applied voltage? Gain?
  2. How are energy discriminating detectors described?
  3. How are wavelength dispersive detectors described?
  4. How is the detection for energy dispersive XAS described?

Defined items in the Sample namespace

  • Sample.name: A string identifying the measured sample
  • Sample.formula: The stoichiometric formula of the measured sample
  • Sample.prep: A string summarizing the method of sample preparation
  • Sample.temperature: The temperature at which the sample was measured
  • Sample.pressure: The pressure at which the sample was measured

Examples of tags in the Sample namespace

 # Sample.name: Hematite
 # Sample.formula: Fe2O3
 # Sample.prep: Powder spread on polyimide tape
 # Sample.temperature: room temperature

Open issues in the Sample namespace

  1. Formula stoichiometry will require a specification of syntax
  2. Do we want to define further tags for things like electrochemical potential or other in situ parameters?

Defined items in the Column namespace

All tag names in the Column namespace are integers that refer to columns in the data portion of the file. The columns are numbered from left to right starting at 1. The first column must be the abscissa, either energy or wavenumber.

 # Column.1: energy

Subsequent columns are numbered sequentially and contain a description of the contents of the columns. For example:

 # Column.1: energy
 # Column.2: mu
 # Column.3: i0

Entries in the Column namespace are optional, but it is recommended to include a Column entry for every column of data.
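
As a sketch of how a reader might use these entries, the following code collects the Column tags from an already-parsed header and attaches the labels to one row of data. The dictionary layout and the function names `column_labels` and `label_row` are assumptions made for illustration.

```python
def column_labels(metadata):
    """Return {column number: label} for the Column.N entries, sorted by number."""
    columns = {}
    for name, value in metadata.items():
        namespace, _, tag = name.partition(".")
        if namespace == "Column" and tag.isdigit():
            columns[int(tag)] = value
    return dict(sorted(columns.items()))


def label_row(row, labels):
    """Map labels onto one row of data; column numbers in the file start at 1."""
    return {label: row[number - 1] for number, label in labels.items()}


# Hypothetical parsed header and a made-up row of numbers.
header = {"Column.1": "energy", "Column.2": "mu", "Column.3": "i0", "Scan.edge": "K"}
labels = column_labels(header)
print(labels)                                      # {1: 'energy', 2: 'mu', 3: 'i0'}
print(label_row([8980.0, 0.52, 12345.0], labels))
```

Keeping the column number as the dictionary key means a file that labels only some of its columns is still handled: unlabeled columns are skipped rather than shifting the remaining labels.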


Extension field

  1. Describe extensions within existing namespaces
  2. Describe extension field syntax (i.e. identical to defined field syntax)
  3. Explain relationship between application metadata and extension fields