Dictionary of metadata

bruceravel edited this page Jan 12, 2012 · 29 revisions

Any proposal for an interchange format involves two categories of content in the data file: (1) a table of numbers representing the spectral data, and (2) a listing of metadata organized in a format that both computers and humans can read.

On this page, we list a dictionary of metadata. Each entry in the dictionary consists of three components:

  1. The name representing the datum
  2. The meaning of the datum
  3. The format used to represent its value

Overview

The syntax of the name

We recommend that metadata be gathered into related categories, here called namespaces, so that related bits of metadata are also related syntactically. As an example, one namespace might be called Scan and hold metadata about the choice of absorbing atom and about the parameters used by the data acquisition program to carry out that scan.

Using the syntax of the XDI suggestion, the name of the metadatum consists of the word Scan, followed by a dot, followed by another word. The dot tells the reader that the second word is related to the Scan namespace.

 Scan.element = Au
 Scan.edge = L3
 Scan.edge_energy = 11919 eV
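The naming convention above can be sketched in a few lines of Python. This is only an illustration, not part of any specification: the function name `parse_metadata_line` is hypothetical, the `=` separator is taken from the example above, and splitting on the first dot only is an assumption, since the question of nesting depth is still open.

```python
def parse_metadata_line(line):
    """Split 'Namespace.tag = value' into (namespace, tag, value).

    Assumptions for this sketch: '=' separates name from value, and only
    the first dot separates the namespace from the rest of the name.
    """
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"no '=' separator in {line!r}")
    namespace, dot, tag = name.strip().partition(".")
    if not dot or not namespace or not tag:
        raise ValueError(f"name is not of the form Namespace.tag: {name.strip()!r}")
    # The value is kept as a plain string; units and numeric types are
    # deliberately left uninterpreted here.
    return namespace, tag, value.strip()


lines = [
    "Scan.element = Au",
    "Scan.edge = L3",
    "Scan.edge_energy = 11919 eV",
]
parsed = [parse_metadata_line(line) for line in lines]
```

Each element of `parsed` is a three-item tuple, so a reader can tell which namespace a datum belongs to without knowing the full dictionary in advance.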

Decisions have to be made about the allowed character set of the name, whether efforts at internationalization will be supported, and how deeply names can be nested (i.e. whether more than one dot is allowed).

The meaning of metadata

One of the charges of the Data Format Working Group is to identify a set of metadata to be encoded in the specification of a data interchange format and to assign names to each meaningful concept. This effort must take a broad view, capturing the full range of metadata concepts in use across the community. It must also be open-ended: there must be a mechanism for adding new forms of metadata not considered up front.

The format of the value

Again, decisions must be made about character sets and internationalization. Other decisions include:

  1. Identification of standard units and whether units must be specified in a compliant file
  2. Representations of numerical values and special data types like timestamps
  3. Standards for identifying facilities and beamlines
  4. Representations of deeply nested data

The dictionary

Namespaces

  1. Beamline
  2. Scan
  3. Mono
  4. Facility
  5. Detector
  6. Sample
  7. Column
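As a rough illustration of how a reader might use this list, the Python sketch below groups metadata by namespace and reports any namespace that is not among the seven above. The names `KNOWN_NAMESPACES` and `group_by_namespace` are hypothetical, as is the `Extra.note` entry, which exists only to show an unrecognized namespace. Unknown namespaces are kept rather than rejected, on the assumption that the dictionary is meant to stay open-ended.

```python
# The seven namespaces listed above.
KNOWN_NAMESPACES = {
    "Beamline", "Scan", "Mono", "Facility", "Detector", "Sample", "Column",
}


def group_by_namespace(entries):
    """Group {'Namespace.tag': value} into {'Namespace': {'tag': value}}.

    Returns the grouped dictionary and the set of namespaces that are not
    in KNOWN_NAMESPACES. Unknown namespaces are kept, not discarded.
    """
    grouped = {}
    unknown = set()
    for name, value in entries.items():
        namespace, _, tag = name.partition(".")
        if namespace not in KNOWN_NAMESPACES:
            unknown.add(namespace)
        grouped.setdefault(namespace, {})[tag] = value
    return grouped, unknown


entries = {
    "Scan.element": "Au",
    "Scan.edge": "L3",
    "Column.1": "energy",
    "Extra.note": "hypothetical entry",  # not a namespace from the list
}
grouped, unknown = group_by_namespace(entries)
```

Here `grouped["Scan"]` holds both Scan items, and `unknown` contains only `"Extra"`.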

Defined items in the Beamline namespace

Defined items in the Scan namespace

Defined items in the Mono namespace

Defined items in the Facility namespace

Defined items in the Detector namespace

Defined items in the Sample namespace

Defined items in the Column namespace

All item names in the Column namespace are integers that refer to columns in the data portion of the file. The columns are numbered from left to right, starting at 1. The first column must be the abscissa, either energy or wavenumber.

 Column.1: energy

Entries for subsequent columns are numbered sequentially, and the value of each entry describes the contents of its column. For example:

 Column.1: energy
 Column.2: mu
 Column.3: i0

Entries in the Column namespace are optional, but we recommend including a Column entry for every column of data.
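The rules for the Column namespace are simple enough to check mechanically. The Python sketch below is one possible reading of them, not a reference implementation: the function names are hypothetical, the `:` separator follows the example above, and the exact label strings `energy` and `wavenumber` are an assumption about how the abscissa would be spelled.

```python
ABSCISSA_LABELS = ("energy", "wavenumber")  # assumed spellings


def parse_columns(lines):
    """Parse 'Column.N: label' lines into {N: label}."""
    labels = {}
    for line in lines:
        name, sep, label = line.partition(":")
        namespace, _, index = name.strip().partition(".")
        if not sep or namespace != "Column" or not index.isdecimal():
            raise ValueError(f"not a Column entry: {line!r}")
        labels[int(index)] = label.strip()
    return labels


def check_columns(labels, n_columns):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    # Columns are numbered from 1, left to right.
    for index in sorted(labels):
        if not 1 <= index <= n_columns:
            problems.append(f"Column.{index} does not match any data column")
    # The first column must be the abscissa.
    if 1 in labels and labels[1] not in ABSCISSA_LABELS:
        problems.append("Column.1 should be the abscissa (energy or wavenumber)")
    # Entries are optional, so a missing one is only a recommendation.
    for index in range(1, n_columns + 1):
        if index not in labels:
            problems.append(f"Column.{index} is not described (recommended, not required)")
    return problems


labels = parse_columns([" Column.1: energy", " Column.2: mu", " Column.3: i0"])
problems = check_columns(labels, n_columns=3)
```

For the three-column example above, `problems` is empty. Because Column entries are optional, a real checker would probably report missing entries as warnings rather than errors.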
