Overview
With the motivation from the previous section, the goals for the format and library are:
- store spectra for exchange, especially for model compounds. Raw data taken directly from the beamline will probably need to be converted to this format.
- store information about the sample, measurement conditions, etc.
- store multiple spectra, either on the same sample or multiple samples, and possibly taken at many facilities.
- provide programming libraries and simple standalone applications that can read, write, and manage such data libraries. Programming libraries would have to support multiple languages.
There are a few reasonable ways to solve this problem. What follows is a method that makes heavy use of relational databases and SQL. The principal argument here is that relational databases offer a well-understood, proven way to store data with extensible meta-data. The use of SQL also makes the programming libraries simpler, as they can rely on tested SQL syntax to access the underlying data store.
As the XAS Data Library is being developed, code and examples will be available at https://github.com/XraySpectroscopy/XASDataLibrary
I propose using SQLite, a widely used, Free relational database engine, as the primary store for the XAFS Data Library. A key feature of SQLite is that it needs no external server or configuration -- the database is contained in a single disk file. SQLite databases can be accessed with a variety of tools, for example DB Browser for SQLite and the SQLite Manager add-on for Firefox.
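To illustrate the "no server, single file" point, here is a minimal sketch using Python's standard-library `sqlite3` module. The function `open_library`, the file name `xas_example.db` and the `spectrum` table are hypothetical, chosen only for illustration; they are not the library's actual schema or API.

```python
import os
import sqlite3
import tempfile


def open_library(path: str) -> sqlite3.Connection:
    """Open (or create) an SQLite database file; no server is involved."""
    conn = sqlite3.connect(path)
    # Hypothetical minimal table, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spectrum ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " notes TEXT)"
    )
    return conn


if __name__ == "__main__":
    # Use a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "xas_example.db")  # hypothetical file name
        conn = open_library(path)
        with conn:  # commits the INSERT when the block exits without error
            conn.execute(
                "INSERT INTO spectrum (name, notes) VALUES (?, ?)",
                ("Cu foil", "reference spectrum"),
            )
        print(conn.execute("SELECT name, notes FROM spectrum").fetchall())
        conn.close()
        # The entire database lives in this one ordinary file.
        print(os.path.exists(path))
```

Running it should print `[('Cu foil', 'reference spectrum')]` followed by `True`. Copying that one file is enough to hand the whole database to someone else.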
SQL-based relational databases may not be the most obvious choice for storing scientific data consisting of arrays of related values. One obvious limitation is that relational databases don't store array data very well, so storing array data in a portable way within the confines of an SQL database needs special attention. The approach adopted here is to use JSON, which can encapsulate an array or other complex data structure into a string.
JSON -- JavaScript Object Notation -- provides a standard, easy-to-use method for encapsulating complex data structures into strings that a large number of programming languages can parse back into the original data. In this respect, the requirements for the XAS Data Library -- numerical arrays of data -- are fairly modest. Storing array data in strings is, of course, what ASCII Column Files have done for years, only without the benefit of a standard programming interface to read them. As an example, an array of data [8000, 8001.0, 8002.0] would be encoded in JSON as
'[8000, 8001.0, 8002.0]'
This is considerably simpler and lighter-weight than using XML to encode array data.
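The round trip described above can be sketched in Python with the standard `json` and `sqlite3` modules. The `spectrum` table and its `energy` column are hypothetical names used only for this illustration.

```python
import json
import sqlite3

energy = [8000, 8001.0, 8002.0]

# json.dumps turns the list into a plain string...
encoded = json.dumps(energy)
print(encoded)  # [8000, 8001.0, 8002.0]

# ...which fits in an ordinary TEXT column (hypothetical table/column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (id INTEGER PRIMARY KEY, energy TEXT)")
conn.execute("INSERT INTO spectrum (energy) VALUES (?)", (encoded,))

# Reading it back: json.loads restores the original list of numbers.
(stored,) = conn.execute("SELECT energy FROM spectrum").fetchone()
decoded = json.loads(stored)
print(decoded == energy)  # True
conn.close()
```

Any language with a JSON parser can decode the same stored string. That portability is the reason for choosing JSON over an ad hoc column format.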
In addition to encoding numerical arrays, JSON can also encode an associative array (also known as a Hash Table, Dictionary, Record, or Key/Value List). This can be a very useful construct for storing attribute information. It might be tempting to use such Associative Arrays for many pieces of data inside the database, but doing so would prevent those data from being used in SQL SELECT and other statements: such data would not be available for making relations. Still, because Associative Arrays can be so useful and extensible, several of the tables in the database include an attributes column that is always stored as text. This column is expected to hold a JSON-encoded Associative Array that complements the corresponding notes column. These data cannot be used directly in searching the database, but may be useful to particular applications.
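Here is a minimal sketch of the attributes pattern, assuming a hypothetical `sample` table with `name`, `notes` and `attributes` columns. The keys inside the dictionary are also made up for illustration. SQL selects on the ordinary `name` column, while the application decodes the JSON text itself.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a searchable column plus free-form notes and attributes.
conn.execute(
    "CREATE TABLE sample ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " notes TEXT,"
    " attributes TEXT)"  # always TEXT, holding a JSON-encoded dictionary
)
attrs = {"preparation": "pressed pellet", "thickness_um": 25}  # hypothetical keys
conn.execute(
    "INSERT INTO sample (name, notes, attributes) VALUES (?, ?, ?)",
    ("CuO", "commercial powder", json.dumps(attrs)),
)

# 'name' is an ordinary column, so SQL can select on it directly.
name, attr_text = conn.execute(
    "SELECT name, attributes FROM sample WHERE name = ?", ("CuO",)
).fetchone()

# In this design the attributes column is treated as an opaque string;
# the application decodes it after the query.
extra = json.loads(attr_text)
print(name, extra["preparation"])  # CuO pressed pellet
conn.close()
```

New keys can be added to `attributes` without changing the table schema. That extensibility is what makes the column attractive, at the cost of not being directly searchable.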
While robust, powerful, and compliant with SQL standards, SQLite does not provide as rich a set of data types as some other SQL relational databases. In particular for the design here, SQLite does not support Boolean values or Enum fields. Integer values are used in place of Boolean values. Enum values (which might otherwise have been used to encode Elements, Collection Modes, etc.) are implemented as indexes into foreign tables, and JOINs must be used to relate the data in the tables.
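Both workarounds can be sketched with a hypothetical schema: an `element` and a `mode` lookup table stand in for Enums, and an `is_reference` integer column holds 0 or 1 in place of a Boolean. All table names, column names and rows here are illustrative assumptions, not the library's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table standing in for an Enum of elements (hypothetical schema).
    CREATE TABLE element (
        z      INTEGER PRIMARY KEY,   -- atomic number
        symbol TEXT NOT NULL UNIQUE
    );
    -- Lookup table standing in for an Enum of collection modes.
    CREATE TABLE mode (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE spectrum (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        element_z    INTEGER REFERENCES element(z),
        mode_id      INTEGER REFERENCES mode(id),
        is_reference INTEGER NOT NULL DEFAULT 0  -- Boolean stored as 0/1
    );
    INSERT INTO element (z, symbol) VALUES (26, 'Fe'), (29, 'Cu');
    INSERT INTO mode (id, name) VALUES (1, 'transmission'), (2, 'fluorescence');
    INSERT INTO spectrum (name, element_z, mode_id, is_reference)
        VALUES ('Cu foil', 29, 1, 1), ('Fe sample', 26, 2, 0);
    """
)

# JOINs translate the integer indexes back into readable values,
# and the integer column acts as a Boolean filter.
rows = conn.execute(
    """
    SELECT s.name, e.symbol, m.name
    FROM spectrum AS s
    JOIN element AS e ON s.element_z = e.z
    JOIN mode AS m ON s.mode_id = m.id
    WHERE s.is_reference = 1
    """
).fetchall()
print(rows)  # [('Cu foil', 'Cu', 'transmission')]
conn.close()
```

Keeping the allowed values in their own tables means a new collection mode only needs a new row in `mode`, not a change to the `spectrum` schema.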