This repository was archived by the owner on Mar 28, 2018. It is now read-only.
w100pea edited this page Sep 5, 2011 · 2 revisions

Eric's initial thoughts on MetAssimulo's data format

  • It illustrates the challenges a test data format will need to address: inclusion of a number of different simulation parameters (some of which we won't have thought of at the time we create the format), inclusion of the test spectra, and separation of the test spectra into different groups.
  • It is specific to their software.
  • Their data includes many things that are not necessary in all test spectra. Not every test spectrum (mine, for example) simulates real metabolites or simulates a pH.
  • The struct they use contains similar application-specific information.
  • Their data leaves no place for things others might want to simulate (ionic strength, for example).
  • Their format is not defined anywhere. This makes it hard to extend directly. We may, for example, want to simulate ionic strength or even unnamed parameters that cause correlated shifts in peaks.
  • Using filenames for tables and directories for databases is easy (it might be the way to go, just for that reason) but causes maintainability problems. For example, it is easy for files to be separated from their buddies: "Where did this particular Concentrations_Mix1.txt come from?" Additionally, the experiment metadata and the structure of the test database are contained exclusively in the filenames and directory names. I'd prefer that everything be in the same file and that renaming it did not destroy metadata.
  • I don't like storing the output as a particular struct format in a .mat file. Their struct contains fields that may not be applicable and lacks fields that may be applicable. .mat files are hard to read and write outside of MATLAB.
MetAssimulo may become widely used, so our standard should take its format into account.
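
To make the preferences above concrete, here is a minimal sketch of a single-file container, assuming JSON as the carrier. All names in it (`SpectraDataset`, `SpectrumGroup`, `Spectrum`, `parameters`) are hypothetical and not part of MetAssimulo or of any agreed standard. The idea it illustrates is that metadata, grouping, and spectra travel in one file, and that simulation parameters live in an open dictionary so that parameters nobody anticipated (ionic strength, or unnamed shift-inducing factors) can be added without changing the schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Spectrum:
    name: str
    x: list  # axis values (hypothetical field name)
    y: list  # intensities, same length as x
    # Open-ended: any simulation parameter may be recorded, none is required.
    parameters: dict = field(default_factory=dict)


@dataclass
class SpectrumGroup:
    name: str
    spectra: list = field(default_factory=list)


@dataclass
class SpectraDataset:
    # Experiment metadata lives inside the file, not in its filename.
    metadata: dict = field(default_factory=dict)
    groups: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses through nested dataclasses and lists.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "SpectraDataset":
        raw = json.loads(text)
        groups = [
            SpectrumGroup(g["name"], [Spectrum(**s) for s in g["spectra"]])
            for g in raw["groups"]
        ]
        return SpectraDataset(raw["metadata"], groups)
```

A spectrum that simulates no real metabolite and no pH simply leaves those keys out of `parameters`, while one that simulates ionic strength adds an `"ionic_strength"` entry. Renaming the file changes nothing, because the group names and metadata are stored in its contents.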

Another thing I'd like to note about the MetAssimulo software is that it is missing important parameters. If I want to pursue my original idea for my dissertation, I won't be able to use it unmodified. The authors do not model secondary matrix effects on shift; they account only for pH. A paper by Alm et al. shows that accounting for just one global shift parameter (pH) is not sufficient to give good alignment of peaks.

I think the MetAssimulo format provides a good starting point for thinking about the structure needed for our standard, but I don't want to use it as-is. Whatever our final standard, though, we will want to write a converter for MetAssimulo data since we'll probably use it to generate some data and other researchers will as well. We can make it easy for them to validate our tools if we make their data easy to convert to our format.
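
As a rough illustration of what the first step of such a converter might look like, the sketch below gathers a directory of text tables into one dictionary that can be written out as a single file. It assumes filenames follow a `<Table>_<Group>.txt` pattern, inferred only from the `Concentrations_Mix1.txt` example above; the real MetAssimulo layout may differ. The function names `parse_table_filename` and `bundle_directory` are hypothetical.

```python
import json
from pathlib import Path


def parse_table_filename(filename):
    """Split an assumed '<Table>_<Group>.txt' name into (table, group)."""
    stem = Path(filename).stem
    table, sep, group = stem.partition("_")
    if not sep or not table or not group:
        raise ValueError(f"unexpected filename pattern: {filename!r}")
    return table, group


def bundle_directory(directory):
    """Collect every .txt table in a directory into one dictionary.

    The original directory name and filenames are recorded inside the
    bundle, so the provenance survives later renaming of the output file.
    """
    directory = Path(directory)
    bundle = {"source_directory": directory.name, "groups": {}}
    for path in sorted(directory.glob("*.txt")):
        table, group = parse_table_filename(path.name)
        bundle["groups"].setdefault(group, {})[table] = {
            "source_file": path.name,
            "text": path.read_text(),  # raw contents, parsing left for later
        }
    return bundle


def bundle_to_json(bundle):
    """Serialize the bundle so it can be stored as a single file."""
    return json.dumps(bundle, indent=2, sort_keys=True)
```

The table contents are kept as raw text here on purpose: how each table should be parsed depends on details of the MetAssimulo output that are not written down anywhere, so that step is left open.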
