This repository provides the R code to reproduce the data analysis summaries and figures for the Deep Metabolome Annotation (DMA) of Daphnia magna paper.
The repository contains four main analysis workflows:
- Daphnia annotation summary - Analysis of metabolite annotations from D. magna samples
- Metabolite reference standards analysis summary - Analysis of metabolite standard mixture (MSM) data
- Phylo analysis - Phylogenetic/metabolomics analysis across species
- Example Feature Check (Galaxy workflow history access) - Example for how the Galaxy workflow histories can be investigated
├── input/
│   ├── input_for_feature_check/       # Inputs for Galaxy workflow feature check
│   │   ├── galaxy_peaklist_references.csv
│   │   └── GalaxyNone-[samplelist_dma_daphnia_magna.tabular].tabular
│   ├── input_for_summary_plots/       # Data for Daphnia and MSM analysis
│   │   ├── merged_annotations_all_classified.zip
│   │   ├── metabolite_standard_mixture_details.csv
│   │   └── pubchem_set.zip
│   └── input_for_phylometab_plot/     # Data for phylometab analysis
│       ├── chebi_with_inchikey_source_classyfire.csv
│       ├── Daphnia_ChEBI.csv
│       ├── MTox.csv
│       ├── phyloT_generated_tree_1734701763_newick.txt
│       └── pubchem_kegg_hmdb_expanded.zip
├── output/                            # Generated figures and summary tables
├── example_feature_check.R            # Galaxy workflow feature check example
├── paper_summarise_daphnia.R          # Main Daphnia annotation analysis
├── paper_summarise_msm.R              # Metabolite standard mixture analysis
└── paper_phylometab.R                 # Phylometab metabolomics analysis
- R (>= 4.4.3)
- RStudio (recommended)
- Required R packages are managed via renv (see Installation section)
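As a quick check of the R version requirement listed above, a minimal base R sketch could look like the following. The helper name `meets_min_r` is hypothetical and is not part of this repository.

```r
# Hypothetical helper (not part of this repository): returns TRUE when the
# running R version is at least `min_version`.
meets_min_r <- function(min_version = "4.4.3", current = getRversion()) {
  current >= package_version(min_version)
}

# Warn, rather than stop, if the session is older than the listed minimum.
if (!meets_min_r()) {
  warning("R ", getRversion(), " is older than the 4.4.3 listed above.")
}
```

The check only warns, so the scripts can still be tried on an older R version.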
- Clone this repository
- Open the R project in RStudio: dmagna-dma-paper.Rproj
- Initialize the R environment using renv:
renv::init()
This will install all required packages with their exact versions as specified in renv.lock.
Run the main Daphnia annotation summarization:
source("paper_summarise_daphnia.R")
Generates:
- Summary statistics and visualizations of metabolite annotations
- Classification analysis (superclass, class, subclass)
- Workflow comparison plots
- Venn diagrams for extraction methods, chromatography types, and polarity
- PCA analysis of annotations
- Tree maps and upset plots
Key outputs:
- FIG_5a_tree_map.pdf - Tree map visualization
- FIG_5b_annotations_all_pca.pdf - PCA plot of annotations
- FIG_5c-e_*_bar.pdf - Bar charts for chemical classifications
- FIG_6a-e_*.pdf - Workflow and method comparison plots
- daphnia_annotation_summary.csv - Summary statistics table
- FIG_27-29.pdf/png - Supplementary annotation summary plots
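After running the script, one way to confirm that the key outputs were written is to check for them in output/. This is a minimal sketch: the helper `missing_outputs` is hypothetical, and only the file names given in full above are checked because the wildcard names depend on the run.

```r
# Hypothetical helper (not part of this repository): returns the expected
# file names that are not present in `dir`.
missing_outputs <- function(expected, dir = "output") {
  expected[!file.exists(file.path(dir, expected))]
}

# File names taken from the key outputs listed above.
expected <- c(
  "FIG_5a_tree_map.pdf",
  "FIG_5b_annotations_all_pca.pdf",
  "daphnia_annotation_summary.csv"
)

missing <- missing_outputs(expected)
if (length(missing) > 0) {
  message("Not found in output/: ", paste(missing, collapse = ", "))
}
```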
Run the metabolite reference standards analysis:
source("paper_summarise_msm.R")
Generates:
- Analysis of metabolite standard mixture (MSM) annotations
- Workflow-specific analysis for MSM data
Key outputs:
- FIG_S30a_galaxy_msms_workflow_bar.pdf - MSM workflow analysis
- FIG_S30b_treemap_msm.pdf - MSM tree map
- FIG_S31_presence_absence_match_type_msm.pdf - Match type analysis
- msm_annotations_summary.csv - MSM summary statistics
Run the phylogenetic/metabolomics analysis:
source("paper_phylometab.R")
Generates:
- Phylogenetic tree with metabolite presence/absence data
- Cross-species metabolite comparison
- Database mapping analysis (KEGG, HMDB, MTox, ChEBI)
Key output:
- FIG_7_phylomet.pdf - Phylogenetic metabolomics plot
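The presence/absence data mentioned above can be pictured as a compound-by-source matrix. The sketch below uses toy data in base R. The column names `inchikey` and `source`, and the choice of InChIKey as the compound identifier, are assumptions for illustration and may not match what paper_phylometab.R does.

```r
# Toy long-format table: one row per (compound, database) hit.
# Column names and values are illustrative assumptions only.
hits <- data.frame(
  inchikey = c("AAA", "BBB", "AAA", "CCC"),
  source   = c("KEGG", "KEGG", "HMDB", "MTox")
)

# Cross-tabulate, then collapse counts to 1 (present) or 0 (absent).
counts <- table(hits$inchikey, hits$source)
presence <- unclass(counts > 0) * 1L

presence
```

Rows are compounds and columns are databases, so summing each column with `colSums(presence)` gives the number of compounds found per source.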
The example feature check shows how to access files from the Galaxy workflows directly and how to verify LC-MS feature details against blank-filtered XCMS features.
source("example_feature_check.R")
What it does:
- Downloads XCMS peak lists and xcmsSet objects from Galaxy URLs
- Rebuilds RT windows and performs blank filtering
- Links the XCMS features from the Galaxy workflow to the full annotation list
Inputs:
- Galaxy workflow file URLs in input/input_for_feature_check/galaxy_peaklist_references.csv
- Sample metadata in input/input_for_feature_check/GalaxyNone-[samplelist_dma_daphnia_magna.tabular].tabular
- Full annotation list in input/input_for_summary_plots/merged_annotations_all_classified.zip
Key outputs (per assay in output/<assay_name>/):
- *_DE_blank_filtered.RDS and *_blank_filtered_peak_matrix.csv
- *_xcms_passed_annos.csv
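To illustrate the general idea behind the blank filtering step, the sketch below keeps features whose mean intensity in biological samples is at least a fold threshold above their mean intensity in blanks. The helper `blank_filter`, the fold value of 10, and the use of means are assumptions for illustration. The actual criteria are defined in example_feature_check.R and the Galaxy workflow.

```r
# Hypothetical helper (not the repository's implementation): keep features
# (rows) whose mean sample intensity is >= `fold` times the mean blank
# intensity. Missing values are ignored when averaging.
blank_filter <- function(peaks, blank_cols, sample_cols, fold = 10) {
  blank_mean  <- rowMeans(peaks[, blank_cols,  drop = FALSE], na.rm = TRUE)
  sample_mean <- rowMeans(peaks[, sample_cols, drop = FALSE], na.rm = TRUE)
  # A feature never detected in a group has a NaN mean; treat it as 0.
  blank_mean[is.nan(blank_mean)]   <- 0
  sample_mean[is.nan(sample_mean)] <- 0
  keep <- sample_mean > 0 & sample_mean >= fold * blank_mean
  peaks[keep, , drop = FALSE]
}

# Toy peak matrix: 3 features, 2 samples (s1, s2), 2 blanks (b1, b2).
peaks <- matrix(
  c(100, 120,  5, NA,   # f1: strong in samples, weak in blanks
     50,  60, 40, 45,   # f2: similar in samples and blanks
     NA,  30, NA, NA),  # f3: absent from blanks
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "b1", "b2"))
)

filtered <- blank_filter(peaks,
                         blank_cols  = c("b1", "b2"),
                         sample_cols = c("s1", "s2"))
rownames(filtered)
```

In this toy example f2 is removed because its blank signal is comparable to its sample signal, while f1 and f3 are retained.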
The analysis relies on several R packages:
- Data manipulation: dplyr, tidyr, data.table, stringr
- Visualization: ggplot2, cowplot, treemap, VennDiagram, UpSetR
- Chemical informatics: ChemmineR
- Phylogenetics: ape, ggtree, aplot
- Data import: openxlsx, jsonlite
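To see whether the packages listed above are available in the current library, a small base R check could be used. The helper `missing_packages` is hypothetical, and with renv set up as described in the installation steps this check should normally report nothing.

```r
# Hypothetical helper (not part of this repository): returns the package
# names that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names as listed above.
pkgs <- c(
  "dplyr", "tidyr", "data.table", "stringr",
  "ggplot2", "cowplot", "treemap", "VennDiagram", "UpSetR",
  "ChemmineR", "ape", "ggtree", "aplot",
  "openxlsx", "jsonlite"
)

not_found <- missing_packages(pkgs)
if (length(not_found) > 0) {
  message("Packages not found: ", paste(not_found, collapse = ", "))
}
```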
The code reproduces the following figures from the paper:
Main Figures:
- Figure 5: Metabolite annotation overview (tree map, PCA, classification bars)
- Figure 6: Workflow and method comparisons (Venn diagrams, upset plots)
- Figure 7: Phylogenetic metabolomics analysis
Supplementary Figures:
- Additional method comparisons and MSM analysis
All generated figures are saved as PDF files in the output/ directory. Summary tables are saved as CSV files for further analysis or inclusion in manuscripts.
The repository also includes an updated metabolites file created for the MetaboLights study MTBLS2273.
See LICENSE file for details.
If you use the code or data in this repository, please cite the corresponding D. magna DMA paper.