Inquiry about the general workflow for assigning cross-linked identifiers and chemical information in Human-GEM #370
Dear Human-GEM developers, I am very happy to see the constant updates to the Human-GEM model, and thank you for making this model available! I am currently compiling the model for our own use. What I am trying to understand is how you assign the cross-linked identifiers and how you make sure the chemical information is assigned accurately.
I am sorry that I didn't follow your format. I hope you can help! Thanks,
Hi @gmhhope, thank you for your comments, we're glad you find the resource useful! The mapping and assignment of identifiers has not been so systematic, given the incompleteness and sometimes questionable reliability of the many sources (previous GEMs, databases, etc.). This means that there is a lot of manual work involved which is neither efficient nor sustainable long-term, so we're still in the process of trying to automate as much of it as possible.
I believe most of the identifier information currently in Human-GEM (and its associated annotation files) actually comes from its predecessor models (HMR, Recon, and iHsa), and was carried over during a very careful and involved merging process. We did attempt to map as many identifiers as possible to MetaNetX, as it was one of the most exhaustive and comprehensive resources we found. That mapping was in turn used to retrieve some additional identifiers and chemical information, and to resolve some conflicts. The plan moving forward is to use the ChEBI ID as the "true source" when possible, and eventually use that to retrieve cross-linked identifiers and chemical properties. But of course not all of the model metabolites exist in the ChEBI database yet (or ever will), so MetaNetX mapping will also continue to be a high priority.
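To make the "ChEBI first, MetaNetX as fallback" idea concrete, here is a minimal sketch of how such a priority rule could look. It is not the code used for Human-GEM: the record layout, the `pick_reference` helper, and all identifiers below are hypothetical placeholders chosen for illustration.

```python
from typing import Optional, Tuple

# Hypothetical annotation records: metabolite -> known external identifiers.
# The IDs are placeholders, not real ChEBI or MetaNetX entries.
ANNOTATIONS = {
    "met_A": {"chebi": "CHEBI:0000001", "metanetx": "MNXM_A"},
    "met_B": {"chebi": None, "metanetx": "MNXM_B"},
    "met_C": {"chebi": None, "metanetx": None},
}

# Assumed priority: ChEBI is the preferred source, MetaNetX the fallback.
SOURCE_PRIORITY = ("chebi", "metanetx")


def pick_reference(record: dict) -> Optional[Tuple[str, str]]:
    """Return (source, identifier) for the highest-priority source present."""
    for source in SOURCE_PRIORITY:
        identifier = record.get(source)
        if identifier:
            return source, identifier
    return None  # nothing usable: leave for manual curation


resolved = {met: pick_reference(rec) for met, rec in ANNOTATIONS.items()}
needs_manual = sorted(met for met, ref in resolved.items() if ref is None)

for met, ref in resolved.items():
    print(met, ref)
print("needs manual curation:", needs_manual)
```

Metabolites with no usable identifier end up in `needs_manual`, which corresponds to the manual work described above.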
Great question - I'm actually not sure how to obtain some sort of flatfile (e.g., csv) that includes the VMH metabolite identifiers and associations/chemical information from the VMH website. Maybe @mihai-sysbio knows?
Most originate from the preceding models. We also did an extensive manual curation of metabolite formulas and charges after generating Human1 to reconcile mass/charge imbalances and address problematic cases such as polymerization and highly combinatorial processes (e.g., lipid metabolism). This unfortunately means that some of the formulas may not agree with some databases, but those cases usually involve quite large/complex metabolites. The assumed pH/protonation state is also a major factor contributing to differences in formula/charge among different sources, which further complicates a more automated handling of this issue.
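As an illustration of why protonation state matters for mass/charge balance, here is a small sketch of a balance check. It is an assumption-laden example rather than the curation procedure used for Human1: the `parse_formula` and `imbalance` helpers are hypothetical, the parser only handles flat formulas (no parentheses, R groups, or polymer notation), and the formulas and charges are the commonly used pH ~7 forms, included only for illustration.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reaction: dict, mets: dict):
    """Return (unbalanced elements, net charge); ({}, 0) means balanced.

    Coefficients are negative for substrates and positive for products.
    """
    elements = Counter()
    charge = 0
    for met, coeff in reaction.items():
        formula, met_charge = mets[met]
        for element, n in parse_formula(formula).items():
            elements[element] += coeff * n
        charge += coeff * met_charge
    return {el: n for el, n in elements.items() if n != 0}, charge


# Illustrative (formula, charge) pairs for the hexokinase reaction.
METS = {
    "glucose": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}
HEXOKINASE = {"glucose": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}

balanced = imbalance(HEXOKINASE, METS)

# Same reaction, but with glucose 6-phosphate taken in its neutral form,
# as a source assuming a different protonation state might report it.
METS_NEUTRAL = dict(METS, g6p=("C6H13O9P", 0))
mismatched = imbalance(HEXOKINASE, METS_NEUTRAL)

print(balanced)
print(mismatched)
```

Mixing the neutral form of one metabolite with charged forms of the others leaves two extra hydrogens and a net charge of +2, which is the kind of discrepancy that appears when formulas from different sources are combined.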