Inquiry about the general workflow of how the cross-linking of identifiers and chemical information assigned in humanGEM #370

gmhhope · 2022-03-02T23:01:19Z

gmhhope
Mar 2, 2022

Dear Human-GEM developers,

I am very happy to see the constant updates to the Human-GEM model, and thank you for making it available! I am currently compiling the model for our own use. What I am trying to understand is how you assign the cross-linked identifiers and how you accurately assign the chemical information.

If I understand correctly, and based on the paper, most of the cross-linking of compound identifiers comes from MetaNetX. Is that correct?
I do see some orphan identifiers coming from VMH (e.g., 'vmhmetabolite': 'CE7087'). Do you know where I can find the full tables of these VMH metabolites with their formula and charge information?
How were the formula and charge assigned? Do they mostly come from the MetaNetX chem_prop.tsv?

I am sorry that I didn't follow your format. Hope you can help!

Thanks,
Minghao Gong

Answered by JonathanRob

JonathanRob · 2022-03-03T07:40:51Z

JonathanRob
Mar 3, 2022
Maintainer

Hi @gmhhope, thank you for your comments, we're glad you find the resource useful!

The mapping and assignment of identifiers has not been so systematic, given the incompleteness and sometimes questionable reliability of the many sources (previous GEMs, databases, etc.). This means that there is a lot of manual work involved which is neither efficient nor sustainable long-term, so we're still in the process of trying to automate as much of it as possible.

If I understand correctly, and based on the paper, most of the cross-linking of compound identifiers comes from MetaNetX. Is that correct?

I believe most of the identifier information currently in Human-GEM (and its associated annotation files) actually comes from its predecessor models (HMR, Recon, and iHsa), through a very careful and involved merging process. We did attempt to map as many identifiers as possible to MetaNetX, as it was one of the most exhaustive and comprehensive resources we found, and this was in turn used to retrieve some additional identifiers and chemical information, and to resolve some conflicts.

The plan moving forward is to use the ChEBI ID as the "true source" when possible, and eventually use that to retrieve cross-linked identifiers and chemical properties. But of course not all of the model metabolites exist in the ChEBI database yet (or ever will), so MetaNetX mapping will also continue to be a high priority.

I do see some orphan identifiers coming from VMH (e.g., 'vmhmetabolite': 'CE7087'). Do you know where I can find the full tables of these VMH metabolites with their formula and charge information?

Great question - I'm actually not sure how to obtain some sort of flatfile (e.g., csv) that includes the VMH metabolite identifiers and associations/chemical information from the VMH website. Maybe @mihai-sysbio knows?

How were the formula and charge assigned? Do they mostly come from the MetaNetX chem_prop.tsv?

Most originate from the preceding models. We also did an extensive manual curation of metabolite formulas and charges after generating Human1, to reconcile mass/charge imbalances and address problematic cases such as polymerization and highly combinatorial processes (e.g., lipid metabolism). This unfortunately means that some of the formulas may not agree with some databases, but those cases usually involve quite large/complex metabolites. The assumed pH/protonation state is also a major factor contributing to differences in formula/charge among different sources, which further complicates more automated handling of this issue.
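To make the protonation point concrete, here is a minimal Python sketch (not part of the Human-GEM tooling) that checks whether two formula/charge pairs describe the same compound in different protonation states. It assumes simple formulas without parentheses or repeat units, and the function names are hypothetical. The idea is that removing one proton lowers both the hydrogen count and the charge by one, so the two records should agree on every other element and the hydrogen difference should equal the charge difference.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C3H4O3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def differs_only_by_protonation(formula_a, charge_a, formula_b, charge_b) -> bool:
    """True if the two records differ only in H count, matched by charge."""
    a, b = parse_formula(formula_a), parse_formula(formula_b)
    # Take hydrogen out of both counts, then compare the remaining elements.
    delta_h = a.pop("H", 0) - b.pop("H", 0)
    return a == b and delta_h == charge_a - charge_b


# Pyruvic acid (neutral) vs. pyruvate (charge -1): same compound, one proton apart.
print(differs_only_by_protonation("C3H4O3", 0, "C3H3O3", -1))
```

A check like this can separate harmless pH-related disagreements between sources from genuine formula conflicts that need manual review.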

10 replies

gmhhope Mar 8, 2022
Author

Thanks very much, @mihai-sysbio and @haowang-bioinfo!

All your suggestions and comments are very helpful!

@mihai-sysbio, From the examples I looked at, for every Recon3D-sourced metabolite in Human-GEM there is an associated HMDB identifier that contains the expected metabolite details, at least on their website. I have looked into your notebook and I appreciate it a lot! I found that in your script, because of the try clause, only the 478 entries with a valid HMDB field are retained; all others are dropped. When I remove the try clause, it becomes clear that a large number of entries don't have HMDB identifiers. But thanks for trying that out! I really appreciate it!
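The filtering effect described above can be reproduced with a small Python sketch. This is not the notebook's code: the records, field names, and the `fetch_details` stand-in are all hypothetical, and it only illustrates how a try/except that skips failures silently drops entries, and how the same loop can record them instead.

```python
# Hypothetical records: IDs and field names are placeholders, not Human-GEM data.
RECORDS = [
    {"id": "met_a", "hmdb": "HMDB-placeholder-1"},
    {"id": "met_b", "hmdb": ""},   # empty HMDB field
    {"id": "met_c"},               # HMDB field missing entirely
]


def fetch_details(hmdb_id):
    """Stand-in for a real lookup; raises when there is nothing to look up."""
    if not hmdb_id:
        raise ValueError("no HMDB identifier")
    return {"hmdb": hmdb_id}


def collect_dropping(records):
    """try/except that skips failures: entries without an HMDB ID vanish."""
    kept = []
    for rec in records:
        try:
            kept.append((rec["id"], fetch_details(rec["hmdb"])))
        except (KeyError, ValueError):
            continue
    return kept


def collect_tracking(records):
    """Same loop, but failures are recorded instead of discarded."""
    kept, missing = [], []
    for rec in records:
        try:
            kept.append((rec["id"], fetch_details(rec["hmdb"])))
        except (KeyError, ValueError):
            missing.append(rec["id"])
    return kept, missing


kept, missing = collect_tracking(RECORDS)
print(len(collect_dropping(RECORDS)), "kept;", len(missing), "without HMDB ID")
```

Keeping a separate list of the skipped entries makes it easy to see how many metabolites lack an HMDB identifier without changing which ones are looked up.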

Also, you referred to a website that may be relevant. Could you share the link?

I am very happy to be involved in such an active community. Hope I can contribute soon!

Thanks a lot,
Minghao Gong

gmhhope Mar 8, 2022
Author

Also, can I assume that the list of metabolites and their identifiers in https://github.com/SysBioChalmers/Human-GEM/blob/main/model/Human-GEM.xml will always be synchronized with https://github.com/SysBioChalmers/Human-GEM/blob/main/model/metabolites.tsv?

Thanks,
Minghao Gong

haowang-bioinfo Mar 9, 2022

Also, can I assume that the list of metabolites and their identifiers in https://github.com/SysBioChalmers/Human-GEM/blob/main/model/Human-GEM.xml will always be synchronized with https://github.com/SysBioChalmers/Human-GEM/blob/main/model/metabolites.tsv?

yes, indeed
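For anyone who wants to spot-check this on a local copy of the two files, a minimal Python sketch follows. It is not a tool from the Human-GEM repository. It assumes that metabolites appear as SBML `species` elements with an `id` attribute, that the TSV has a header row with an ID column (called `mets` here, an assumption to verify against the actual file), and that the SBML IDs may carry a prefix such as `M_`. The inline sample data are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def species_ids_from_sbml(xml_text):
    """Collect the id of every <species> element, whatever its XML namespace."""
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "species"}


def met_ids_from_tsv(tsv_text, column="mets"):
    """Collect one ID per row; `column` is an assumed header name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column] for row in reader}


def compare(sbml_ids, tsv_ids, prefix=""):
    """Return (only in SBML, only in TSV) after stripping an optional prefix."""
    normalized = {i[len(prefix):] if prefix and i.startswith(prefix) else i
                  for i in sbml_ids}
    return normalized - tsv_ids, tsv_ids - normalized


# Made-up sample data standing in for the real files.
SAMPLE_XML = (
    '<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"><model>'
    '<listOfSpecies><species id="M_metA_c"/><species id="M_metB_c"/>'
    '</listOfSpecies></model></sbml>'
)
SAMPLE_TSV = "mets\tmetsNoComp\nmetA_c\tmetA\nmetC_c\tmetC\n"

only_sbml, only_tsv = compare(species_ids_from_sbml(SAMPLE_XML),
                              met_ids_from_tsv(SAMPLE_TSV), prefix="M_")
print("only in SBML:", sorted(only_sbml))
print("only in TSV:", sorted(only_tsv))
```

With real files, read the XML in binary mode and pass the bytes to `ET.fromstring`, since an XML declaration that names an encoding is rejected when the input is a Python `str`. Two empty sets would indicate that the ID lists match.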

smoretti Mar 17, 2022

@gmhhope some VMH compound information can be found here: https://github.com/opencobra/ctf, though probably not everything, I guess.

ChEBI does provide the Monoisotopic Mass (e.g. https://www.ebi.ac.uk/chebi/searchId.do?chebiId=169172). It should be available in the ChEBI OWL or OBO download files: https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/
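As a rough illustration of reading such a file, the Python sketch below parses OBO `[Term]` stanzas into tag/value lists and then looks for a mass value. The stanza layout (`[Term]` headers followed by `tag: value` lines) is standard OBO, but how ChEBI encodes the monoisotopic mass is an assumption here: the code expects a `property_value:` line whose property name ends in `monoisotopicmass`, followed by a quoted number. Check this against the downloaded file before relying on it. The sample stanza is made up.

```python
import re


def parse_obo_terms(text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start a new stanza; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


def monoisotopic_mass(term):
    """Return the mass as a float, or None. The line layout is an assumption."""
    for value in term.get("property_value", []):
        match = re.match(r'(\S*monoisotopicmass)\s+"([^"]+)"', value)
        if match:
            return float(match.group(2))
    return None


# Made-up stanzas for illustration; not copied from ChEBI.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: CHEBI:EXAMPLE
name: example compound
property_value: http://purl.obolibrary.org/obo/chebi/monoisotopicmass "123.456" xsd:string

[Typedef]
id: has_part
"""

for term in parse_obo_terms(SAMPLE_OBO):
    print(term["id"][0], monoisotopic_mass(term))
```

For the full ontology file, a dedicated OBO parsing library is likely a better choice than this hand-rolled reader.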

gmhhope Mar 21, 2022
Author

Thanks @smoretti !

The information is indeed uncharted territory for me. Thanks for providing such valuable information!

Best,
Minghao
