| 1 | +--- |
| 2 | +layout: tools |
| 3 | +title: "CHAMOIS" |
| 4 | +contributors: [mlarralde] |
| 5 | +handle: chamois |
| 6 | +status: complete |
| 7 | +type: software |
| 8 | +projects: [secmet] |
| 9 | + |
| 10 | +# Optional |
| 11 | +website: |
| 12 | +publications: "https://www.biorxiv.org/content/10.1101/2025.03.13.642868v1" |
| 13 | +doi: "10.1101/2025.03.13.642868" |
| 14 | +image: /assets/images/tools/2025-03-15-chamois-icon.png |
| 15 | +tagline: Machine learning inference of natural product chemistry. |
| 16 | +tags: [bioinformatics, secondary_metabolism] |
| 17 | + |
| 18 | +# Data and code |
| 19 | +github: ["https://github.com/zellerlab/CHAMOIS"] |
| 20 | +--- |
| 21 | +{% include JB/setup %} |
| 22 | + |
| 23 | + |
| 24 | +## Abstract |
| 25 | + |
| 26 | +Biosynthetic gene clusters (BGCs) are genomic loci encoding the biosynthetic pathways that produce specialised metabolites
| 27 | +with a broad spectrum of bioactivities. Many methods have been developed for genome mining of BGCs, such as [GECCO](/tools/gecco).
| 28 | +However, in many cases the cognate metabolite is unknown, and experimental characterisation remains difficult.
| 29 | + |
| 30 | +CHAMOIS is a machine learning-based tool for predicting chemical properties of secondary metabolites from protein domains annotated in the input BGCs. |
| 31 | +CHAMOIS infers 539 chemical properties from the ChemOnt ontology using logistic regression. It accurately predicts 111 of these properties (AUPRC > 0.5)
| 32 | +in cross-validation against known instances. Although CHAMOIS is not explicitly trained on biosynthetic knowledge, many of the inferred
| 33 | +links between protein domains and metabolite properties are consistent with the scientific literature, while others suggest new biochemical functions for
| 34 | +uncharacterised biosynthetic domains. Finally, CHAMOIS can pinpoint which BGC within a given genome produces a pre-specified metabolite
| 35 | +(the correct BGC ranks among the top 5 in 72% of cases), which holds great potential for prioritising experimental BGC characterisation and the discovery
| 36 | +of novel biosynthetic enzymes.
| 37 | + |
| 38 | +The CHAMOIS software is implemented in [Python](https://www.python.org/), |
| 39 | +supports [Python 3.7](https://endoflife.date/python) and later versions, and is provided under
| 40 | +the [GNU General Public License v3.0 or later](https://choosealicense.com/licenses/gpl-3.0/). |
| 41 | + |
| 42 | + |
| 43 | + |
| 44 | +Graphical depiction of the chemical hierarchy inference approach implemented in CHAMOIS. Briefly, CHAMOIS |
| 45 | +identifies open reading frames (ORFs) in a given BGC sequence (Step 1). Then, protein domains |
| 46 | +are annotated in the resulting ORFs using profile hidden Markov models (pHMMs; Step 2). The resulting domain |
| 47 | +vector for the whole BGC serves as the feature vector for one logistic regression classifier per class of the
| 48 | +ChemOnt ontology (Step 3). Predicted classes allow filtering for BGCs with particularly relevant chemical classes (Step 4). |
| 49 | +Finally, the fingerprint of class predictions can be used to find BGCs most similar to a particular compound (Step 5). |
| 50 | + |
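
Steps 3 to 5 of the caption above can be illustrated with a minimal sketch. It is not the CHAMOIS implementation: the domain names, class names, weights, and the soft Jaccard similarity are all hypothetical, chosen only to show how one logistic regression per class turns a domain presence vector into a class fingerprint, and how such fingerprints could rank BGCs against a query compound.

```python
import math

# Hypothetical toy vocabulary; the real tool uses pHMM-annotated domains
# and 539 ChemOnt classes.
DOMAINS = ["domain_ks", "domain_c", "domain_gt"]
CLASSES = ["Polyketides", "Peptides", "Glycosides"]

# Hypothetical (coefficients, intercept) per class: one logistic
# regression classifier for each class, as in Step 3.
WEIGHTS = {
    "Polyketides": ([3.0, -1.0, 0.0], -1.5),
    "Peptides": ([-1.0, 3.0, 0.0], -1.5),
    "Glycosides": ([0.0, 0.0, 3.0], -1.5),
}


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_classes(domain_vector):
    """Return the predicted probability of each class for one BGC."""
    probs = {}
    for name in CLASSES:
        coefs, intercept = WEIGHTS[name]
        score = intercept + sum(c * x for c, x in zip(coefs, domain_vector))
        probs[name] = sigmoid(score)
    return probs


def similarity(probs, fingerprint):
    """Soft Jaccard similarity (an assumed metric, not taken from the source)."""
    num = sum(min(probs[c], fingerprint[c]) for c in CLASSES)
    den = sum(max(probs[c], fingerprint[c]) for c in CLASSES)
    return num / den if den else 0.0


def rank_bgcs(bgcs, fingerprint):
    """Order BGC names from most to least similar to the query fingerprint."""
    scored = [(similarity(predict_classes(vec), fingerprint), name)
              for name, vec in bgcs.items()]
    return [name for _, name in sorted(scored, key=lambda t: -t[0])]


# Domain presence/absence vectors for three made-up BGCs.
BGCS = {
    "bgc_a": [1, 0, 0],
    "bgc_b": [0, 1, 0],
    "bgc_c": [1, 0, 1],
}

# Binary class fingerprint of a made-up query compound (Step 5).
QUERY = {"Polyketides": 1, "Peptides": 0, "Glycosides": 1}

RANKING = rank_bgcs(BGCS, QUERY)
print(RANKING)
```

With these toy weights, the BGC carrying both the polyketide-associated and the glycoside-associated domain ranks first for the glycosylated polyketide query. Thresholding the probabilities returned by `predict_classes` would give the class-based filtering described in Step 4.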