Commit 1a7a8a0 (parent c042586): Add tool page for CHAMOIS

3 files changed, 50 additions, 0 deletions

tools/_posts/2025-03-15-chamois.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
---
2+
layout: tools
3+
title: "CHAMOIS"
4+
contributors: [mlarralde]
5+
handle: chamois
6+
status: complete
7+
type: software
8+
projects: [secmet]
9+
10+
# Optional
11+
website:
12+
publications: "https://www.biorxiv.org/content/10.1101/2025.03.13.642868v1"
13+
doi: "10.1101/2025.03.13.642868"
14+
image: /assets/images/tools/2025-03-15-chamois-icon.png
15+
tagline: Machine learning inference of natural product chemistry.
16+
tags: [bioinformatics, secondary_metabolism]
17+
18+
# Data and code
19+
github: ["https://github.com/zellerlab/CHAMOIS"]
20+
---
21+
{% include JB/setup %}
22+
23+
24+
## Abstract

Biosynthetic gene clusters (BGCs) are genomic loci that encode the biosynthetic pathways of specialised metabolites
with a broad spectrum of bioactivities. Many methods have been developed for genome mining of BGCs, such as [GECCO](/tools/gecco).
However, in many cases the cognate metabolite remains unknown, and its experimental characterisation is difficult.

CHAMOIS is a machine learning-based tool that predicts chemical properties of secondary metabolites from the protein domains annotated in the input BGCs.
CHAMOIS infers 539 chemical properties from the ChemOnt ontology using logistic regression, and accurately predicts 111 of them (AUPRC > 0.5)
in cross-validation against known instances. Although CHAMOIS is not explicitly trained on biosynthetic knowledge, many of the inferred
links between protein domains and metabolite properties are consistent with the scientific literature, while others suggest new biochemical functions of
uncharacterised biosynthetic domains. Finally, CHAMOIS can pinpoint which BGC within a given genome produces a pre-specified metabolite
(the correct BGC is ranked among the top 5 in 72% of cases), which holds great potential for prioritising experimental BGC characterisation and the discovery
of novel biosynthetic enzymes.
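The two evaluation figures quoted in the abstract, AUPRC per ChemOnt class and the rate at which the correct BGC lands in the top 5, can be illustrated with a short pure-Python sketch. It assumes AUPRC is estimated as average precision, which is one common step-wise estimator; the CHAMOIS evaluation may interpolate the precision-recall curve differently. The helper names `average_precision`, `count_well_predicted` and `top_k_hit_rate` are hypothetical and not part of the CHAMOIS API, and the toy data is made up.

```python
from typing import Hashable, Mapping, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate (average precision) for one binary class.

    Hypothetical helper; tied scores keep their input order, a simplification.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive examples")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at the recall level reached here
    return total / n_pos


def count_well_predicted(per_class_auprc: Mapping[str, float], threshold: float = 0.5) -> int:
    """Count classes whose AUPRC exceeds `threshold` (the abstract uses 0.5)."""
    return sum(1 for value in per_class_auprc.values() if value > threshold)


def top_k_hit_rate(
    rankings: Sequence[Sequence[Hashable]], targets: Sequence[Hashable], k: int = 5
) -> float:
    """Fraction of queries whose correct BGC appears among the first k ranked BGCs."""
    hits = sum(1 for ranking, target in zip(rankings, targets) if target in ranking[:k])
    return hits / len(targets)


# Toy data, made up for illustration only.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # about 0.833
print(top_k_hit_rate([["bgc_3", "bgc_1"], ["bgc_2", "bgc_4"]], ["bgc_1", "bgc_5"], k=5))  # 0.5
```

Average precision is used here because it needs no curve interpolation and behaves well for rare classes, where a ROC-based score can look optimistic.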

The CHAMOIS software is implemented in [Python](https://www.python.org/),
supports all versions from [Python 3.7](https://endoflife.date/python) onwards, and is provided under
the [GNU General Public License v3.0 or later](https://choosealicense.com/licenses/gpl-3.0/).

![CHAMOIS flowchart](/assets/images/tools/2025-03-15-chamois-flow.png)

Graphical depiction of the chemical hierarchy inference approach implemented in CHAMOIS. Briefly, CHAMOIS
identifies open reading frames (ORFs) in a given BGC sequence (Step 1). Then, protein domains
are annotated in the resulting ORFs using profile hidden Markov models (pHMMs; Step 2). The resulting domain
vector for the whole BGC serves as the input features for a logistic regression classifier for each class of the
ChemOnt ontology (Step 3). The predicted classes allow BGCs to be filtered for chemical classes of particular interest (Step 4).
Finally, the fingerprint of class predictions can be used to find the BGCs most similar to a particular compound (Step 5).
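Steps 3 to 5 can be sketched in plain Python, assuming each BGC has already been reduced to the set of protein domains found in it (Steps 1 and 2, ORF calling and pHMM annotation, are not reproduced here). The per-class weights, domain names and class names below are made up for illustration, the function names are hypothetical rather than the CHAMOIS API, and the Bernoulli log-likelihood used in Step 5 to compare a prediction fingerprint with a compound's known classes is an assumed scoring choice; CHAMOIS may use a different similarity measure.

```python
import math
from typing import Dict, List, Mapping, Set, Tuple

# Hypothetical fitted model for one ChemOnt class: (weight per domain, intercept).
ClassModel = Tuple[Mapping[str, float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_classes(domains: Set[str], models: Mapping[str, ClassModel]) -> Dict[str, float]:
    """Step 3: one logistic regression per class on a binary domain-presence vector."""
    probabilities = {}
    for cls, (weights, intercept) in models.items():
        # Absent domains have feature value 0, so only present domains add to the logit.
        logit = intercept + sum(w for domain, w in weights.items() if domain in domains)
        probabilities[cls] = sigmoid(logit)
    return probabilities


def filter_bgcs(
    predictions: Mapping[str, Mapping[str, float]], cls: str, threshold: float = 0.5
) -> List[str]:
    """Step 4: keep BGCs whose predicted probability for `cls` exceeds `threshold`."""
    return [bgc for bgc, probs in predictions.items() if probs[cls] > threshold]


def log_likelihood(probs: Mapping[str, float], compound_classes: Set[str], eps: float = 1e-9) -> float:
    """Assumed score: log-probability of the compound's class fingerprint under a BGC's predictions."""
    total = 0.0
    for cls, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log(p) if cls in compound_classes else math.log(1.0 - p)
    return total


def rank_bgcs(predictions: Mapping[str, Mapping[str, float]], compound_classes: Set[str]) -> List[str]:
    """Step 5: order BGCs from most to least compatible with the query compound."""
    return sorted(
        predictions,
        key=lambda bgc: log_likelihood(predictions[bgc], compound_classes),
        reverse=True,
    )


# Made-up weights and domain/class names, for illustration only.
models = {
    "Macrolides": ({"PKS_KS": 2.0, "PKS_AT": 1.5}, -2.0),
    "Peptides": ({"Condensation": 2.5, "AMP-binding": 1.5}, -2.0),
}
bgc_domains = {
    "bgc_1": {"PKS_KS", "PKS_AT"},
    "bgc_2": {"Condensation", "AMP-binding"},
}
predictions = {bgc: predict_classes(domains, models) for bgc, domains in bgc_domains.items()}
print(filter_bgcs(predictions, "Macrolides"))  # ['bgc_1']
print(rank_bgcs(predictions, {"Peptides"}))  # ['bgc_2', 'bgc_1']
```

A log-likelihood score rewards agreement on both the classes a compound has and the classes it lacks, so a BGC predicted to carry many unrelated classes is penalised as well.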