
Commit 9769e23 (parent 572f0ed): update readme


README.md

Lines changed: 46 additions & 15 deletions
@@ -1,8 +1,17 @@
 # python-chebifier
-An AI ensemble model for predicting chemical classes in the ChEBI ontology.
+An AI ensemble model for predicting chemical classes in the ChEBI ontology. It integrates deep learning models,
+rule-based models and generative AI-based models.
+
+A web application for the ensemble is available at https://chebifier.hastingslab.org/.
 
 ## Installation
 
+You can get the package from PyPI:
+```bash
+pip install chebifier
+```
+
+or get the latest development version from GitHub:
 ```bash
 # Clone the repository
 git clone https://github.com/yourusername/python-chebifier.git
@@ -12,7 +21,7 @@ cd python-chebifier
 pip install -e .
 ```
 
-u`chebai-graph` and its dependencies cannot be installed automatically. If you want to use Graph Neural Networks, follow
+`chebai-graph` and its dependencies cannot be installed automatically. If you want to use Graph Neural Networks, follow
 the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/python-chebai-graph).
 
 ## Usage
@@ -21,23 +30,24 @@ the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/pyt
 
 The package provides a command-line interface (CLI) for making predictions using an ensemble model.
 
-```bash
-# Get help
-python -m chebifier.cli --help
+The ensemble configuration is given by a configuration file (by default, this is `chebifier/ensemble.yml`). If you
+want to change which models are included in the ensemble or how they are weighted, you can create your own configuration file.
 
-# Make predictions using a configuration file
-python -m chebifier.cli predict configs/example_config.yml --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" "C1=CC=C(C=C1)C(=O)O"
+Model weights for deep learning models are downloaded automatically from [Hugging Face](https://huggingface.co/chebai).
+However, you can also supply your own model checkpoints (see `configs/example_config.yml` for an example).
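
As a sketch of what such a configuration file can look like, here is a minimal hypothetical example. The `type`/`model_weight` layout mirrors the `chebi_lookup` snippet shown further down in this README; the member names are invented, and `configs/example_config.yml` remains the authoritative reference:

```yaml
# Hypothetical ensemble configuration; a sketch, not the verified schema.
# Each top-level key names one ensemble member, `type` selects the model.
my_electra:
  type: electra
  model_weight: 2  # optional: weight of this model's votes
my_peptide_rules:
  type: chemlog_peptides
```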
 
-# Make predictions using SMILES from a file
-python -m chebifier.cli predict configs/example_config.yml --smiles-file smiles.txt
-```
+```bash
+# Make predictions
+python -m chebifier predict --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" --smiles "C1=CC=C(C=C1)C(=O)O"
 
-### Configuration File
+# Make predictions using SMILES from a file
+python -m chebifier predict --smiles-file smiles.txt
 
-The CLI requires a YAML configuration file that defines the ensemble model. An example can be found in `configs/example_config.yml`.
+# Make predictions using a configuration file
+python -m chebifier predict --ensemble-config configs/my_config.yml --smiles-file smiles.txt
 
-The models and other required files are trained / generated by our [chebai](https://github.com/ChEB-AI/python-chebai) package.
-Examples for models can be found on [kaggle](https://www.kaggle.com/datasets/sfluegel/chebai).
+python -m chebifier predict --help
+```
 
 ### Python API
 
@@ -67,6 +77,27 @@ for smiles, prediction in zip(smiles_list, predictions):
     print("No predictions")
 ```
 
+### The models
+Currently, the following models are supported:
+
+| Model | Description | #Classes | Publication | Repository |
+|-------|-------------|----------|-------------|------------|
+| `electra` | A transformer-based deep learning model trained on ChEBI SMILES strings. | 1522 | [Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a) | [python-chebai](https://github.com/ChEB-AI/python-chebai) |
+| `resgated` | A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1522 | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
+| `chemlog_peptides` | A rule-based model specialised on peptide classes. | 18 | [Flügel, Simon, et al., 2025: ChemLog: Making MSOL Viable for Ontological Classification and Learning, arXiv](https://arxiv.org/abs/2507.13987) | [chemlog-peptides](https://github.com/sfluegel05/chemlog-peptides) |
+| `chemlog_element`, `chemlog_organox` | Extensions of ChemLog for classes that are defined either by the presence of a specific element or by the presence of an organic bond. | 118 + 37 | | [chemlog-extra](https://github.com/ChEB-AI/chemlog-extra) |
+| `c3p` | A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470) | [c3p](https://github.com/chemkg/c3p) |
+
+In addition, Chebifier also includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
+matched by a SMILES string. This is not activated by default, but can be included by adding
+```yaml
+chebi_lookup:
+  type: chebi_lookup
+  model_weight: 10 # optional
+```
+to your configuration file.
+
 ### The ensemble
 
 Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
@@ -110,7 +141,7 @@ belongs to the direct and indirect superclasses (e.g., primary alcohol, aromatic
 - (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
 We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
 `data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
-both, we select one of them randomly and set the other to 0.
+both, we select the one with the higher class score and set the other to 0.
 - (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
 with a small change. For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, instead of setting $B$ to $1$,
 we now set $A$ to $0$. This has the advantage that we cannot introduce new disjointness-inconsistencies and don't have
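
To make the consistency-repair procedure concrete, here is a minimal Python sketch of the three steps (an illustration under assumptions, not the chebifier implementation: all names are invented, and step (1), propagating predictions to superclasses, is paraphrased from the surrounding text):

```python
# Illustrative sketch of the consistency repair described above.
# Not taken from the chebifier code base; names and data layout are assumptions.

def repair_predictions(preds, scores, subclass_pairs, disjoint_pairs):
    """preds: dict class -> 0/1; scores: dict class -> raw class score.
    subclass_pairs: list of (A, B) pairs with A a subclass of B,
    ordered so that subclasses appear before their superclasses.
    disjoint_pairs: list of (A, B) pairs of disjoint classes."""
    # (1) Propagate positives upwards: a predicted class implies its superclasses.
    for a, b in subclass_pairs:
        if preds[a] == 1:
            preds[b] = 1
    # (2) Resolve disjointness: keep the class with the higher score, zero the other.
    for a, b in disjoint_pairs:
        if preds[a] == 1 and preds[b] == 1:
            preds[a if scores[a] < scores[b] else b] = 0
    # (3) Repair downwards: a predicted subclass of a non-predicted superclass
    #     is dropped; iterating in reverse (top-down) lets removals cascade.
    for a, b in reversed(subclass_pairs):
        if preds[a] == 1 and preds[b] == 0:
            preds[a] = 0
    return preds
```

As the text notes, step (3) only removes positive predictions, so it cannot reintroduce disjointness violations.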
