
Commit 9769e23 (parent 572f0ed): update readme


README.md

Lines changed: 46 additions & 15 deletions
@@ -1,8 +1,17 @@
 # python-chebifier
-An AI ensemble model for predicting chemical classes in the ChEBI ontology.
+An AI ensemble model for predicting chemical classes in the ChEBI ontology. It integrates deep learning models,
+rule-based models and generative AI-based models.
+
+A web application for the ensemble is available at https://chebifier.hastingslab.org/.
 
 ## Installation
 
+You can get the package from PyPI:
+```bash
+pip install chebifier
+```
+
+or get the latest development version from GitHub:
 ```bash
 # Clone the repository
 git clone https://github.com/yourusername/python-chebifier.git
@@ -12,7 +21,7 @@ cd python-chebifier
 pip install -e .
 ```
 
-u`chebai-graph` and its dependencies cannot be installed automatically. If you want to use Graph Neural Networks, follow
+`chebai-graph` and its dependencies cannot be installed automatically. If you want to use Graph Neural Networks, follow
 the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/python-chebai-graph).
 
 ## Usage
@@ -21,23 +30,24 @@ the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/pyt
 
 The package provides a command-line interface (CLI) for making predictions using an ensemble model.
 
-```bash
-# Get help
-python -m chebifier.cli --help
+The ensemble configuration is given by a configuration file (by default, this is `chebifier/ensemble.yml`). If you
+want to change which models are included in the ensemble or how they are weighted, you can create your own configuration file.
 
-# Make predictions using a configuration file
-python -m chebifier.cli predict configs/example_config.yml --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" "C1=CC=C(C=C1)C(=O)O"
+Model weights for deep learning models are downloaded automatically from [Hugging Face](https://huggingface.co/chebai).
+However, you can also supply your own model checkpoints (see `configs/example_config.yml` for an example).
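
As a sketch of what such a configuration file can look like, here is a minimal hypothetical example. The `type`/`model_weight` layout mirrors the `chebi_lookup` snippet shown further down in this README; the member names are invented, and `configs/example_config.yml` remains the authoritative reference:

```yaml
# Hypothetical ensemble configuration; a sketch, not the verified schema.
# Each top-level key names one ensemble member, `type` selects the model.
my_electra:
  type: electra
  model_weight: 2  # optional: weight of this model's votes
my_peptide_rules:
  type: chemlog_peptides
```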
 
-# Make predictions using SMILES from a file
-python -m chebifier.cli predict configs/example_config.yml --smiles-file smiles.txt
-```
+```bash
+# Make predictions
+python -m chebifier predict --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" --smiles "C1=CC=C(C=C1)C(=O)O"
 
-### Configuration File
+# Make predictions using SMILES from a file
+python -m chebifier predict --smiles-file smiles.txt
 
-The CLI requires a YAML configuration file that defines the ensemble model. An example can be found in `configs/example_config.yml`.
+# Make predictions using a configuration file
+python -m chebifier predict --ensemble-config configs/my_config.yml --smiles-file smiles.txt
 
-The models and other required files are trained / generated by our [chebai](https://github.com/ChEB-AI/python-chebai) package.
-Examples for models can be found on [kaggle](https://www.kaggle.com/datasets/sfluegel/chebai).
+python -m chebifier predict --help
+```
 
 ### Python API
 
@@ -67,6 +77,27 @@ for smiles, prediction in zip(smiles_list, predictions):
     print("No predictions")
 ```
 
+### The models
+Currently, the following models are supported:
+
+| Model | Description | #Classes | Publication | Repository |
+|-------|-------------|----------|-------------|------------|
+| `electra` | A transformer-based deep learning model trained on ChEBI SMILES strings. | 1522 | [Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a) | [python-chebai](https://github.com/ChEB-AI/python-chebai) |
+| `resgated` | A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1522 | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
+| `chemlog_peptides` | A rule-based model specialised on peptide classes. | 18 | [Flügel, Simon, et al., 2025: ChemLog: Making MSOL Viable for Ontological Classification and Learning, arXiv](https://arxiv.org/abs/2507.13987) | [chemlog-peptides](https://github.com/sfluegel05/chemlog-peptides) |
+| `chemlog_element`, `chemlog_organox` | Extensions of ChemLog for classes that are defined either by the presence of a specific element or by the presence of an organic bond. | 118 + 37 | | [chemlog-extra](https://github.com/ChEB-AI/chemlog-extra) |
+| `c3p` | A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470) | [c3p](https://github.com/chemkg/c3p) |
+
+In addition, Chebifier also includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
+matched by a SMILES string. This is not activated by default, but can be included by adding
+```yaml
+chebi_lookup:
+  type: chebi_lookup
+  model_weight: 10 # optional
+```
+to your configuration file.
+
 ### The ensemble
 
 Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
@@ -110,7 +141,7 @@ belongs to the direct and indirect superclasses (e.g., primary alcohol, aromatic
 - (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
 We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
 `data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
-both, we select one of them randomly and set the other to 0.
+both, we select the one with the higher class score and set the other to 0.
 - (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
 with a small change. For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, instead of setting $B$ to $1$,
 we now set $A$ to $0$. This has the advantage that we cannot introduce new disjointness-inconsistencies and don't have
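
To make the consistency-repair procedure concrete, here is a minimal Python sketch of the three steps (an illustration under assumptions, not the chebifier implementation: all names are invented, and step (1), propagating predictions to superclasses, is paraphrased from the surrounding text):

```python
# Illustrative sketch of the consistency repair described above.
# Not taken from the chebifier code base; names and data layout are assumptions.

def repair_predictions(preds, scores, subclass_pairs, disjoint_pairs):
    """preds: dict class -> 0/1; scores: dict class -> raw class score.
    subclass_pairs: list of (A, B) pairs with A a subclass of B,
    ordered so that subclasses appear before their superclasses.
    disjoint_pairs: list of (A, B) pairs of disjoint classes."""
    # (1) Propagate positives upwards: a predicted class implies its superclasses.
    for a, b in subclass_pairs:
        if preds[a] == 1:
            preds[b] = 1
    # (2) Resolve disjointness: keep the class with the higher score, zero the other.
    for a, b in disjoint_pairs:
        if preds[a] == 1 and preds[b] == 1:
            preds[a if scores[a] < scores[b] else b] = 0
    # (3) Repair downwards: a predicted subclass of a non-predicted superclass
    #     is dropped; iterating in reverse (top-down) lets removals cascade.
    for a, b in reversed(subclass_pairs):
        if preds[a] == 1 and preds[b] == 0:
            preds[a] = 0
    return preds
```

As the text notes, step (3) only removes positive predictions, so it cannot reintroduce disjointness violations.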
