|`electra`| A transformer-based deep learning model trained on ChEBI SMILES strings. | 1522 |[Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a)|[python-chebai](https://github.com/ChEB-AI/python-chebai)|
+|`resgated`| A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1522 ||[python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph)|
+|`chemlog_peptides`| A rule-based model specialised in peptide classes. | 18 |[Flügel, Simon, et al., 2025: ChemLog: Making MSOL Viable for Ontological Classification and Learning, arXiv](https://arxiv.org/abs/2507.13987)|[chemlog-peptides](https://github.com/sfluegel05/chemlog-peptides)|
+|`chemlog_element`, `chemlog_organox`| Extensions of ChemLog for classes that are defined either by the presence of a specific element or by the presence of an organic bond. | 118 + 37 ||[chemlog-extra](https://github.com/ChEB-AI/chemlog-extra)|
+|`c3p`| A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 |[Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470)|[c3p](https://github.com/chemkg/c3p)|
+
+In addition, Chebifier includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
+matched by a SMILES string. This is not activated by default, but can be enabled by adding
+```yaml
+chebi_lookup:
+  type: chebi_lookup
+  model_weight: 10  # optional
+```
+to your configuration file.
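
As a rough illustration of what such a lookup does, the sketch below maps a SMILES string to a class and collects that class's direct and indirect superclasses. The dictionaries `SMILES_TO_CLASS` and `PARENTS` and the function `lookup_superclasses` are hypothetical stand-ins with toy data, not Chebifier's actual implementation or API. Returning the matched class together with its superclasses is also an assumption, and a real lookup would likely need to canonicalise the SMILES string before matching.

```python
# Toy data, invented for illustration only (not loaded from ChEBI).
SMILES_TO_CLASS = {"CCO": "ethanol"}
PARENTS = {
    "ethanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic hydroxy compound"],
}


def lookup_superclasses(smiles):
    """Return the matched class plus its direct and indirect superclasses.

    Hypothetical helper: returns an empty set when the SMILES string
    does not match any known class.
    """
    matched = SMILES_TO_CLASS.get(smiles)
    if matched is None:
        return set()
    found = {matched}
    stack = [matched]
    # Walk up the hierarchy until no unseen parents remain.
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(lookup_superclasses("CCO")))
```

Because the result comes from a direct match rather than a learned or rule-based prediction, it seems natural to give it a comparatively high `model_weight`, as in the configuration snippet above.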
+
### The ensemble

Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
@@ -110,7 +141,7 @@ belongs to the direct and indirect superclasses (e.g., primary alcohol, aromatic
- (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
`data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
-both, we select one of them randomly and set the other to 0.
+both, we select the one with the higher class score and set the other to 0.
- (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
with a small change. For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, instead of setting $B$ to $1$,
we now set $A$ to $0$. This has the advantage that we cannot introduce new disjointness-inconsistencies and don't have