# ChEBai

ChEBai is a deep learning library designed for the integration of deep learning methods with chemical ontologies, particularly ChEBI.
The library emphasizes the incorporation of the semantic qualities of the ontology into the learning process.

## Note for developers

If you have used ChEBai before PR #39, the file structure in which your ChEBI data is saved has changed. This means that
datasets will be freshly generated. The data itself, however, is the same. If you want to keep the old data (including the old
splits), you can use a migration script. It copies the old data to the new location for a specific ChEBI class
(including the ChEBI version and other parameters). The script can be called by specifying the data module from a config:
```
python chebai/preprocessing/migration/chebi_data_migration.py migrate --datamodule=[path-to-data-config]
```
or by specifying the class name (e.g. `ChEBIOver50`) and arguments separately:
```
python chebai/preprocessing/migration/chebi_data_migration.py migrate --class_name=[data-class] [--chebi_version=[version]]
```
The new dataset will by default generate random data splits (with a given seed).
To reuse a fixed data split, you have to provide the path of the CSV file generated during the migration:
`--data.init_args.splits_file_path=[path-to-processed_data]/splits.csv`

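For example, a hypothetical migration of the `ChEBIOver50` data might look like this (the ChEBI version is an assumed example, and the processed-data path depends on your setup):

```
# Migrate the old ChEBIOver50 data for a specific (assumed) ChEBI version
python chebai/preprocessing/migration/chebi_data_migration.py migrate --class_name=ChEBIOver50 --chebi_version=231

# Later runs can then reuse the migrated split via the generated CSV:
# --data.init_args.splits_file_path=[path-to-processed_data]/splits.csv
```
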
## Installation

To install ChEBai, follow these steps:
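A minimal sketch of the clone-and-install flow (the repository URL matches the project links below):

```
# Clone the repository and install the package into the active environment
git clone https://github.com/ChEB-AI/python-chebai.git
cd python-chebai
pip install .
```
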
## Usage

Training and inference are abstracted using PyTorch Lightning modules.
Here are some CLI commands for the standard functionalities of pretraining, ontology extension, fine-tuning for toxicity and prediction.
For further details, see the [wiki](https://github.com/ChEB-AI/python-chebai/wiki).
If you face any problems, please open a new [issue](https://github.com/ChEB-AI/python-chebai/issues/new).
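
As a rough, non-authoritative sketch of the command shape (the config paths are placeholders; the wiki has the canonical commands), a PyTorch-Lightning-CLI-style training call looks like:

```
python -m chebai fit --trainer=[path-to-trainer-config] --model=[path-to-model-config] --data=[path-to-data-config]
```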

The `classes_path` is the path to the dataset's `raw/classes.txt` file that contains the relevant classes.

## Evaluation

An example for evaluating a model trained on the ontology extension task is given in `tutorials/eval_model_basic.ipynb`.
It takes the fine-tuned model as input for performing the evaluation.

## Cross-validation

You can do inner k-fold cross-validation, i.e., train models on k train-validation splits that all use the same test
set. For that, you need to specify the total number of folds as
```
--data.init_args.inner_k_folds=K
```
and the fold to be used in the current optimisation run as
```
--data.init_args.fold_index=I
```
To train K models, you need to do K such calls, each with a different `fold_index`. On the first call with a given
`inner_k_folds`, all folds will be created and stored in the data directory.
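
For example, a minimal sketch that trains K=3 models in sequence (the `fit` call shape is the assumed one sketched under Usage):

```
# Each call trains one model on fold $I; the first call also creates
# and stores all folds in the data directory
for I in 0 1 2; do
  python -m chebai fit --data=[path-to-data-config] \
    --data.init_args.inner_k_folds=3 \
    --data.init_args.fold_index=$I
done
```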