# ChEBai

ChEBai is a deep learning library designed for the integration of deep learning methods with chemical ontologies, particularly ChEBI.
The library emphasizes the incorporation of the semantic qualities of the ontology into the learning process.

## Installation

To install ChEBai, follow these steps:

1. Clone the repository:
```
git clone https://github.com/ChEB-AI/python-chebai.git
```

2. Install the package:

```
cd python-chebai
pip install .
```
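
To verify the installation, listing the CLI help is a quick check; this assumes the `chebai` module entry point used by the commands below is available after installation:
```
python -m chebai --help
```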

## Usage

Training and inference are abstracted using PyTorch Lightning modules.
Here are some CLI commands for the standard functionalities: pretraining, ontology extension, fine-tuning for toxicity prediction, and class prediction.
For further details, see the [wiki](https://github.com/ChEB-AI/python-chebai/wiki).
If you face any problems, please open a new [issue](https://github.com/ChEB-AI/python-chebai/issues/new).

### Pretraining
```
python -m chebai fit --data.class_path=chebai.preprocessing.datasets.pubchem.PubchemChem --model=configs/model/electra-for-pretraining.yml --trainer=configs/training/pretraining_trainer.yml
```

### Structure-based ontology extension
```
python -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator. --data=[path-to-dataset-config] --model.out_dim=[number-of-labels]
```
A command with additional options may look like this:
```
python3 -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.train_metrics=configs/metrics/micro-macro-f1.yml --model.test_metrics=configs/metrics/micro-macro-f1.yml --model.val_metrics=configs/metrics/micro-macro-f1.yml --model.pretrained_checkpoint=electra_pretrained.ckpt --model.load_prefix=generator. --data=configs/data/chebi50.yml --model.out_dim=1446 --model.criterion=configs/loss/bce.yml --data.init_args.batch_size=10 --trainer.logger.init_args.name=chebi50_bce_unweighted --data.init_args.num_workers=9 --model.pass_loss_kwargs=false --data.init_args.chebi_version=231 --data.init_args.data_limit=1000
```

### Fine-tuning for Toxicity prediction
```
python -m chebai fit --config=[path-to-your-tox21-config] --trainer.callbacks=configs/training/default_callbacks.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator.
```

### Predicting classes given SMILES strings
```
python3 -m chebai predict_from_file --model=[path-to-model-config] --checkpoint_path=[path-to-model] --input_path=[path-to-file-containing-smiles] [--classes_path=[path-to-classes-file]] [--save_to=[path-to-output]]
```
The input files should contain a list of line-separated SMILES strings. This generates a CSV file that contains
one row for each SMILES string and one column for each class.
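
For example, a minimal input file with two molecules (ethanol and benzene) could look like this:
```
CCO
c1ccccc1
```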

## Evaluation

An example for evaluating a model trained on the ontology extension task is given in `tutorials/eval_model_basic.ipynb`.
It takes the fine-tuned model as input for performing the evaluation.
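
Since training is handled through PyTorch Lightning, a command-line evaluation run should also be possible via the standard `test` subcommand; the following is only a sketch with placeholder paths and is not taken from the tutorials:
```
python -m chebai test --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --data=[path-to-dataset-config] --ckpt_path=[path-to-finetuned-checkpoint]
```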

## Cross-validation
You can do inner k-fold cross-validation, i.e., train models on k train-validation splits that all use the same test
set. The number of folds is specified as `--data.init_args.inner_k_folds=K` and the fold to be used in the current optimisation run as
```
--data.init_args.fold_index=I
```
To train K models, you need to do K such calls, each with a different `fold_index`. On the first call with a given
`inner_k_folds`, all folds will be created and stored in the data directory.
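
As an illustration, a 3-fold setup in which the first fold (index 0) is trained could extend the ontology-extension command above like this (paths are placeholders):
```
python -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator. --data=[path-to-dataset-config] --model.out_dim=[number-of-labels] --data.init_args.inner_k_folds=3 --data.init_args.fold_index=0
```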

## ChEBI versions
Change the ChEBI version used for all sets (default: 200):
```
--data.init_args.chebi_version=VERSION
```
To change only the version of the train and validation sets independently of the test set, use
```
--data.init_args.chebi_version_train=VERSION
```
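
For instance, to test against version 231 while training and validating on an older release, the two options can be combined; the version numbers here are only illustrative:
```
python -m chebai fit --config=[path-to-your-config] --data.init_args.chebi_version=231 --data.init_args.chebi_version_train=200
```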

## Data folder structure
Data is stored in and retrieved from the raw and processed folders
```
data/${dataset_name}/${chebi_version}/raw/
```
and
```
data/${dataset_name}/${chebi_version}/processed/${reader_name}/
```
where `${dataset_name}` is the `_name`-attribute of the `DataModule` used,
`${chebi_version}` refers to the ChEBI version used (only for ChEBI-datasets) and
`${reader_name}` is the `name`-attribute of the `Reader` class associated with the dataset.

For cross-validation, the folds are stored as `cv_${n_folds}_fold/fold_{fold_index}_train.pkl`
and `cv_${n_folds}_fold/fold_{fold_index}_validation.pkl` in the raw directory.
In the processed directory, `.pt` is used instead of `.pkl`.
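
A hypothetical layout for ChEBI version 231 with 3-fold cross-validation might therefore look like this, where the dataset name `ChEBI50` and the reader name `smiles_token` are only placeholders:
```
data/ChEBI50/231/raw/cv_3_fold/fold_0_train.pkl
data/ChEBI50/231/raw/cv_3_fold/fold_0_validation.pkl
data/ChEBI50/231/processed/smiles_token/cv_3_fold/fold_0_train.pt
data/ChEBI50/231/processed/smiles_token/cv_3_fold/fold_0_validation.pt
```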