
Commit 81df748

Merge pull request #18 from ChEB-AI/tutorial
Modified README.md, tutorials folder
2 parents c06b058 + 95324de commit 81df748

File tree

5 files changed: +201, -39 lines changed

README.md

Lines changed: 33 additions & 38 deletions
@@ -1,39 +1,61 @@
 # ChEBai
 
-ChEBai is a deep learning library that allows the combination of deep learning methods with chemical ontologies
-(especially ChEBI). Special attention is given to the integration of the semantic qualities of the ontology into the learning process. This is done in two different ways:
+ChEBai is a deep learning library designed for the integration of deep learning methods with chemical ontologies, particularly ChEBI.
+The library emphasizes the incorporation of the semantic qualities of the ontology into the learning process.
 
-## Pretraining
+## Installation
 
+To install ChEBai, follow these steps:
+
+1. Clone the repository:
 ```
-python -m chebai fit --data.class_path=chebai.preprocessing.datasets.pubchem.SWJChem --model=configs/model/electra-for-pretraining.yml --trainer=configs/training/default_trainer.yml --trainer.callbacks=configs/training/default_callbacks.yml
+git clone https://github.com/ChEB-AI/python-chebai.git
 ```
 
-## Structure-based ontology extension
+2. Install the package:
 
 ```
-python -m chebai fit --config=[path-to-your-electra_chebi100-config] --trainer.callbacks=configs/training/default_callbacks.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator.
+cd python-chebai
+pip install .
 ```
 
+## Usage
 
-## Fine-tuning for Toxicity prediction
+Training and inference are abstracted using PyTorch Lightning modules.
+Here are some CLI commands for the standard functionalities: pretraining, ontology extension, fine-tuning for toxicity prediction, and predicting classes for SMILES strings.
+For further details, see the [wiki](https://github.com/ChEB-AI/python-chebai/wiki).
+If you face any problems, please open a new [issue](https://github.com/ChEB-AI/python-chebai/issues/new).
 
+### Pretraining
 ```
-python -m chebai fit --config=[path-to-your-tox21-config] --trainer.callbacks=configs/training/default_callbacks.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator.
+python -m chebai fit --data.class_path=chebai.preprocessing.datasets.pubchem.PubchemChem --model=configs/model/electra-for-pretraining.yml --trainer=configs/training/pretraining_trainer.yml
 ```
 
+### Structure-based ontology extension
+```
+python -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator. --data=[path-to-dataset-config] --model.out_dim=[number-of-labels]
 ```
-python -m chebai train --config=[path-to-your-tox21-config] --trainer.callbacks=configs/training/default_callbacks.yml --ckpt_path=[path-to-model-with-ontology-pretraining]
+A command with additional options may look like this:
+```
+python3 -m chebai fit --trainer=configs/training/default_trainer.yml --model=configs/model/electra.yml --model.train_metrics=configs/metrics/micro-macro-f1.yml --model.test_metrics=configs/metrics/micro-macro-f1.yml --model.val_metrics=configs/metrics/micro-macro-f1.yml --model.pretrained_checkpoint=electra_pretrained.ckpt --model.load_prefix=generator. --data=configs/data/chebi50.yml --model.out_dim=1446 --model.criterion=configs/loss/bce.yml --data.init_args.batch_size=10 --trainer.logger.init_args.name=chebi50_bce_unweighted --data.init_args.num_workers=9 --model.pass_loss_kwargs=false --data.init_args.chebi_version=231 --data.init_args.data_limit=1000
 ```
 
-## Predicting classes given SMILES strings
+### Fine-tuning for Toxicity prediction
+```
+python -m chebai fit --config=[path-to-your-tox21-config] --trainer.callbacks=configs/training/default_callbacks.yml --model.pretrained_checkpoint=[path-to-pretrained-model] --model.load_prefix=generator.
+```
 
+### Predicting classes given SMILES strings
 ```
 python3 -m chebai predict_from_file --model=[path-to-model-config] --checkpoint_path=[path-to-model] --input_path=[path-to-file-containing-smiles] [--classes_path=[path-to-classes-file]] [--save_to=[path-to-output]]
 ```
 The input files should contain a list of line-separated SMILES strings. This generates a CSV file that contains
 one row for each SMILES string and one column for each class.
 
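For illustration (not part of the diff): the input to `predict_from_file` is a plain text file with one SMILES string per line, and the output is a CSV with one row per SMILES string and one column per class. The sketch below uses placeholder file names, and the exact CSV layout (e.g., an additional index or SMILES column) may differ.

```python
# Hypothetical input/output round-trip for predict_from_file; file names are placeholders.
from pathlib import Path

import pandas as pd

# Input file: one SMILES string per line (aspirin, benzene, ethanol).
Path("smiles.txt").write_text("CC(=O)Oc1ccccc1C(=O)O\nc1ccccc1\nCCO\n")

# After running the CLI above with --input_path=smiles.txt --save_to=predictions.csv:
preds = pd.read_csv("predictions.csv")
print(preds.shape)  # roughly (number of SMILES strings, number of classes)
```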

+## Evaluation
+
+An example for evaluating a model trained on the ontology extension task is given in `tutorials/eval_model_basic.ipynb`.
+It takes the fine-tuned model as input for the evaluation.
 
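As a side note (not part of the diff): the metrics configured elsewhere in this README (`configs/metrics/micro-macro-f1.yml`) are micro- and macro-averaged F1 over the ChEBI classes. A minimal, self-contained sketch of that kind of multi-label evaluation, using torchmetrics and random tensors as stand-ins for real model outputs and labels:

```python
# Generic multi-label F1 evaluation sketch; tensors are random placeholders,
# not outputs of a real ChEBai model. 1446 matches the out_dim used in the
# example ontology-extension command above.
import torch
from torchmetrics.classification import MultilabelF1Score

num_labels = 1446
logits = torch.randn(8, num_labels)            # placeholder model outputs
labels = torch.randint(0, 2, (8, num_labels))  # placeholder multi-hot labels

micro_f1 = MultilabelF1Score(num_labels=num_labels, average="micro")
macro_f1 = MultilabelF1Score(num_labels=num_labels, average="macro")
print(micro_f1(torch.sigmoid(logits), labels))
print(macro_f1(torch.sigmoid(logits), labels))
```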

 ## Cross-validation
 You can do inner k-fold cross-validation, i.e., train models on k train-validation splits that all use the same test
@@ -46,31 +68,4 @@ and the fold to be used in the current optimisation run as
 --data.init_args.fold_index=I
 ```
 To train K models, you need to do K such calls, each with a different `fold_index`. On the first call with a given
-`inner_k_folds`, all folds will be created and stored in the data directory
-
-## Chebi versions
-Change the chebi version used for all sets (default: 200):
-```
---data.init_args.chebi_version=VERSION
-```
-To change only the version of the train and validation sets independently of the test set, use
-```
---data.init_args.chebi_version_train=VERSION
-```
-
-## Data folder structure
-Data is stored in and retrieved from the raw and processed folders
-```
-data/${dataset_name}/${chebi_version}/raw/
-```
-and
-```
-data/${dataset_name}/${chebi_version}/processed/${reader_name}/
-```
-where `${dataset_name}` is the `_name`-attribute of the `DataModule` used,
-`${chebi_version}` refers to the ChEBI version used (only for ChEBI-datasets) and
-`${reader_name}` is the `name`-attribute of the `Reader` class associated with the dataset.
-
-For cross-validation, the folds are stored as `cv_${n_folds}_fold/fold_{fold_index}_train.pkl`
-and `cv_${n_folds}_fold/fold_{fold_index}_validation.pkl` in the raw directory.
-In the processed directory, `.pt` is used instead of `.pkl`.
+`inner_k_folds`, all folds will be created and stored in the data directory
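For illustration (not part of the diff): the K training calls can be scripted, e.g. from Python. A minimal sketch, assuming the chebi50 dataset config from the example command above and that `inner_k_folds` can be passed as a data init arg on the command line; the checkpoint, metrics, and `out_dim` options shown earlier are omitted for brevity.

```python
# Sketch: launch K training runs, one per fold, as described in the cross-validation section.
# K, the dataset config, and passing inner_k_folds on the command line are assumptions
# for this example, not fixed by chebai.
import subprocess

K = 5
for fold in range(K):
    subprocess.run(
        [
            "python", "-m", "chebai", "fit",
            "--trainer=configs/training/default_trainer.yml",
            "--model=configs/model/electra.yml",
            "--data=configs/data/chebi50.yml",
            f"--data.init_args.inner_k_folds={K}",
            f"--data.init_args.fold_index={fold}",
        ],
        check=True,
    )
```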

chebai/result/classification.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 from chebai.models import ChebaiBaseNet
 from chebai.models.electra import Electra
 from chebai.preprocessing.datasets import XYBaseDataModule
-from utils import *
+from chebai.result.utils import *
 
 
 def visualise_f1(logs_path):
