Commit 52007ad

Merge pull request #4 from ChEB-AI/feature/detailed_readme
Detailed Readme
2 parents d28af63 + a4a486a commit 52007ad

2 files changed: 58 additions, 31 deletions


README.md

Lines changed: 57 additions & 27 deletions
@@ -1,43 +1,73 @@
 
+# ChEB-AI Graph
+
+Graph-based models for molecular property prediction and ontology classification, built on top of the [`python-chebai`](https://github.com/ChEB-AI/python-chebai) codebase.
+
+
 
 ## Installation
 
-Some requirements may not be installed successfully automatically.
-To install the `torch-` libraries, use
+To install this repository, download it and run
 
-`pip install torch-${lib} -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html`
+```bash
+pip install .
+```
 
-where `${lib}` is either `scatter`, `geometric`, `sparse` or `cluster`, and
-`${CUDA}` is either `cpu`, `cu118` or `cu121` (depending on your system, see e.g.
-[torch-geometric docs](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html))
+The dependencies `torch`, `torch_geometric` and `torch-sparse` cannot be installed automatically.
 
+Use the following command:
 
-## Commands
+```bash
+pip install torch torch_scatter torch_geometric -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
+```
 
-For training, config files from the `python-chebai` and `python-chebai-graph` repositories can be combined. This requires that you download the [source code of python-chebai](https://github.com/ChEB-AI/python-chebai). Make sure that you are in the right folder and know the relative path to the other repository.
+Replace:
+- `${TORCH}` with your installed PyTorch version (e.g., `2.6.0`)
+- `${CUDA}` with e.g. `cpu`, `cu118`, or `cu121` depending on your system and CUDA version
 
-We recommend the following setup:
+If you already have `torch` installed, make sure that `torch_scatter` and `torch_geometric` are compatible with your
+PyTorch version and are installed with the same CUDA version.
 
-my_projects
-python-chebai
-chebai
-configs
-data
-...
-python-chebai-graph
-chebai_graph
-configs
-...
+For a full list of currently available PyTorch versions and CUDA compatibility, please refer to libraries' official documentation:
+- [torch](https://pytorch.org/get-started/locally/)
+- [torch_geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html#installation)
+- [torch-scatter](https://github.com/rusty1s/pytorch_scatter)
 
-If you run the command from the `python-chebai` directory, you can use the same data for both chebai- and chebai-graph-models (e.g., Transformers and GNNs).
-Then you have to use `{path-to-chebai} -> .` and `{path-to-chebai-graph} -> ../python-chebai-graph`.
-
-Pretraining on a atom / bond masking task with PubChem data (feature-branch):
-```
-python3 -m chebai fit --model={path-to-chebai-graph}/configs/model/gnn_resgated_pretrain.yml --data={path-to-chebai-graph}/configs/data/pubchem_graph.yml --trainer={path-to-chebai}/configs/training/pretraining_trainer.yml
+_Note for developers_: If you want to install the package in editable mode, use the following command instead:
+
+```bash
+pip install -e .
 ```
 
-Training on the ontology prediction task (here for ChEBI50, v231, 200 epochs)
+## Recommended Folder Structure
+
+ChEB-AI Graph is not a standalone library. Instead, it provides additional models and datasets for [`python-chebai`](https://github.com/ChEB-AI/python-chebai).
+The training relies on config files that are located either in `python-chebai` or in this repository.
+
+Therefore, for training, we recommend to clone both repositories into a common parent directory. For instance, your project can look like this:
+
+```
+my_projects/
+├── python-chebai/
+│   ├── chebai/
+│   ├── configs/
+│   └── ...
+└── python-chebai-graph/
+    ├── chebai_graph/
+    ├── configs/
+    └── ...
 ```
-python3 -m chebai fit --trainer={path-to-chebai}/configs/training/default_trainer.yml --trainer.callbacks={path-to-chebai}/configs/training/default_callbacks.yml --model={path-to-chebai-graph}/configs/model/gnn_res_gated.yml --model.train_metrics={path-to-chebai}/configs/metrics/micro-macro-f1.yml --model.test_metrics={path-to-chebai}/configs/metrics/micro-macro-f1.yml --model.val_metrics={path-to-chebai}/configs/metrics/micro-macro-f1.yml --data={path-to-chebai-graph}/configs/data/chebi50_graph_properties.yml --model.criterion=c{path-to-chebai}/onfigs/loss/bce.yml --data.init_args.batch_size=40 --trainer.logger.init_args.name=chebi50_bce_unweighted_resgatedgraph --data.init_args.num_workers=12 --model.pass_loss_kwargs=false --data.init_args.chebi_version=231 --trainer.min_epochs=200 --trainer.max_epochs=200
+
+## Training & Pretraining
+
+### Ontology Prediction
+
+
+This example command trains a Residual Gated Graph Convolutional Network on the ChEBI50 dataset (see [wiki](https://github.com/ChEB-AI/python-chebai/wiki/Data-Management)).
+The dataset has a customizable list of properties for atoms, bonds and molecules that are added to the graph.
+The list can be found in the `configs/data/chebi50_graph_properties.yml` file.
+
+```bash
+python -m chebai fit --trainer=configs/training/default_trainer.yml --trainer.logger=configs/training/csv_logger.yml --model=../python-chebai-graph/configs/model/gnn_res_gated.yml --model.train_metrics=configs/metrics/micro-macro-f1.yml --model.test_metrics=configs/metrics/micro-macro-f1.yml --model.val_metrics=configs/metrics/micro-macro-f1.yml --data=../python-chebai-graph/configs/data/chebi50_graph_properties.yml --data.init_args.batch_size=128 --trainer.accumulate_grad_batches=4 --data.init_args.num_workers=10 --model.pass_loss_kwargs=false --data.init_args.chebi_version=241 --trainer.min_epochs=200 --trainer.max_epochs=200 --model.criterion=configs/loss/bce.yml
 ```
+
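
The added install instructions leave `${TORCH}` and `${CUDA}` as placeholders in the wheel-index URL. As a minimal sketch of how the two values combine, the Python snippet below fills them in and builds the resulting `pip` command. The helper name `pyg_install_command` is hypothetical and not part of `chebai` or PyTorch Geometric; the package list follows the command in the diff, and the example values (`2.6.0`, `cpu`) are the ones mentioned in the README text.

```python
from string import Template

# URL pattern copied from the install command in the README diff.
PYG_WHEEL_INDEX = Template("https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html")


def pyg_install_command(torch_version: str, cuda_tag: str) -> list[str]:
    """Build the pip command from the README with both placeholders filled in.

    Hypothetical helper for illustration only.
    """
    index_url = PYG_WHEEL_INDEX.substitute(TORCH=torch_version, CUDA=cuda_tag)
    return ["pip", "install", "torch", "torch_scatter", "torch_geometric", "-f", index_url]


if __name__ == "__main__":
    # Example values from the README: PyTorch 2.6.0 on a CPU-only machine.
    print(" ".join(pyg_install_command("2.6.0", "cpu")))
```

Returning the command as a list keeps each argument separate, so it could be passed to `subprocess.run` without shell quoting if you prefer to run it from a script.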

pyproject.toml

Lines changed: 1 addition & 4 deletions
@@ -6,10 +6,7 @@ authors = [
     { name = "Martin Glauer", email = "[email protected]" }
 ]
 dependencies = [
-    "torch_geometric",
-    "torch-scatter",
-    "torch-sparse",
-    "torch-cluster",
+    "chebai",
     "descriptastorus"
 ]
 
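
Because the `torch_*` packages are no longer declared in `dependencies`, it may be useful to check that the packages the README asks you to install by hand are actually present. The following is a rough sketch using only the standard library. The function name `missing_modules` is hypothetical, and the tuple of import names (`torch`, `torch_scatter`, `torch_geometric`) is an assumption based on the install command in the README diff.

```python
from importlib.util import find_spec

# Import names of the packages the README asks you to install manually.
# Assumption: adjust this tuple if your setup needs other packages.
MANUAL_DEPENDENCIES = ("torch", "torch_scatter", "torch_geometric")


def missing_modules(names=MANUAL_DEPENDENCIES):
    """Return the module names that cannot be found in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All manually installed dependencies were found.")
```

This only confirms that the modules can be located. It does not verify that their versions match your PyTorch or CUDA build.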
