# ChEB-AI Graph

Graph-based models for molecular property prediction and ontology classification, built on top of the [`python-chebai`](https://github.com/ChEB-AI/python-chebai) codebase.
## Installation
To install this repository, download it and run

```bash
pip install .
```
The dependencies `torch`, `torch_scatter` and `torch_geometric` cannot be installed automatically.
Use the following command:

```bash
pip install torch torch_scatter torch_geometric -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
```
Replace:
- `${TORCH}` with your installed PyTorch version (e.g., `2.6.0`)
- `${CUDA}` with e.g. `cpu`, `cu118`, or `cu121`, depending on your system and CUDA version
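To make the substitution concrete, here is a sketch using the example values from the list above (`2.6.0` and `cpu` are only illustrative; set them to match your own PyTorch build):

```shell
# Example values; adjust to your installed PyTorch version and CUDA tag.
TORCH=2.6.0
CUDA=cpu
# Print the fully resolved install command before running it.
echo "pip install torch torch_scatter torch_geometric -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html"
```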
If you already have `torch` installed, make sure that `torch_scatter` and `torch_geometric` are compatible with your PyTorch version and are installed with the same CUDA version.
For a full list of currently available PyTorch versions and CUDA compatibility, please refer to the libraries' official documentation:
- [torch](https://pytorch.org/get-started/locally/)
- [torch_geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html#installation)
- [torch-scatter](https://github.com/rusty1s/pytorch_scatter)
_Note for developers_: If you want to install the package in editable mode, use the following command instead:

```bash
pip install -e .
```
## Recommended Folder Structure

ChEB-AI Graph is not a standalone library. Instead, it provides additional models and datasets for [`python-chebai`](https://github.com/ChEB-AI/python-chebai).
Training relies on config files located either in `python-chebai` or in this repository.

Therefore, for training, we recommend cloning both repositories into a common parent directory. For instance, your project can look like this:

```
my_projects/
├── python-chebai/
│   ├── chebai/
│   ├── configs/
│   └── ...
└── python-chebai-graph/
    ├── chebai_graph/
    ├── configs/
    └── ...
```
## Training & Pretraining

### Ontology Prediction

This example command trains a Residual Gated Graph Convolutional Network on the ChEBI50 dataset (see the [wiki](https://github.com/ChEB-AI/python-chebai/wiki/Data-Management)).
The dataset has a customizable list of properties for atoms, bonds and molecules that are added to the graph.
The list can be found in the `configs/data/chebi50_graph_properties.yml` file.

Run the command from the `python-chebai` directory, so that the relative paths to both repositories resolve:

```bash
python -m chebai fit \
  --trainer=configs/training/default_trainer.yml \
  --trainer.logger=configs/training/csv_logger.yml \
  --model=../python-chebai-graph/configs/model/gnn_res_gated.yml \
  --model.train_metrics=configs/metrics/micro-macro-f1.yml \
  --model.test_metrics=configs/metrics/micro-macro-f1.yml \
  --model.val_metrics=configs/metrics/micro-macro-f1.yml \
  --data=../python-chebai-graph/configs/data/chebi50_graph_properties.yml \
  --data.init_args.batch_size=128 \
  --trainer.accumulate_grad_batches=4 \
  --data.init_args.num_workers=10 \
  --model.pass_loss_kwargs=false \
  --data.init_args.chebi_version=241 \
  --trainer.min_epochs=200 \
  --trainer.max_epochs=200 \
  --model.criterion=configs/loss/bce.yml
```