# ChEB-AI Graph

Graph-based models for molecular property prediction and ontology classification, built on top of the [`python-chebai`](https://github.com/ChEB-AI/python-chebai) codebase.
## Installation
To install this repository, download it and run

```bash
pip install .
```
The dependencies `torch`, `torch_scatter` and `torch_geometric` cannot be installed automatically.
Use the following command:

```bash
pip install torch torch_scatter torch_geometric -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
```
Replace:
- `${TORCH}` with your installed PyTorch version (e.g., `2.6.0`)
- `${CUDA}` with e.g. `cpu`, `cu118`, or `cu121`, depending on your system and CUDA version
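To make the substitution concrete, here is a sketch using the example values from the list above (`2.6.0` and `cpu` are only illustrative; set them to match your own PyTorch build):

```shell
# Example values; adjust to your installed PyTorch version and CUDA tag.
TORCH=2.6.0
CUDA=cpu
# Print the fully resolved install command before running it.
echo "pip install torch torch_scatter torch_geometric -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html"
```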
If you already have `torch` installed, make sure that `torch_scatter` and `torch_geometric` are compatible with your PyTorch version and are installed with the same CUDA version.
For a full list of currently available PyTorch versions and CUDA compatibility, please refer to the libraries' official documentation:
- [torch](https://pytorch.org/get-started/locally/)
- [torch_geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html#installation)
- [torch-scatter](https://github.com/rusty1s/pytorch_scatter)
_Note for developers_: If you want to install the package in editable mode, use the following command instead:

```bash
pip install -e .
```
## Recommended Folder Structure

ChEB-AI Graph is not a standalone library. Instead, it provides additional models and datasets for [`python-chebai`](https://github.com/ChEB-AI/python-chebai).
Training relies on config files located either in `python-chebai` or in this repository.

Therefore, for training, we recommend cloning both repositories into a common parent directory. For instance, your project can look like this:

```
my_projects/
├── python-chebai/
│   ├── chebai/
│   ├── configs/
│   └── ...
└── python-chebai-graph/
    ├── chebai_graph/
    ├── configs/
    └── ...
```
## Training & Pretraining

### Ontology Prediction

This example command trains a Residual Gated Graph Convolutional Network on the ChEBI50 dataset (see the [wiki](https://github.com/ChEB-AI/python-chebai/wiki/Data-Management)).
The dataset has a customizable list of properties for atoms, bonds and molecules that are added to the graph.
The list can be found in the `configs/data/chebi50_graph_properties.yml` file.

Run the command from the `python-chebai` directory, so that the relative paths to both repositories resolve:

```bash
python -m chebai fit \
  --trainer=configs/training/default_trainer.yml \
  --trainer.logger=configs/training/csv_logger.yml \
  --model=../python-chebai-graph/configs/model/gnn_res_gated.yml \
  --model.train_metrics=configs/metrics/micro-macro-f1.yml \
  --model.test_metrics=configs/metrics/micro-macro-f1.yml \
  --model.val_metrics=configs/metrics/micro-macro-f1.yml \
  --data=../python-chebai-graph/configs/data/chebi50_graph_properties.yml \
  --data.init_args.batch_size=128 \
  --trainer.accumulate_grad_batches=4 \
  --data.init_args.num_workers=10 \
  --model.pass_loss_kwargs=false \
  --data.init_args.chebi_version=241 \
  --trainer.min_epochs=200 \
  --trainer.max_epochs=200 \
  --model.criterion=configs/loss/bce.yml
```