
Commit f8e2830

Merge pull request #42 from theochem/update_readme

Clean up README.md

2 parents e346a0c + bca33f6

File tree

1 file changed (+28, -56 lines)

README.md

Lines changed: 28 additions & 56 deletions
@@ -44,6 +44,30 @@ publisher = {Springer Nature}
 
 ## Usage
 
+### Via PyPI
+
+The B3DB dataset is available on [PyPI](https://pypi.org/project/qc-B3DB/). One can install it using pip:
+
+```bash
+pip install qc-B3DB
+```
+
+Then load the data (a dictionary of `pandas` dataframes) with the following code snippet:
+
+```python
+from B3DB import B3DB_DATA_DICT
+
+# access the data via dictionary keys:
+# 'B3DB_regression'
+# 'B3DB_regression_extended'
+# 'B3DB_classification'
+# 'B3DB_classification_extended'
+```
+
+### Manually Download the Data
+
 There are two types of datasets in [B3DB](B3DB): [regression data](B3DB/B3DB_regression.tsv)
 and [classification data](B3DB/B3DB_classification.tsv), and they can be loaded simply using *pandas*. For example,

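The PyPI route above exposes the datasets as a dictionary of dataframes, while the manual route reads the same TSV files with *pandas*. As a reference point, here is a minimal self-contained sketch of both ideas; the two-row tables are synthetic stand-ins for the real `B3DB/*.tsv` files, whose actual columns are not shown in this diff.

```python
import io

import pandas as pd

# Synthetic stand-ins for the B3DB TSV files; the real files have more
# columns. This only illustrates the tab-separated loading step.
classification_tsv = "SMILES\tBBB_label\nCCO\tBBB+\nc1ccccc1\tBBB-\n"
regression_tsv = "SMILES\tlogBB\nCCO\t0.0\n"  # 0.0 is a placeholder value

# B3DB ships *.tsv files, so sep="\t" is the important argument.
data_dict = {
    "B3DB_classification": pd.read_csv(io.StringIO(classification_tsv), sep="\t"),
    "B3DB_regression": pd.read_csv(io.StringIO(regression_tsv), sep="\t"),
}

# Access one dataset by key, mirroring the B3DB_DATA_DICT usage shown above.
classification_data = data_dict["B3DB_classification"]
print(classification_data.shape)  # (2, 2)
```

For the real files, replace the `io.StringIO(...)` objects with paths such as `"B3DB/B3DB_classification.tsv"`.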
@@ -68,6 +92,8 @@ classification_data_extended = pd.read_csv("B3DB/B3DB_classification_extended.ts
 
 ```
 
+### Examples in Jupyter Notebooks
+
 We also have three examples to show how to use our dataset:
 [numerical_data_analysis.ipynb](notebooks/numerical_data_analysis.ipynb),
 [PCA_projection_fingerprint.ipynb](notebooks/PCA_projection_fingerprint.ipynb) and
@@ -79,63 +105,9 @@ using *MyBinder*,
 Due to the difficulty of installing `RDKit` in *MyBinder*, only `PCA_projection_descriptors.ipynb` is set up in *MyBinder*.
 
-## Working environment setting up
-
-All the calculations were performed in a Python 3.7.9 virtual environment created with Conda on
-CentOS Linux release 7.9.2009. The Conda environment includes the following Python packages:
-
-- ChEMBL_Structure_Pipeline==1.0.0, https://github.com/chembl/ChEMBL_Structure_Pipeline/
-- RDKit==2020.09.1, https://www.rdkit.org/
-- openeye-toolkit==2020.2.0, https://docs.eyesopen.com/toolkits/python/index.html/
-- mordred==1.1.2, https://github.com/mordred-descriptor/mordred/ (requires networkx==2.3.0)
-- numpy==1.19.2, https://numpy.org/
-- pandas==1.2.1, https://pandas.pydata.org/
-- pubchempy==1.0.4, https://github.com/mcs07/PubChemPy/
-- PyTDC==0.1.5, https://github.com/mims-harvard/TDC/
-- SciPy==1.10.0, https://www.scipy.org/
-- tabula-py==2.2.0, https://pypi.org/project/tabula-py/
-
-To create a virtual environment named *bbb_py37* with Python 3.7.9 to this specification, first run
-```bash
-conda create -n bbb_py37 python=3.7.9
-```
-Given that `RDKit` and `ChEMBL_Structure_Pipeline` are not available on PyPI, we install
-them with `conda`:
-
-```bash
-# activate the virtual environment
-conda activate bbb_py37
-
-conda install -c rdkit rdkit=2020.09.1.0
-conda install -c conda-forge chembl_structure_pipeline=1.0.0
-# https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
-conda install -c openeye openeye-toolkits=2020.2.0
-```
-Then we can install the requirements in [requirements.txt](requirements.txt) with
-```bash
-pip install -r requirements.txt
-```
-
-An easier way is to run the following script with `bash`:
-
-```bash
-#!/bin/bash
-
-# create virtual environment
-conda create -n bbb_py37 python=3.7.9
-# activate virtual environment
-conda activate bbb_py37
-
-# install required packages
-conda install -c rdkit rdkit=2020.09.1.0
-conda install -c conda-forge chembl_structure_pipeline=1.0.0
-# https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
-conda install -c openeye openeye-toolkits=2020.2.0
-
-pip install -r requirements.txt
-```
+
 ## Data Curation
 
-`ALOGPS` version 2.1 can be accessed at http://www.vcclab.org/lab/alogps/.
+Detailed procedures for data curation can be found in the [data curation section](data_curation/) of this repository.
 
 The materials and data under this repo are distributed under the
 [CC0 Licence](http://creativecommons.org/publicdomain/zero/1.0/).
