
Commit f8e2830

Merge pull request #42 from theochem/update_readme

Clean up README.md

2 parents e346a0c + bca33f6

File tree

1 file changed (+28, -56 lines)

README.md

Lines changed: 28 additions & 56 deletions
@@ -44,6 +44,30 @@ publisher = {Springer Nature}
 
 ## Usage
 
+### Via PyPI
+
+The B3DB dataset is available on [PyPI](https://pypi.org/project/qc-B3DB/). One can install it using pip:
+
+```bash
+pip install qc-B3DB
+```
+
+Then load the data (a dictionary of `pandas` dataframes) with the following code snippet:
+
+```python
+from B3DB import B3DB_DATA_DICT
+
+# access the data via dictionary keys:
+# 'B3DB_regression'
+# 'B3DB_regression_extended'
+# 'B3DB_classification'
+# 'B3DB_classification_extended'
+```
+
+### Manually Download the Data
+
 There are two types of datasets in [B3DB](B3DB): [regression data](B3DB/B3DB_regression.tsv)
 and [classification data](B3DB/B3DB_classification.tsv), and they can be loaded simply using *pandas*. For example,

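The PyPI route above exposes the datasets as a dictionary of dataframes, while the manual route reads the same TSV files with *pandas*. As a reference point, here is a minimal self-contained sketch of both ideas; the two-row tables are synthetic stand-ins for the real `B3DB/*.tsv` files, whose actual columns are not shown in this diff.

```python
import io

import pandas as pd

# Synthetic stand-ins for the B3DB TSV files; the real files have more
# columns. This only illustrates the tab-separated loading step.
classification_tsv = "SMILES\tBBB_label\nCCO\tBBB+\nc1ccccc1\tBBB-\n"
regression_tsv = "SMILES\tlogBB\nCCO\t0.0\n"  # 0.0 is a placeholder value

# B3DB ships *.tsv files, so sep="\t" is the important argument.
data_dict = {
    "B3DB_classification": pd.read_csv(io.StringIO(classification_tsv), sep="\t"),
    "B3DB_regression": pd.read_csv(io.StringIO(regression_tsv), sep="\t"),
}

# Access one dataset by key, mirroring the B3DB_DATA_DICT usage shown above.
classification_data = data_dict["B3DB_classification"]
print(classification_data.shape)  # (2, 2)
```

For the real files, replace the `io.StringIO(...)` objects with paths such as `"B3DB/B3DB_classification.tsv"`.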
@@ -68,6 +92,8 @@ classification_data_extended = pd.read_csv("B3DB/B3DB_classification_extended.ts
 
 ```
 
+### Examples in Jupyter Notebooks
+
 We also have three examples to show how to use our dataset:
 [numerical_data_analysis.ipynb](notebooks/numerical_data_analysis.ipynb),
 [PCA_projection_fingerprint.ipynb](notebooks/PCA_projection_fingerprint.ipynb) and
@@ -79,63 +105,9 @@ using *MyBinder*,
 Due to the difficulty of installing `RDKit` in *MyBinder*, only `PCA_projection_descriptors.ipynb` is set up in *MyBinder*.
 
-## Working environment setting up
-
-All the calculations were performed in a Python 3.7.9 virtual environment created with Conda on
-CentOS Linux release 7.9.2009. The Conda environment includes the following Python packages:
-
-- ChEMBL_Structure_Pipeline==1.0.0, https://github.com/chembl/ChEMBL_Structure_Pipeline/
-- RDKit==2020.09.1, https://www.rdkit.org/
-- openeye-toolkit==2020.2.0, https://docs.eyesopen.com/toolkits/python/index.html/
-- mordred==1.1.2, https://github.com/mordred-descriptor/mordred/ (requires networkx==2.3.0)
-- numpy==1.19.2, https://numpy.org/
-- pandas==1.2.1, https://pandas.pydata.org/
-- pubchempy==1.0.4, https://github.com/mcs07/PubChemPy/
-- PyTDC==0.1.5, https://github.com/mims-harvard/TDC/
-- SciPy==1.10.0, https://www.scipy.org/
-- tabula-py==2.2.0, https://pypi.org/project/tabula-py/
-
-To create a virtual environment named *bbb_py37* with Python 3.7.9 to this specification, first run
-```bash
-conda create -n bbb_py37 python=3.7.9
-```
-Given that `RDKit` and `ChEMBL_Structure_Pipeline` are not available on PyPI, we install
-them with `conda`:
-
-```bash
-# activate the virtual environment
-conda activate bbb_py37
-
-conda install -c rdkit rdkit=2020.09.1.0
-conda install -c conda-forge chembl_structure_pipeline=1.0.0
-# https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
-conda install -c openeye openeye-toolkits=2020.2.0
-```
-Then we can install the requirements in [requirements.txt](requirements.txt) with
-```bash
-pip install -r requirements.txt
-```
-
-An easier way is to run the following script with `bash`:
-
-```bash
-#!/bin/bash
-
-# create virtual environment
-conda create -n bbb_py37 python=3.7.9
-# activate virtual environment
-conda activate bbb_py37
-
-# install required packages
-conda install -c rdkit rdkit=2020.09.1.0
-conda install -c conda-forge chembl_structure_pipeline=1.0.0
-# https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
-conda install -c openeye openeye-toolkits=2020.2.0
-
-pip install -r requirements.txt
-```
+
 ## Data Curation
 
-`ALOGPS` version 2.1 can be accessed at http://www.vcclab.org/lab/alogps/.
+Detailed procedures for data curation can be found in the [data curation section](data_curation/) of this repository.
 
 The materials and data under this repo are distributed under the
 [CC0 Licence](http://creativecommons.org/publicdomain/zero/1.0/).
