@@ -44,6 +44,30 @@ publisher = {Springer Nature}
4444
4545## Usage
4646
47+ ### Via PyPI
48+
49+ The B3DB dataset is available on [PyPI](https://pypi.org/project/qc-B3DB/) and can be installed with pip:
50+
51+ ``` bash
52+ pip install qc-B3DB
53+ ```
54+
55+ Then load the data (a dictionary of `pandas` DataFrames) with the following code snippet:
56+
57+ ``` python
58+
59+ from B3DB import B3DB_DATA_DICT
60+
61+ # access the data via dictionary keys
62+ # 'B3DB_regression'
63+ # 'B3DB_regression_extended'
64+ # 'B3DB_classification'
65+ # 'B3DB_classification_extended'
66+
67+ ```
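
As a rough illustration of working with the dictionary above, the sketch below collects the shape of each table. It assumes only what the snippet states: that `B3DB_DATA_DICT` maps the four listed keys to `pandas` DataFrames. The helper name `describe_tables` and the constant `EXPECTED_KEYS` are hypothetical and are not part of the `qc-B3DB` package.

```python
from typing import Any, Dict, Mapping, Tuple

# Keys listed in the README snippet; assumed to be the full set of tables.
EXPECTED_KEYS = (
    "B3DB_regression",
    "B3DB_regression_extended",
    "B3DB_classification",
    "B3DB_classification_extended",
)


def describe_tables(data_dict: Mapping[str, Any]) -> Dict[str, Tuple[int, ...]]:
    """Return {key: shape} for every expected key found in data_dict.

    Hypothetical helper: works with any mapping whose values expose a
    `.shape` attribute, as pandas DataFrames do.
    """
    return {
        key: tuple(data_dict[key].shape)
        for key in EXPECTED_KEYS
        if key in data_dict
    }


if __name__ == "__main__":
    try:
        from B3DB import B3DB_DATA_DICT
    except ImportError:
        print("qc-B3DB is not installed; run `pip install qc-B3DB` first.")
    else:
        # Print the (rows, columns) shape of each available table.
        for name, shape in describe_tables(B3DB_DATA_DICT).items():
            print(name, shape)
```

Looking keys up through a fixed tuple keeps the output order stable and skips any key a given package version does not provide, instead of raising `KeyError`.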
68+
69+ ### Manually Download the Data
70+
4771There are two types of dataset in [ B3DB] ( B3DB ) , [ regression data] ( B3DB/B3DB_regression.tsv )
4872and [ classification data] ( B3DB/B3DB_classification.tsv ) and they can be loaded simply using * pandas* . For example
4973
@@ -68,6 +92,8 @@ classification_data_extended = pd.read_csv("B3DB/B3DB_classification_extended.ts
6892
6993```
7094
95+ ### Examples in Jupyter Notebooks
96+
7197We also have three examples to show how to use our dataset,
7298[ numerical_data_analysis.ipynb] ( notebooks/numerical_data_analysis.ipynb ) ,
7399[ PCA_projection_fingerprint.ipynb] ( notebooks/PCA_projection_fingerprint.ipynb ) and
@@ -79,63 +105,9 @@ using *MyBinder*,
79105Due to the difficulty of installing ` RDKit ` in * MyBinder* , only `PCA_projection_descriptors.
80106ipynb` is set up in * MyBinder* .
81107
82- ## Working environment setting up
83-
84- All the calculations were performed in a Python 3.7.9 virtual environment created with Conda in
85- CentOS Linux release 7.9.2009. The Conda environment includes the following Python packages,
86-
87- - ChEMBL_Structure_Pipeline==1.0.0, https://github.com/chembl/ChEMBL_Structure_Pipeline/
88- - RDKit==2020.09.1, https://www.rdkit.org/
89- - openeye-toolkit==2020.2.0, https://docs.eyesopen.com/toolkits/python/index.html/
90- - mordred==1.1.2, https://github.com/mordred-descriptor/mordred/ (required networkx==2.3.0)
91- - numpy==1.19.2, https://numpy.org/
92- - pandas==1.2.1, https://pandas.pydata.org/
93- - pubchempy==1.0.4, https://github.com/mcs07/PubChemPy/
94- - PyTDC==0.1.5, https://github.com/mims-harvard/TDC/
95- - SciPy==1.10.0, https://www.scipy.org/
96- - tabula-py==2.2.0, https://pypi.org/project/tabula-py/
97-
98- To creat a virtual environment named * bbb_data* with ` Python 3.7.9 ` to this specification, first,
99- ``` bash
100- conda create bbb_py37 python=3.7.9
101- ```
102- Given that ` RDKit ` , ` ChEMBL_Structure_Pipeline ` are not available in PyPI and we will install
103- them with ` conda ` ,
104-
105- ``` bash
106- # activate a virtual environment
107- conda activate bbb_py37
108-
109- conda install -c rdkit rdkit=2020.09.1.0
110- conda install -c conda-forge chembl_structure_pipeline=1.0.0
111- # https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
112- conda install -c openeye openeye-toolkits=2020.2.0
113- ```
114- Then we can install the requirements in [ requirements.txt] ( requirements.txt ) with
115- ``` bash
116- pip install -r requirements.txt
117- ```
118-
119- An easier way is to run the follow script with ` bash ` ,
120-
121- ``` bash
122- #! /bin/bash
123-
124- # create virtual environment
125- conda create bbb_py37 python=3.7.9
126- # activate virtual environment
127- conda activate bbb_py37
128-
129- # install required packages
130- conda install -c rdkit rdkit=2020.09.1.0
131- conda install -c conda-forge chembl_structure_pipeline=1.0.0
132- # https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
133- conda install -c openeye openeye-toolkits=2020.2.0
134-
135- pip install -r requirements.txt
136- ```
108+ ## Data Curation
137109
138- ` ALOGPS ` version 2.1 can be accessed at http://www.vcclab.org/lab/alogps/ .
110+ Detailed data curation procedures are described in the [data curation section](data_curation/) of this repository.
139111
140112The materials and data under this repo are distributed under the
141113[ CC0 Licence] ( http://creativecommons.org/publicdomain/zero/1.0/ ) .