Commit 2c1d63b

Merge pull request #43 from theochem/new_data
Add new external dataset
2 parents f8e2830 + a173413 commit 2c1d63b

2 files changed: 32 additions, 4 deletions
B3DB/b3db.py

Lines changed: 1 addition & 2 deletions
@@ -38,14 +38,13 @@ def load_b3db_dataset():
     classification_external_data = pd.read_csv(
         data_dir / "B3DB_classification_external.tsv", sep="\t"
     )
-    # TODO: add classification_external_extended_data
 
     return {
         "B3DB_regression": regression_data,
         "B3DB_classification": classification_data,
         "B3DB_regression_extended": regression_data_extended,
         "B3DB_classification_extended": classification_data_extended,
-        # "B3DB_classification_external": classification_external_data,
+        "B3DB_classification_external": classification_external_data,
     }
 
 
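The effect of this hunk is that the returned dictionary gains a fifth entry. A minimal sketch of a sanity check for that, using only the key names visible in the diff; the helper `missing_dataset_keys` and the stand-in dictionaries are hypothetical and not part of the B3DB package:

```python
# Dataset names returned by load_b3db_dataset() after this commit,
# copied from the diff above.
EXPECTED_KEYS = frozenset({
    "B3DB_regression",
    "B3DB_classification",
    "B3DB_regression_extended",
    "B3DB_classification_extended",
    "B3DB_classification_external",
})


def missing_dataset_keys(data_dict):
    """Return the expected dataset names absent from data_dict, sorted.

    Hypothetical helper for illustration; not part of the B3DB package.
    """
    return sorted(EXPECTED_KEYS - set(data_dict))


# Stand-ins for the return value before and after the commit. Real values
# would be pandas DataFrames; None is enough to check the keys.
before = {
    name: None
    for name in EXPECTED_KEYS
    if name != "B3DB_classification_external"
}
after = {name: None for name in EXPECTED_KEYS}

print(missing_dataset_keys(before))
print(missing_dataset_keys(after))
```

With the package installed, the same check could presumably be run as `missing_dataset_keys(B3DB_DATA_DICT)` after `from B3DB import B3DB_DATA_DICT`, assuming that dictionary is built from `load_b3db_dataset()`.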
README.md

Lines changed: 31 additions & 2 deletions
@@ -7,12 +7,12 @@ the consistency between different experimental references/measurements. This dat
 
 A subset of the
 molecules in B3DB has numerical `logBB` values (1058 compounds), while the whole dataset
-has categorical (BBB+ or BBB-) BBB permeability labels (7807 compounds). Some physicochemical properties
+has categorical (BBB+ or BBB-) BBB permeability labels (7807 compounds prior to v1.0.0 and 7982 compounds after). Some physicochemical properties
 of the molecules are also provided.
 
 ## Citation
 
-Please use the following citation in any publication using our *B3DB* dataset:
+Please use the following citations in any publication using our *B3DB* dataset:
 
 ```md
 @article{Meng_A_curated_diverse_2021,
@@ -26,6 +26,18 @@ year = {2021},
 url = {https://www.nature.com/articles/s41597-021-01069-5},
 publisher = {Springer Nature}
 }
+
+@article{Meng_B3clf_2025,
+author = {Meng, Fanwang and Chen, Jitian and Collins-Ramirez, Juan Samuel and Ayers, Paul W.},
+doi = {xxx},
+journal = {xxx},
+number = {xxx},
+title = {B3clf: A Resampling-Integrated Machine Learning Framework to Predict Blood-Brain Barrier Permeability},
+volume = {x},
+year = {xxx},
+url = {xxx},
+publisher = {xxx}
+}
 ```
 
 ## Features of *B3DB*
@@ -63,6 +75,17 @@ from B3DB import B3DB_DATA_DICT
 # 'B3DB_regression_extended'
 # 'B3DB_classification'
 # 'B3DB_classification_extended'
+# "B3DB_classification_external"
+df_b3db_reg = B3DB_DATA_DICT["B3DB_regression"]
+df_b3db_reg.head()
+# NO. compound_name ... group comments
+# 0 1 moxalactam ... A NaN
+# 1 2 schembl614298 ... A NaN
+# 2 3 morphine-6-glucuronide ... A NaN
+# 3 4 2-[4-(5-bromo-3-methylpyridin-2-yl)butylamino]... ... A NaN
+# 4 5 NaN ... A NaN
+
+# [5 rows x 10 columns]
 
 ```
 
@@ -111,3 +134,9 @@ Detailed procedures for data curation can be found in [data curation section](da
 
 The materials and data under this repo are distributed under the
 [CC0 Licence](http://creativecommons.org/publicdomain/zero/1.0/).
+
+## ChangeLog
+
+- 2025Aug16, the B3DB dataset is available via PyPI.
+- 2025Aug16, we have added a new set of 171 BBB+ and 4 BBB- compounds to the dataset since
+version 1.1.0.
