Commit c6ed442

add additional data and model files for SSD
1 parent 042226a commit c6ed442

9 files changed: +357424 -1 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+.DS_Store

README.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ This is a data repository associated with the manuscript titled "Enhancing Predi
 This repository consists of:
 - A novel small molecule solubility database (for solutes in 1-3 solvents)
 - Code to train multicomponent solubility models (for solutes in 1-3 solvents)
-- Code to perform semi-supervised distillation (for up to student 5 with thresholds of 0.3 and 1)
+- Code to perform semi-supervised distillation (for up to student 5 with thresholds of 0.3 and 1 / Here, we are providing a subset of COSMO-RS generated database. Contact us for the full data.)

 ## Using this Repository

data/DGsolv_cosmors_241007_sinbinter_random_1m.csv

Lines changed: 299585 additions & 0 deletions
Large diffs are not rendered by default.

model_files/Teacher_ShuffleSplit_Fold0/all_training_metrics.log

Lines changed: 1001 additions & 0 deletions
Large diffs are not rendered by default.
42.7 MB
Binary file not shown.

model_files/Teacher_ShuffleSplit_Fold0/kfold_0.csv

Lines changed: 56790 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+Model train start time: 2024-10-09 10:57:03.673754
+Keyword args:
+{'model_name_in': 'Teacher_ShuffleSplit_Fold0', 'train_data': <PrefetchDataset shapes: ({atom_solute: (None, None), bond_solute: (None, None), connectivity_solute: (None, None, 2), mol_features_solute: (None, 5), atom_solvent1: (None, None), bond_solvent1: (None, None), connectivity_solvent1: (None, None, 2), ratio_solvent1: (None, None), mol_features_solvent1: (None, 5), atom_solvent2: (None, None), bond_solvent2: (None, None), connectivity_solvent2: (None, None, 2), ratio_solvent2: (None, None), mol_features_solvent2: (None, 5), atom_solvent3: (None, None), bond_solvent3: (None, None), connectivity_solvent3: (None, None, 2), ratio_solvent3: (None, None), mol_features_solvent3: (None, 5), connectivity_edges: (None, None, 2), weight_edges: (None, None, 4), temp_val: (None, None), num_solvents: (None, None)}, (None,), (None,)), types: ({atom_solute: tf.int32, bond_solute: tf.int32, connectivity_solute: tf.int32, mol_features_solute: tf.float32, atom_solvent1: tf.int32, bond_solvent1: tf.int32, connectivity_solvent1: tf.int32, ratio_solvent1: tf.float32, mol_features_solvent1: tf.float32, atom_solvent2: tf.int32, bond_solvent2: tf.int32, connectivity_solvent2: tf.int32, ratio_solvent2: tf.float32, mol_features_solvent2: tf.float32, atom_solvent3: tf.int32, bond_solvent3: tf.int32, connectivity_solvent3: tf.int32, ratio_solvent3: tf.float32, mol_features_solvent3: tf.float32, connectivity_edges: tf.int32, weight_edges: tf.float32, temp_val: tf.float32, num_solvents: tf.float32}, tf.float32, tf.float32)>, 'valid_data': <PrefetchDataset shapes: ({atom_solute: (None, None), bond_solute: (None, None), connectivity_solute: (None, None, 2), mol_features_solute: (None, 5), atom_solvent1: (None, None), bond_solvent1: (None, None), connectivity_solvent1: (None, None, 2), ratio_solvent1: (None, None), mol_features_solvent1: (None, 5), atom_solvent2: (None, None), bond_solvent2: (None, None), connectivity_solvent2: (None, None, 2), ratio_solvent2: (None, None), mol_features_solvent2: (None, 5), atom_solvent3: (None, None), bond_solvent3: (None, None), connectivity_solvent3: (None, None, 2), ratio_solvent3: (None, None), mol_features_solvent3: (None, 5), connectivity_edges: (None, None, 2), weight_edges: (None, None, 4), temp_val: (None, None), num_solvents: (None, None)}, (None,), (None,)), types: ({atom_solute: tf.int32, bond_solute: tf.int32, connectivity_solute: tf.int32, mol_features_solute: tf.float32, atom_solvent1: tf.int32, bond_solvent1: tf.int32, connectivity_solvent1: tf.int32, ratio_solvent1: tf.float32, mol_features_solvent1: tf.float32, atom_solvent2: tf.int32, bond_solvent2: tf.int32, connectivity_solvent2: tf.int32, ratio_solvent2: tf.float32, mol_features_solvent2: tf.float32, atom_solvent3: tf.int32, bond_solvent3: tf.int32, connectivity_solvent3: tf.int32, ratio_solvent3: tf.float32, mol_features_solvent3: tf.float32, connectivity_edges: tf.int32, weight_edges: tf.float32, temp_val: tf.float32, num_solvents: tf.float32}, tf.float32, tf.float32)>, 'test_data': <PrefetchDataset shapes: ({atom_solute: (None, None), bond_solute: (None, None), connectivity_solute: (None, None, 2), mol_features_solute: (None, 5), atom_solvent1: (None, None), bond_solvent1: (None, None), connectivity_solvent1: (None, None, 2), ratio_solvent1: (None, None), mol_features_solvent1: (None, 5), atom_solvent2: (None, None), bond_solvent2: (None, None), connectivity_solvent2: (None, None, 2), ratio_solvent2: (None, None), mol_features_solvent2: (None, 5), atom_solvent3: (None, None), bond_solvent3: (None, None), connectivity_solvent3: (None, None, 2), ratio_solvent3: (None, None), mol_features_solvent3: (None, 5), connectivity_edges: (None, None, 2), weight_edges: (None, None, 4), temp_val: (None, None), num_solvents: (None, None)}, (None,), (None,)), types: ({atom_solute: tf.int32, bond_solute: tf.int32, connectivity_solute: tf.int32, mol_features_solute: tf.float32, atom_solvent1: tf.int32, bond_solvent1: tf.int32, connectivity_solvent1: tf.int32, ratio_solvent1: tf.float32, mol_features_solvent1: tf.float32, atom_solvent2: tf.int32, bond_solvent2: tf.int32, connectivity_solvent2: tf.int32, ratio_solvent2: tf.float32, mol_features_solvent2: tf.float32, atom_solvent3: tf.int32, bond_solvent3: tf.int32, connectivity_solvent3: tf.int32, ratio_solvent3: tf.float32, mol_features_solvent3: tf.float32, connectivity_edges: tf.int32, weight_edges: tf.float32, temp_val: tf.float32, num_solvents: tf.float32}, tf.float32, tf.float32)>, 'train_df': index can_smiles_solute ... Train/Valid/Test solvent_system
+0 0 O=C(O)c1ccccc1 ... Train C/C/O
+2 2 O=C(O)c1ccccc1 ... Train C/CO/O
+4 4 O=C(O)c1ccccc1 ... Train C/CO/O
+5 5 O=C(O)c1ccccc1 ... Train C/C/CO
+6 6 Clc1cc(Cl)c(-c2ccccc2)c(Cl)c1 ... Train C/C/O
+... ... ... ... ... ...
+56781 56781 COC(=O)Nc1nc2ccccc2[nH]1 ... Train C/C/CCCO
+56782 56782 COC(=O)Nc1nc2ccccc2[nH]1 ... Train C/C/CCCO
+56783 56783 COC(=O)Nc1nc2ccccc2[nH]1 ... Train C/C/CCCO
+56787 56787 COC(=O)Nc1nc2ccccc2[nH]1 ... Train C/C/CCCO
+56788 56788 COC(=O)Nc1nc2ccccc2[nH]1 ... Train C/C/CCCO
+
+[40888 rows x 16 columns], 'valid_df': index ... solvent_system
+1 1 ... C/CO/O
+3 3 ... C/CO/O
+14 14 ... C/CO/O
+19 19 ... C/CO/O
+20 20 ... C/CO/O
+... ... ... ...
+56775 56775 ... C/C/CCO
+56776 56776 ... C/C/CCO
+56778 56778 ... C/C/CCO
+56785 56785 ... C/C/CCCO
+56786 56786 ... C/C/CCCO
+
+[10222 rows x 16 columns], 'test_df': index ... solvent_system
+17 17 ... C/CO/O
+18 18 ... C/CO/O
+24 24 ... C/CO/O
+28 28 ... C/CO/O
+50 50 ... C/CO/O
+... ... ... ...
+56716 56716 ... C/CO/O
+56746 56746 ... C/CN(C)C=O/O
+56765 56765 ... C/C/CO
+56773 56773 ... C/C/CCO
+56784 56784 ... C/C/CCCO
+
+[5679 rows x 16 columns], 'preprocessor': <ssd_functions.CustomPreprocessor_NFPx2_ternary object at 0x14a63013bc70>, 'output_signature': ({'atom_solute': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'bond_solute': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'connectivity_solute': TensorSpec(shape=(None, 2), dtype=tf.int32, name=None), 'mol_features_solute': TensorSpec(shape=(5,), dtype=tf.float32, name=None), 'atom_solvent1': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'bond_solvent1': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'connectivity_solvent1': TensorSpec(shape=(None, 2), dtype=tf.int32, name=None), 'ratio_solvent1': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'mol_features_solvent1': TensorSpec(shape=(5,), dtype=tf.float32, name=None), 'atom_solvent2': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'bond_solvent2': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'connectivity_solvent2': TensorSpec(shape=(None, 2), dtype=tf.int32, name=None), 'ratio_solvent2': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'mol_features_solvent2': TensorSpec(shape=(5,), dtype=tf.float32, name=None), 'atom_solvent3': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'bond_solvent3': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'connectivity_solvent3': TensorSpec(shape=(None, 2), dtype=tf.int32, name=None), 'ratio_solvent3': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'mol_features_solvent3': TensorSpec(shape=(5,), dtype=tf.float32, name=None), 'connectivity_edges': TensorSpec(shape=(None, 2), dtype=tf.int32, name=None), 'weight_edges': TensorSpec(shape=(None, 4), dtype=tf.float32, name=None), 'temp_val': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'num_solvents': TensorSpec(shape=(None,), dtype=tf.float32, name=None)}, TensorSpec(shape=(), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.float32, name=None)), 'batch_size': 1000, 'sample_weight': 1, 'td_final': <PrefetchDataset shapes: ({atom_solute: (None, None), bond_solute: (None, None), connectivity_solute: (None, None, 2), mol_features_solute: (None, 5), atom_solvent1: (None, None), bond_solvent1: (None, None), connectivity_solvent1: (None, None, 2), ratio_solvent1: (None, None), mol_features_solvent1: (None, 5), atom_solvent2: (None, None), bond_solvent2: (None, None), connectivity_solvent2: (None, None, 2), ratio_solvent2: (None, None), mol_features_solvent2: (None, 5), atom_solvent3: (None, None), bond_solvent3: (None, None), connectivity_solvent3: (None, None, 2), ratio_solvent3: (None, None), mol_features_solvent3: (None, 5), connectivity_edges: (None, None, 2), weight_edges: (None, None, 4), temp_val: (None, None), num_solvents: (None, None)}, (None,), (None,)), types: ({atom_solute: tf.int32, bond_solute: tf.int32, connectivity_solute: tf.int32, mol_features_solute: tf.float32, atom_solvent1: tf.int32, bond_solvent1: tf.int32, connectivity_solvent1: tf.int32, ratio_solvent1: tf.float32, mol_features_solvent1: tf.float32, atom_solvent2: tf.int32, bond_solvent2: tf.int32, connectivity_solvent2: tf.int32, ratio_solvent2: tf.float32, mol_features_solvent2: tf.float32, atom_solvent3: tf.int32, bond_solvent3: tf.int32, connectivity_solvent3: tf.int32, ratio_solvent3: tf.float32, mol_features_solvent3: tf.float32, connectivity_edges: tf.int32, weight_edges: tf.float32, temp_val: tf.float32, num_solvents: tf.float32}, tf.float32, tf.float32)>, 'num_hidden': 128, 'num_messages': 5, 'learn_rate': 0.0001, 'num_epochs': 1000, 'node_aggreg_op': 'mean', 'do_stoich_multiply': 'before_dense', 'dropout': 1e-10, 'fold_number': 0, 'output_val_col': 'target_constant', 'share_weights': 'solvs', 'split_type': 'shuffle'}
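
The DataFrame footers in the logged keyword args report 40888 training, 10222 validation, and 5679 test rows. A minimal sketch, using only those logged counts, of how the split proportions can be checked. Reading the one extra line in kfold_0.csv (56790 additions) as a CSV header row is an assumption, not something the commit states.

```python
# Row counts copied from the "[N rows x 16 columns]" footers in the log.
split_rows = {"Train": 40888, "Valid": 10222, "Test": 5679}

total = sum(split_rows.values())
fractions = {name: rows / total for name, rows in split_rows.items()}

for name, rows in split_rows.items():
    print(f"{name:<5} {rows:>6} rows  {fractions[name]:.1%}")
print(f"Total {total:>6} rows")

# Assumption: kfold_0.csv holds one header line plus one line per row,
# which would account for the 56790 added lines reported for that file.
assumed_csv_lines = total + 1
print(f"Assumed kfold_0.csv line count: {assumed_csv_lines}")
```

The counts sum to 56789 rows, consistent with the highest index shown in the log (56788, zero-based), and correspond to roughly a 72/18/10 train/validation/test split.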
