
Commit fd645c8 (1 parent: d390e8e)

mention ASE-db and additional params related to cond model

8 files changed: +25 −16 lines


configs/data/mol_dataset.yaml (3 additions, 2 deletions)

```diff
@@ -1,6 +1,6 @@
 _target_: MolecularDiffusion.runmodes.train.DataModule
 root: /home/pregabalin/RF/blue_edm/data/qm9
-filename: /home/pregabalin/RF/blue_edm/data/qm9/dsgdb9nsd_ready.csv # 4k or ready
+filename: /home/pregabalin/RF/blue_edm/data/qm9/dsgdb9nsd_4k.csv # 4k or ready
 atom_vocab: [H,B,C,N,O,F,Al,Si,P,S,Cl,As,Se,Br,I,Hg,Bi]
 dataset_name: qm9
 with_hydrogen: True
@@ -16,4 +16,5 @@ load_pkl: null
 save_pkl: data/test.pkl
 data_type: pointcloud # pyg or pointcloud
 batch_size: 48
-num_workers: 0
+num_workers: 0
+consider_global_attributes: False
```

configs/tasks/diffusion.yaml (5 additions, 4 deletions)

```diff
@@ -5,10 +5,10 @@ condition_names: []
 hidden_size: 192
 act_fn:
   _target_: torch.nn.SiLU
-num_layers: 1
+num_layers: 9
 attention: True
 tanh: True
-num_sublayers: 12
+num_sublayers: 1
 sin_embedding: False
 aggregation_method: "sum"
 dropout: 0.0
@@ -19,7 +19,7 @@ normalization_factor: 1.0
 chkpt_path: null
 
 # specific to diffusion
-diffusion_steps : 400
+diffusion_steps : 900
 diffusion_noise_schedule : polynomial_2 # learned, cosine_x, polynomial_x, issnr_x, smld_x
 diffusion_noise_precision: 1e-5
 diffusion_loss_type: vlb
@@ -40,6 +40,7 @@ sp_regularizer_polynomial_p: 1.1
 sp_regularizer_warm_up_steps: 100
 reference_indices: null # indices of core atoms for the outpainting objective
 # evaluator parameters
-metrics: "Validity Relax and connected" # Validity Relax and connected, Validity Strict and connected, Validity Strict, Validity Relax
+use_posebuster: True
+metrics: valid_posebuster # use_posebuster must be true
 n_samples: 24
 generative_analysis: True
```

configs/tasks/diffusion_pretrained.yaml (6 additions, 5 deletions)

```diff
@@ -5,10 +5,10 @@ condition_names: []
 hidden_size: 192
 act_fn:
   _target_: torch.nn.SiLU
-num_layers: 1
+num_layers: 9
 attention: True
 tanh: True
-num_sublayers: 9
+num_sublayers: 1
 sin_embedding: False
 aggregation_method: "sum"
 dropout: 0.0
@@ -27,8 +27,8 @@ normalize_factors: [1,4,10]
 extra_norm_values: []
 augment_noise: False
 data_augmentation: False
-context_mask_rate: 0.0
-mask_value: null
+context_mask_rate: 0.2
+mask_value: 5
 normalize_condition: value_10 # [None, "maxmin", "mad"]
 sp_regularizer_deploy: False
 sp_regularizer_regularizer: hard
@@ -40,6 +40,7 @@ sp_regularizer_polynomial_p: 1.1
 sp_regularizer_warm_up_steps: 100
 reference_indices: null # indices of core atoms for the outpainting objective
 # evaluator parameters
-metrics: "Validity Relax and connected" # Validity Relax and connected, Validity Strict and connected, Validity Strict, Validity Relax
+use_posebuster: True
+metrics: valid_posebuster # use_posebuster must be true
 n_samples: 24
 generative_analysis: True
```
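The `context_mask_rate: 0.2` / `mask_value: 5` change above wires the conditional model for classifier-free guidance: during training, the conditioning values are occasionally replaced by a fixed mask value so the model also learns an unconditional distribution. A minimal sketch of that masking step (the function name, signature, and list-based conditions are illustrative assumptions, not the repository's code):

```python
import random

def mask_context(conditions, mask_rate=0.2, mask_value=5, rng=random):
    """Randomly replace the condition vector with a fixed mask value.

    Illustrative sketch: with probability mask_rate the model sees
    mask_value instead of the true conditions, which is what makes
    classifier-free guidance possible at sampling time.
    """
    if rng.random() < mask_rate:
        return [mask_value] * len(conditions)
    return conditions
```

At sampling time, the same `mask_value` then stands in for "no condition" when combining conditional and unconditional predictions.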

src/MolecularDiffusion/runmodes/train/tasks_egcl.py (1 addition, 1 deletion)

```diff
@@ -321,7 +321,7 @@ def build(self):
             self.task.std = chk_point["std"]
         except FileNotFoundError:
             logger.warning(f"Checkpoint not found at {self.chkpt_path}. Initializing model without loading.")
-
+            raise FileNotFoundError(f"Checkpoint not found at {self.chkpt_path}.")
         self.task.atom_vocab = self.atom_vocab
 
         return self.task
```

src/MolecularDiffusion/runmodes/train/tasks_egt.py (1 addition, 1 deletion)

```diff
@@ -285,7 +285,7 @@ def build(self):
             self.task.std = chk_point["std"]
         except FileNotFoundError:
             logger.warning(f"Checkpoint not found at {self.chkpt_path}. Initializing model without loading.")
-
+            raise FileNotFoundError(f"Checkpoint not found at {self.chkpt_path}.")
         self.task.atom_vocab = self.atom_vocab
 
         return self.task
```
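Both `build()` changes replace a silent fallback with fail-fast behaviour: the warning is still logged, but the `FileNotFoundError` is then raised so training cannot quietly start from randomly initialized weights. A standalone sketch of the pattern (the `load_checkpoint` helper below is hypothetical; the repository's code does this inside `build()` while loading a checkpoint):

```python
import logging

logger = logging.getLogger(__name__)

def load_checkpoint(chkpt_path):
    # Hypothetical sketch: log a warning for visibility, then re-raise
    # so a missing checkpoint aborts the run instead of being ignored.
    try:
        with open(chkpt_path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        logger.warning(
            f"Checkpoint not found at {chkpt_path}. Initializing model without loading."
        )
        raise FileNotFoundError(f"Checkpoint not found at {chkpt_path}.")
```

Logging before re-raising keeps the original diagnostic message in the run log while still surfacing the failure to the caller.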

tutorials/01_training_diffusion/README.md (1 addition, 0 deletions)

```diff
@@ -44,6 +44,7 @@ This is the most important step. You will override the default parameters to con
 | `trainer.output_path` | `trainer: {output_path: "results/my_run"}` | **CRITICAL:** Where all logs and checkpoints are saved. |
 | `data.filename` | `data: {filename: "molecules.csv"}` | The CSV file with molecule information. |
 | `data.xyz_dir` | `data: {xyz_dir: "xyz_files/"}` | The directory containing `.xyz` geometry files. |
+| `data.ase_db_path` | `data: {ase_db_path: "data/qm9.db"}` | Path to an ASE database file (`.db`) or a directory containing `.db` files. This is an alternative to `data.filename` and `data.xyz_dir`. |
 
 #### Data Processing and Caching
```
tutorials/01_training_diffusion/my_first_run.yaml (3 additions, 2 deletions)

```diff
@@ -22,8 +22,9 @@ logger:
 data:
   batch_size: 64
   root: data # where the data is stored
-  filename: molecule.csv
-  xyz_dir: xyz_files
+  # filename: molecule.csv # Use this for CSV/XYZ data
+  # xyz_dir: xyz_files # Directory for XYZ files if using CSV
+  ase_db_path: data/qm9.db # Path to an ASE database file or directory containing .db files
   load_pkl: true # to reload the processed data if it exists
 
```
tutorials/04_finetuning/README.md (5 additions, 1 deletion)

````diff
@@ -14,6 +14,9 @@ Fine-tuning is a powerful technique where you take a pre-trained model and conti
 >
 > If the architectures do not match, PyTorch will be unable to load the weights from the checkpoint, and the fine-tuning process will fail. Always ensure your model configuration YAML file matches the settings of the pre-trained model.
 
+If you are adapting our pre-trained diffusion model, use `configs/tasks/diffusion_pretrained.yaml` as the task config.
+
+
 ## The Core Concepts of Fine-Tuning
 
 **Important Note:** The configuration files for this tutorial must be placed in the `configs/` directory at the root of the project for the scripts to read the settings.
@@ -91,7 +94,7 @@ tasks:
 | `tasks.condition_names`| `["S1_exc", "T1_exc"]` | A list of property names from your dataset that the model should learn to associate with the molecules. |
 | `tasks.context_mask_rate`| `0.1` | The probability of hiding the condition during training. A value greater than 0 is required to enable Classifier-Free Guidance (CFG) during generation. A common value is 0.1 (10% of the time). |
 | `tasks.mask_value`| `[0, 0]` | The value to use when a condition is masked. This should be a list with the same length as `condition_names`. Typically, this is `0` or the mean value of the property in the dataset. |
-
+| `tasks.normalization_method` | `"maxmin"` | The method to normalize conditional properties. Options are: `"maxmin"` (scales to [-1, 1]), `"mad"` (mean absolute deviation), `"value_N"` (divides by a specific value N), or `null` for no normalization. |
 **Example `finetune_add_condition.yaml`:**
 
 ```yaml
@@ -117,6 +120,7 @@ tasks:
   # KEY CHANGE: Add the conditions to learn
   condition_names: ["S1_exc", "T1_exc"]
   context_mask_rate: 0.1 # Make it CFG-ready
+  normalization_method: value_10
 ```
 
 ---
````
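The `normalization_method` options documented in the new table row can be summarised in a short sketch (the function below is an illustrative assumption over scalar property lists; the repository's implementation may differ):

```python
def normalize_condition(values, method):
    """Illustrative sketch of the documented normalization options."""
    if method is None:
        return list(values)
    if method == "maxmin":
        # Scale linearly to [-1, 1] using the dataset min and max.
        lo, hi = min(values), max(values)
        return [2 * (v - lo) / (hi - lo) - 1 for v in values]
    if method == "mad":
        # Center on the mean, divide by the mean absolute deviation.
        mean = sum(values) / len(values)
        mad = sum(abs(v - mean) for v in values) / len(values)
        return [(v - mean) / mad for v in values]
    if method.startswith("value_"):
        # "value_N": divide every property by the constant N.
        n = float(method.split("_", 1)[1])
        return [v / n for v in values]
    raise ValueError(f"Unknown normalization method: {method}")
```

This also explains the `normalization_method: value_10` line in the example config: each conditional property is simply divided by 10.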
