Add QM9 dataset support #55

hanaol · 2026-01-07T20:10:56Z

This PR adds support for training the model on the QM9 dataset. Changes include:

Adds QM9 dataset loading and preprocessing.
Adds configuration options for setting the optimizer betas.

This lets the model be trained on QM9 without affecting existing datasets or functionality.


forklady42 · 2026-01-08T20:20:57Z

src/electrai/dataloader/qm9.py

+                self.data_path / f"dsgdb9nsd_{mol_id:06d}" / "rho_22.npy",
+                self.label_path / f"dsgdb9nsd_{mol_id:06d}" / "rho_22.npy",
+                self.data_path / f"dsgdb9nsd_{mol_id:06d}" / "grid_sizes_22.dat",
+                self.label_path / f"dsgdb9nsd_{mol_id:06d}" / "grid_sizes_22.dat",


This would be cleaner to read if f"dsgdb9nsd_{mol_id:06d}" were pulled into its own variable, e.g. mol_dir, to avoid repeating the same string. It could also help to make the prefix configurable rather than assuming it is always dsgdb9nsd.
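A minimal sketch of that refactor, not the code that landed in the PR. The helper name molecule_paths, the prefix parameter, and DEFAULT_PREFIX are hypothetical; only mol_dir, the dsgdb9nsd prefix, and the four file names come from the diff above.

```python
from pathlib import Path

# Hypothetical default; the diff hard-codes this prefix inside each f-string.
DEFAULT_PREFIX = "dsgdb9nsd"


def molecule_paths(data_path, label_path, mol_id, prefix=DEFAULT_PREFIX):
    """Return the density and grid-size file paths for one molecule."""
    mol_dir = f"{prefix}_{mol_id:06d}"  # built once, reused four times
    return (
        Path(data_path) / mol_dir / "rho_22.npy",
        Path(label_path) / mol_dir / "rho_22.npy",
        Path(data_path) / mol_dir / "grid_sizes_22.dat",
        Path(label_path) / mol_dir / "grid_sizes_22.dat",
    )
```

Passing prefix from the dataset configuration would cover the second point, at the cost of one more config field.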

Out of curiosity, why are there both rho_22.npy and grid_sizes_22.dat files? How do they relate? That probably warrants an inline comment in the code.

I added a comment explaining that the array is stored flattened (1D), so the original grid size is needed to reshape it.
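To illustrate how the two files fit together, here is a rough sketch of the reshape step. It rests on two assumptions that the thread does not confirm: that grid_sizes_22.dat holds three whitespace-separated integers (nx, ny, nz), and that the density was flattened in row-major (C) order. The function name load_density is hypothetical.

```python
import numpy as np


def load_density(rho_file, grid_file):
    """Load a flattened density and reshape it onto its 3D grid."""
    rho_flat = np.load(rho_file)  # 1D array of length nx * ny * nz
    # Assumption: the .dat file contains three whitespace-separated integers.
    nx, ny, nz = np.loadtxt(grid_file, dtype=int).reshape(-1)[:3]
    if rho_flat.size != nx * ny * nz:
        raise ValueError(f"expected {nx * ny * nz} values, got {rho_flat.size}")
    # Assumption: row-major (C) ordering; the real files may use another order.
    return rho_flat.reshape(nx, ny, nz)
```

The size check is there because a wrong grid file would otherwise surface only as a less specific reshape error.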

forklady42 · 2026-01-08T20:29:35Z

src/electrai/dataloader/qm9.py

+        nx, ny, nz = rho2.size()[-3:]
+        nx = nx // ds1 * ds1
+        ny = ny // ds1 * ds1
+        nz = nz // ds1 * ds1


Shouldn't these three lines use ds2? Nit: ds_data and ds_label would make this easier to follow.

The variables were renamed for clarity. Using ds1 for the label is intentional. Consider a grid axis of size 33: if the input is downsampled by a factor of 2, it has 16 points, so the label must be resampled to 32 (33 // 2 * 2) for the shapes to line up. That gives a 16 → 32 mapping, whereas 16 → 33 would lead to a mismatch or error.
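The arithmetic can be checked in a few lines. This is only a sketch of the size calculation from the diff, assuming the downsampled input keeps n // ds points along the axis, as the 16 → 32 example implies. The helper name aligned_size is hypothetical.

```python
def aligned_size(n, ds):
    """Largest multiple of ds that does not exceed n."""
    return n // ds * ds


n, ds = 33, 2
input_size = n // ds              # 16 points after downsampling by ds
label_size = aligned_size(n, ds)  # 32 points, exactly ds * input_size
assert label_size == ds * input_size
```

With the label's own factor in place of ds1, the label size would no longer be guaranteed to be a whole multiple of the input size.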


hanaol

Ready for review

Hananeh Oliaei added 5 commits January 7, 2026 14:39

Support heterogeneous batches in loss function and model forward pass

3a78a66

qm9 files

a071bd9

added files to .gitignore

086e49e

Addition of configuration and dataloader files for QM9 training

950ef9b

code consistency

14657d4

hanaol requested a review from forklady42 January 7, 2026 20:11

forklady42 requested changes Jan 8, 2026


Hananeh Oliaei added 7 commits January 27, 2026 10:01

updated dataloader to perform unit conversion by default

421c763

converted list to set for efficiency

d04121a

moved the magic number to the top of the file as a variable

0e988a1

handled code duplicatation

246296d

moved conversion factor to the top the file as a variable

1cfeedb

cleaner variable assignment

aac8964

used clearer variable names for input and label data

bc09949

hanaol commented Jan 27, 2026


hanaol requested a review from forklady42 January 27, 2026 15:33

Hananeh Oliaei and others added 2 commits January 27, 2026 11:23

changed variable names

94f8714

pre-commit auto-fixes

69a9b03

forklady42 approved these changes Jan 30, 2026

