Skip to content

Conversation

@hanaol
Copy link
Collaborator

@hanaol hanaol commented Jan 7, 2026

This PR adds support for training the model on the QM9 dataset. Changes include:

  • Integration of QM9 dataset loading and preprocessing.

  • Added configuration options for setting optimizer betas.

This enables the model to be trained on QM9 without affecting previous datasets or functionality.

@hanaol hanaol requested a review from forklady42 January 7, 2026 20:11
self.data_path / f"dsgdb9nsd_{mol_id:06d}" / "rho_22.npy",
self.label_path / f"dsgdb9nsd_{mol_id:06d}" / "rho_22.npy",
self.data_path / f"dsgdb9nsd_{mol_id:06d}" / "grid_sizes_22.dat",
self.label_path / f"dsgdb9nsd_{mol_id:06d}" / "grid_sizes_22.dat",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cleaner to read if you break f"dsgdb9nsd_{mol_id:06d}" into its own variable, e.g. mol_dir, to avoid repeating the same string. Also, noting, could be helpful to make this configurable rather than assuming dsgdb9nsd is always the prefix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooc why are there both rho_22.npy files and grid_sizes_22.dat files? How do they interplay? Probably warrants an inline comment about them in the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment about the array being 1D (flattened), so the original grid size is required for reshaping.

nx, ny, nz = rho2.size()[-3:]
nx = nx // ds1 * ds1
ny = ny // ds1 * ds1
nz = nz // ds1 * ds1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these three lines use ds2? Nit: ds_data and ds_label would make this easier to follow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variables were renamed for clarity. To understand why ds1 is used for label resampling, consider a grid axis of size 33. If the input is downsampled by a factor of 2, the label must be resampled to 32 so that the operation works correctly. This allows mappings such as 16 → 32, whereas 16 → 33 would lead to a mismatch/error.

Copy link
Collaborator Author

@hanaol hanaol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for review

@hanaol hanaol requested a review from forklady42 January 27, 2026 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants