-
Notifications
You must be signed in to change notification settings - Fork 0
Add QM9 dataset support #55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
src/electrai/dataloader/qm9.py
Outdated
| self.data_path / f"dsgdb9nsd_{mol_id:06d}" / "rho_22.npy", | ||
| self.label_path / f"dsgdb9nsd_{mol_id:06d}" / "rho_22.npy", | ||
| self.data_path / f"dsgdb9nsd_{mol_id:06d}" / "grid_sizes_22.dat", | ||
| self.label_path / f"dsgdb9nsd_{mol_id:06d}" / "grid_sizes_22.dat", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be cleaner to read if you break f"dsgdb9nsd_{mol_id:06d}" into its own variable, e.g. mol_dir, to avoid repeating the same string. Also, noting, could be helpful to make this configurable rather than assuming dsgdb9nsd is always the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooc why are there both rho_22.npy files and grid_sizes_22.dat files? How do they interplay? Probably warrants an inline comment about them in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment about the array being 1D (flattened), so the original grid size is required for reshaping.
src/electrai/dataloader/qm9.py
Outdated
| nx, ny, nz = rho2.size()[-3:] | ||
| nx = nx // ds1 * ds1 | ||
| ny = ny // ds1 * ds1 | ||
| nz = nz // ds1 * ds1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these three lines use ds2? Nit: ds_data and ds_label would make this easier to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variables were renamed for clarity. To understand why ds1 is used for label resampling, consider a grid axis of size 33. If the input is downsampled by a factor of 2, the label must be resampled to 32 so that the operation works correctly. This allows mappings such as 16 → 32, whereas 16 → 33 would lead to a mismatch/error.
hanaol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready for review
This PR adds support for training the model on the QM9 dataset. Changes include:
Integration of QM9 dataset loading and preprocessing.
Added configuration options for setting optimizer betas.
This enables the model to be trained on QM9 without affecting previous datasets or functionality.