
Hi,

There is no simple recipe for this, I think. With only 500K compounds you may struggle to create a prior that generates a sufficiently high percentage of valid SMILES. This assumes that you would be using the same network size as the ChEMBL prior; we have not tested what the LSTM hyperparameters should be for "small" training sets.
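If you do train your own prior, a quick sanity check is to sample SMILES from the trained model and measure how many of them RDKit can parse. A minimal sketch (the sampling step itself is model-specific and omitted here):

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.error")   # silence parse errors for invalid SMILES

def fraction_valid(smiles_list):
    """Fraction of strings that RDKit can parse into a molecule."""
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return valid / len(smiles_list)

# e.g. sample a few thousand SMILES from the trained prior and check them;
# the list below is just a stand-in ("C1CC" is deliberately invalid)
sampled = ["CCO", "c1ccccc1", "C1CC"]
print(f"valid SMILES: {fraction_valid(sampled):.1%}")
```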

What you could do is use the current ChEMBL prior and apply transfer learning (TL) with your own dataset.
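A TL run is driven by a config file; below is a minimal sketch written out from Python. The parameter names and paths are illustrative assumptions, so check the REINVENT 4 transfer-learning documentation and example configs for the exact keys your version expects.

```python
from pathlib import Path

# Sketch of a REINVENT-style transfer-learning (TL) configuration.
# All keys and paths below are assumptions for illustration only.
tl_config = """\
run_type = "transfer_learning"
device = "cuda:0"

[parameters]
input_model_file = "priors/reinvent.prior"   # released ChEMBL prior (assumed path)
smiles_file = "my_compounds.smi"             # your own dataset
output_model_file = "tl_model.chkpt"
num_epochs = 10
batch_size = 128                             # reduce if you hit GPU memory limits
"""

Path("transfer_learning.toml").write_text(tl_config)
```

You would then point REINVENT at that file to run the TL stage and use the resulting model in place of the stock prior.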

Your reported memory footprint does not make sense to me. As you are running on a GPU, the main determining factor is GPU memory. You should be able to train a prior of your size with less than 10 GB of GPU memory. I note that batch_size has a significant influence on memory usage.
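To see that effect, you can measure peak GPU memory for a toy LSTM at different batch sizes. The layer sizes below are assumptions for illustration, not the actual prior hyperparameters:

```python
import torch
import torch.nn as nn

# Toy LSTM in the general size range of a SMILES prior (dimensions assumed).
device = torch.device("cuda")
emb_dim, hidden, layers, seq_len = 256, 512, 3, 100
lstm = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True).to(device)

for batch_size in (64, 128, 256, 512):
    lstm.zero_grad(set_to_none=True)
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, seq_len, emb_dim, device=device)
    out, _ = lstm(x)                 # forward pass
    out.sum().backward()             # backward pass allocates activations/grads
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"batch_size={batch_size}: peak GPU memory ~{peak_gb:.2f} GB")
```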