Training REINVENT Prior on Large Cyclic Peptides (≈1200 Da) #307

meshoh52-design · 2025-11-11T13:41:25Z

Hello REINVENT team,

I’m currently trying to train reinvent.prior on a dataset of relatively large cyclic peptides (~1200 Da, 80-100 heavy atoms), essentially polymyxin-like macrocycles.
These molecules are fully parsable and valid in RDKit (they pass sanitization and are handled correctly by my external activity scorer), but they are all being rejected by the default input filter during transfer learning with messages like:

"default" filter: CC(C)C[C@@h]1NC(=O)C@@HNC(=O)C@HNC(=O)C@@HCCNC(=O)C@HNC(=O)C@HNC(=O)C@HNC1=O is invalid

I believe this happens because the built-in RDKitFilter has a hard-coded max_heavy_atoms limit, so my macrocyclic peptides fall slightly outside the training domain of the standard prior and get dropped before training starts.

I have two questions. Is there a way to turn off the default molecule filter completely, or to adjust its thresholds? And is the REINVENT prior suitable for molecules like these, or should I switch to a different prior?

Thanks for your time and for maintaining such a great generative framework.

Answered by halx

halx (Maintainer) · 2025-11-12T06:10:25Z

Hi,

You can switch off the standardization step with:

[parameters]
standardize_smiles = false

Cheers,
Hannes.
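
For context, here is a minimal sketch of where this flag might sit in a transfer-learning config. Only the `[parameters]` table and `standardize_smiles = false` come from the answer above. The other key names are assumptions recalled from REINVENT 4's example TOML configs, and the paths and numeric values are placeholders, so check everything against the installed version.

```toml
# Sketch only: every key except standardize_smiles is an assumption
# based on REINVENT 4 example configs; verify against your version.
run_type = "transfer_learning"

[parameters]
input_model_file = "priors/reinvent.prior"    # placeholder path to the starting prior
smiles_file = "cyclic_peptides.smi"           # placeholder: training SMILES, one per line
output_model_file = "cyclic_peptides.model"   # placeholder name for the trained model
num_epochs = 10                               # illustrative value
batch_size = 50                               # illustrative value
standardize_smiles = false                    # from the answer above: switch off standardization
```

With standardization switched off, REINVENT presumably no longer cleans the input SMILES, so any clean-up (for example sanitization in RDKit) would have to happen before training.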
