Replies: 5 comments
-
I think this encapsulates the main difference well. There's a bit of a trend currently to assume that designs which the ML models do well on are the ones more likely to be successful. (And conversely, if the ML program doesn't do well on a design, it's likely out-of-distribution for "native-like" systems, and thus much less likely to be a good design.) As such, to increase the success rate, you want to find designs where the prediction programs have high confidence and produce results consistent with the input design.

Hence the iterated convergence approach: you keep feeding the results of the design/repredict pipeline back on themselves until you come up with a design that none of the programs being used have issues with. (They all agree the design will turn out how you expect it to.)

Your suggested approach is one way to get a diversity of structures, but you might be missing out on that consistency validation. You're not necessarily vetting that ProteinMPNN thinks the updated backbone is compatible with the current sequence. This could be fine: there's no guarantee that self-consistent structures are better than ones generated by other methods. (It's an assumption rather than an iron-clad fact.) But it may mean you need additional stringent filters/selection on the results. You'll generate a diversity of structures, but are they good structures? Will they fold to what you want them to and be active how you want?

Also keep in mind that the info you're feeding into the experiment is just the sequence of the design -- you can't specify the structure other than by specifying the sequence. As such, any relax step which happens after the final sequence design step is only valuable to the extent it helps you select which sequences you take forward to experimental testing.
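The iterated-convergence idea described above can be sketched as a simple loop. This is only an illustration of the control flow under stated assumptions: `design_sequence`, `relax_structure`, and `models_agree` are hypothetical stand-ins for ProteinMPNN, FastRelax, and a prediction-consistency check, not real APIs.

```python
# Hypothetical stand-ins for the real tools; names and signatures are
# illustrative only, not actual ProteinMPNN/Rosetta interfaces.
def design_sequence(backbone):
    """Mock sequence design: return a sequence for the given backbone."""
    return f"SEQ<{backbone}>"

def relax_structure(backbone, sequence):
    """Mock relax: return an updated backbone for the sequence."""
    return f"relaxed({sequence})"

def models_agree(backbone, sequence, round_idx):
    """Mock consistency check: pretend the predictors agree after a few rounds."""
    return round_idx >= 2

def iterate_until_consistent(backbone, max_rounds=4):
    """Feed the design/repredict results back on themselves until every
    model is consistent with the current design, or the round budget runs out."""
    sequence = None
    for i in range(max_rounds):
        sequence = design_sequence(backbone)          # design on current backbone
        backbone = relax_structure(backbone, sequence)  # repredict/relax the structure
        if models_agree(backbone, sequence, i):         # stop once everything is consistent
            break
    return backbone, sequence
```

The `max_rounds=4` cap mirrors the 4-cycle budget discussed in this thread; in practice the stopping criterion would be an actual agreement metric (e.g. a confidence or RMSD threshold), not a round counter.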
-
@roccomoretti Thank you so much! I have one more question: is there any particular reason behind using exactly 4 loops, rather than 3, 5, or any other number?
-
I don't know why they chose 4 cycles -- my guess is that's what they found to generally work well in practice to give decently convergent results without spending too much computational time.
-
Hello! First of all, thank you very much for sharing this repository and the associated work. We are also working on protein design inspired by the same paper, and I have two questions about the current pipeline:

1. ProteinMPNN's sampling temperature
2. Adapting the ProteinMPNN-Relax pipeline for cyclic peptides

I repeat this ProteinMPNN → Rosetta threading (onto the RFDiffusion backbone or the relaxed structure) → FastRelax process for 4 cycles.

2-1) Conceptually, is this cyclic workflow (ProteinMPNN → threading → Relax, repeated 4 times) consistent with the principles and intent of your original FastRelax-based protocol?

Thank you for your kind responses and for your valuable work on this research. I sincerely appreciate your time and help.
-
The original developers of this tool have moved on to other roles/projects and do not regularly check these issues/discussions. I recommend reaching out to the corresponding author(s) of the paper for their recommendation. I am also going to move this from being an 'Issue' to being a 'Discussion', as I believe the content of these questions matches that forum's purpose better.
-
Hello everyone, I'm currently implementing a workflow based on the recent RFpeptides paper (Rettie et al., 2025, Nature Chemical Biology) and had a question about the sequence design step. I'd appreciate any insights from the community.
The Paper's Workflow: The authors describe an iterative, 4-round process for each diffused backbone, which (as I understand it) looks like this: design a sequence on the current backbone with ProteinMPNN, thread it onto the backbone with Rosetta, run FastRelax, then feed the relaxed backbone back into ProteinMPNN, for 4 rounds total.
My Alternative Workflow Idea: I was considering an alternative, and potentially computationally cheaper, approach to achieve sequence diversity:

1. Take the original, single backbone from RFdiffusion and generate 4 sequences on this same fixed backbone using LigandMPNN, but with a higher sampling temperature (e.g., T = 0.1, 0.2, ...?).
2. Take each of these 4 sequences and run Rosetta FastRelax on them once.
My Questions:
Any thoughts or experiences with these different design strategies would be extremely helpful.