Dear RFdiffusion Developers and Community,
First, I would like to express my sincere appreciation for this powerful and innovative tool.
I am a researcher currently using the RFdiffusion -> ProteinMPNN -> AlphaFold2 pipeline to design binders (~80aa) for a target protein (~200aa).
My current approach for selecting hotspots (typically 1-5 residues) is based on:
- Residues that appear exposed and hydrophobic (inspected via ChimeraX).
- Metrics like Pocketness, SASA, and Shape Complementarity.
- Known critical binding residues from literature.
A Puzzling Observation
I consistently see a drastic difference in results between runs with zero hotspots and runs with one or more hotspots specified.
Contrary to expectations, when one or more hotspots are specified, my success rate (leading to folded, well-predicted binders) is much lower than in the 0-hotspot case.
The overall binding ratio also seems to decrease.
Additionally, AlphaFold2 metrics such as RMSD and i_pAE are substantially worse (higher/less stable) for the hotspot-specified designs, while the runs without specified hotspots tend to produce better metrics.
This significant discrepancy occurs regardless of whether I use the full-length target (contigs = "C/0 100-100") or a cropped region of the target (contigs = "C311-391/0 100-100").
This massive difference makes me wonder if I am fundamentally misunderstanding a core concept of how hotspots are intended to be used.
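To make the comparison reproducible, the success rate per condition can be tallied along the following lines. This is only a sketch: the cutoffs (i_pAE < 10, RMSD < 2 Å), the record fields, and the example records are all assumptions for illustration, not values from the runs described here.

```python
from collections import defaultdict

# Assumed cutoffs for calling a design a "success"; adjust to your own criteria.
IPAE_MAX = 10.0  # hypothetical i_pAE threshold
RMSD_MAX = 2.0   # hypothetical RMSD threshold, in angstroms


def success_rates(designs):
    """Return {condition: fraction of designs passing both cutoffs}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for d in designs:
        total[d["condition"]] += 1
        if d["i_pae"] < IPAE_MAX and d["rmsd"] < RMSD_MAX:
            passed[d["condition"]] += 1
    return {cond: passed[cond] / total[cond] for cond in total}


# Placeholder records that only show the expected input shape; not real results.
example = [
    {"condition": "0_hotspots", "i_pae": 6.5, "rmsd": 1.2},
    {"condition": "0_hotspots", "i_pae": 14.0, "rmsd": 3.8},
    {"condition": "1+_hotspots", "i_pae": 18.2, "rmsd": 6.1},
    {"condition": "1+_hotspots", "i_pae": 21.0, "rmsd": 9.4},
]
print(success_rates(example))
```

Reporting the cutoffs alongside the per-condition rates makes it easier for others to judge whether the gap comes from the designs themselves or from the filtering criteria.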
My Questions
- Is this 50x+ performance gap in success rates and metrics (RMSD, i_pAE) between 0 and 1+ hotspots the expected behavior?
- If so, I would appreciate it if you could help me understand the methodological reason behind why the difference is so significant.
- Based on my criteria, could you please share any best-practice advice for hotspot selection or any other binding configuration tips to help achieve more consistent and successful results?
- I am currently testing designs using a cropped region of the target (e.g., contigs = "C311-391/0 100-100"). I am concerned about whether a binder designed against this cropped region might fail to bind properly to the full-length protein. Is this target cropping a common and valid approach? What are the potential risks or considerations I should be aware of?
RFdiffusion Settings
contigs: "C/0 100-100" (full target) or "C311-391/0 100-100" (cropped target)
hotspot: "" (for 0-hotspot runs) or [Please add your hotspot spec here] (for 1+ hotspot runs)
iterations: 50
num_designs: 200
visual: "image"
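One thing worth ruling out with a cropped target is a hotspot that falls outside the residues kept in the contig. Below is a minimal sketch of such a check, assuming the contig syntax used above (chain letter with an optional residue range, "/0" as a chain break, then a binder length range) and assuming hotspots are written as a comma-separated list of chain-plus-residue-number entries. The helper names `parse_contigs` and `hotspots_outside_target` and the hotspot residues in the usage lines are hypothetical, not part of RFdiffusion.

```python
import re


def parse_contigs(contigs):
    """Split a contig string into fixed target segments and binder length ranges."""
    target, binder = [], []
    for part in contigs.split():
        for token in part.split("/"):
            if token in ("", "0"):  # "/0" marks a chain break
                continue
            m = re.fullmatch(r"([A-Za-z])(?:(\d+)-(\d+))?", token)
            if m:  # target segment; a bare chain letter means the whole chain
                chain, start, end = m.groups()
                target.append((chain,
                               int(start) if start else None,
                               int(end) if end else None))
                continue
            m = re.fullmatch(r"(\d+)(?:-(\d+))?", token)
            if m:  # binder length, either "N" or "min-max"
                lo = int(m.group(1))
                hi = int(m.group(2)) if m.group(2) else lo
                binder.append((lo, hi))
                continue
            raise ValueError(f"unrecognized contig token: {token!r}")
    return target, binder


def hotspots_outside_target(hotspots, target):
    """Return the hotspot entries not covered by any target segment."""
    missing = []
    for entry in (s.strip() for s in hotspots.split(",")):
        if not entry:
            continue
        m = re.fullmatch(r"([A-Za-z])(\d+)", entry)
        if not m:
            raise ValueError(f"unrecognized hotspot: {entry!r}")
        chain, num = m.group(1), int(m.group(2))
        covered = any(
            c == chain and (start is None or start <= num <= end)
            for c, start, end in target
        )
        if not covered:
            missing.append(entry)
    return missing


# Hypothetical hotspots, used only to exercise the check.
target, binder = parse_contigs("C311-391/0 100-100")
print(target, binder)
print(hotspots_outside_target("C320,C400", target))
```

An empty result means every listed hotspot lies inside the target region passed to the run. It says nothing about whether the residue is a good hotspot.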
ProteinMPNN and AlphaFold2 Settings
num_seqs: 8
initial_guess: True
num_recycles: 3
use_multimer: True
rm_aa: "C"
mpnn_sampling_temp: 0.0001
Thank you very much for your time and any insights you can share. Your help is greatly appreciated.
Best regards.