Seeking guidance on hotspot specification and unexpected results #423

@schoyeon

Dear RFdiffusion Developers and Community,

First, I would like to express my sincere appreciation for this powerful and innovative tool.

I am a researcher currently using the RFdiffusion -> ProteinMPNN -> AlphaFold2 pipeline to design binders (~80aa) for a target protein (~200aa).

My current approach for selecting hotspots (typically 1-5 residues) is based on:
- Residues that appear exposed and hydrophobic (inspected in ChimeraX).
- Metrics such as pocketness, SASA, and shape complementarity.
- Residues reported in the literature as critical for binding.
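
For illustration, the first two criteria could be combined into a simple ranking along the following lines. This is only a minimal sketch: it assumes per-residue relative SASA values have already been computed by some other tool, it uses the Kyte-Doolittle hydropathy scale as the hydrophobicity measure, and the function name, the 0.25 exposure cutoff, and the example residues are all hypothetical rather than taken from my actual target.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def rank_hotspot_candidates(residues, min_rel_sasa=0.25, top_n=5):
    """Rank exposed hydrophobic residues as hotspot candidates.

    residues: iterable of (chain, residue_number, one_letter_aa, relative_sasa),
    where relative_sasa is in [0, 1]. The cutoff is an assumed value.
    """
    # Keep residues that are both solvent-exposed and hydrophobic.
    candidates = [
        r for r in residues
        if r[3] >= min_rel_sasa and KYTE_DOOLITTLE[r[2]] > 0
    ]
    # Score = hydropathy weighted by exposure; highest score first.
    candidates.sort(key=lambda r: KYTE_DOOLITTLE[r[2]] * r[3], reverse=True)
    return [f"{chain}{resnum}" for chain, resnum, _, _ in candidates[:top_n]]


# Hypothetical example values, not taken from any real structure.
example = [
    ("C", 315, "L", 0.60),
    ("C", 320, "K", 0.80),  # exposed but charged, so filtered out
    ("C", 333, "F", 0.45),
    ("C", 340, "I", 0.10),  # hydrophobic but buried, so filtered out
    ("C", 352, "V", 0.35),
]
print(rank_hotspot_candidates(example))
```

A ranking like this only narrows the list; pocketness, shape complementarity, and literature evidence would still need to be checked separately.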

A Puzzling Observation
I consistently observe a drastic difference between runs with no hotspots and runs with one or more hotspots specified.
Contrary to my expectations, specifying one or more hotspots gives a much lower success rate (the fraction of designs that yield folded, well-predicted binders) than the 0-hotspot case.
The overall binding ratio also seems to decrease.
In addition, AlphaFold2 metrics such as RMSD and i_pAE are substantially worse (higher and more variable) for the hotspot-specified designs than for the designs generated without hotspots.
This significant discrepancy occurs regardless of whether I use the full-length target (contigs = "C/0 100-100") or a cropped region of the target (contigs = "C311-391/0 100-100").

This massive difference makes me wonder if I am fundamentally misunderstanding a core concept of how hotspots are intended to be used.
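
To make the comparison concrete, the success rate for each group could be computed roughly as follows. This is a sketch under stated assumptions: the `Design` record, the function names, and the pass thresholds (i_pAE below 10, RMSD below 2 Å) are illustrative choices, not values prescribed by RFdiffusion, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Design:
    n_hotspots: int  # number of hotspots used for the RFdiffusion run
    i_pae: float     # AlphaFold2 interface pAE (lower is better)
    rmsd: float      # RMSD between design and prediction (lower is better)


def success_rate(designs, max_i_pae=10.0, max_rmsd=2.0):
    """Fraction of designs passing both filters (thresholds are assumptions)."""
    if not designs:
        return 0.0
    passed = sum(d.i_pae < max_i_pae and d.rmsd < max_rmsd for d in designs)
    return passed / len(designs)


def compare_by_hotspot_use(designs, **thresholds):
    """Split designs into 0-hotspot and 1+ hotspot groups and score each."""
    without = [d for d in designs if d.n_hotspots == 0]
    with_hotspots = [d for d in designs if d.n_hotspots > 0]
    return {
        "0 hotspots": success_rate(without, **thresholds),
        "1+ hotspots": success_rate(with_hotspots, **thresholds),
    }


# Hypothetical values for illustration only.
designs = [
    Design(n_hotspots=0, i_pae=5.0, rmsd=1.0),
    Design(n_hotspots=0, i_pae=15.0, rmsd=1.0),
    Design(n_hotspots=2, i_pae=12.0, rmsd=3.0),
    Design(n_hotspots=1, i_pae=8.0, rmsd=1.5),
]
print(compare_by_hotspot_use(designs))
```

Reporting both groups with the same thresholds and similar numbers of designs makes it easier to judge whether the gap is real or an artifact of sample size.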

My Questions

  1. Is this 50x+ performance gap in success rates and metrics (RMSD, i_pAE) between 0 and 1+ hotspots the expected behavior?

  2. If so, I would appreciate it if you could help me understand the methodological reason behind why the difference is so significant.

  3. Based on my criteria, could you please share any best-practice advice for hotspot selection or any other binding configuration tips to help achieve more consistent and successful results?

  4. I am currently testing designs using a cropped region of the target (e.g., contigs = "C311-391/0 100-100"). I am concerned about whether a binder designed against this cropped region might fail to bind properly to the full-length protein. Is this target cropping a common and valid approach? What are the potential risks or considerations I should be aware of?

RFdiffusion Settings
contigs: "C/0 100-100" (full target) or "C311-391/0 100-100" (cropped target)
hotspot: "" (for 0-hotspot runs) or [Please add your hotspot spec here] (for 1+ hotspot runs)
iterations: 50
num_designs: 200
visual: "image"
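
One sanity check that may be relevant when combining hotspots with a cropped target is whether every hotspot actually falls inside the residues kept by the contig. The sketch below is a hypothetical helper, not part of RFdiffusion: it assumes the contig grammar implied by the two strings above (a chain letter with an optional residue range for the target, `/0` as a chain break, and a plain length range for the binder) and assumes hotspots are written as comma-separated chain-plus-residue tokens such as `C320`. The residue numbers in the example are invented.

```python
import re


def parse_target_ranges(contigs):
    """Map each target chain to a list of (start, end) ranges, or None for a whole chain."""
    ranges = {}
    for token in contigs.replace("/", " ").split():
        match = re.fullmatch(r"([A-Za-z])(?:(\d+)-(\d+))?", token)
        if match is None:
            continue  # skips the "0" chain break and binder lengths like "100-100"
        chain, start, end = match.groups()
        if start is None:
            ranges[chain] = None  # bare chain letter: whole chain is included
        elif ranges.get(chain, []) is not None:
            ranges.setdefault(chain, []).append((int(start), int(end)))
    return ranges


def hotspots_outside_target(contigs, hotspot):
    """Return the hotspot tokens that are not covered by the target contig."""
    ranges = parse_target_ranges(contigs)
    missing = []
    for spec in (s.strip() for s in hotspot.split(",")):
        if not spec:
            continue  # empty hotspot string means no hotspots
        chain, resnum = spec[0], int(spec[1:])
        if chain not in ranges:
            missing.append(spec)
        elif ranges[chain] is not None and not any(
            start <= resnum <= end for start, end in ranges[chain]
        ):
            missing.append(spec)
    return missing


# Hypothetical hotspot residues, for illustration only.
print(hotspots_outside_target("C311-391/0 100-100", "C320,C400"))
print(hotspots_outside_target("C/0 100-100", "C400"))
```

An empty result means every hotspot lies within the target region passed to the run; anything returned would point to a hotspot that the cropped contig does not contain.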

ProteinMPNN and AlphaFold2 Validation Settings
num_seqs: 8
initial_guess: True
num_recycles: 3
use_multimer: True
rm_aa: "C"
mpnn_sampling_temp: 0.0001

Thank you very much for your time and any insights you can share. Your help is greatly appreciated.

Best regards.
