Hi RF3 team, thanks for the great work on RF3!
I have a question regarding performance and parallelism when evaluating multiple structures in a single run.
In my workflow, I often collect multiple generated structures and pass them together to the inference engine, roughly like this:
```python
# get atom arrays from the generated outputs
atom_arrays = [out.atom_array for outs in inputs.values() for out in outs]
rf_results = rf3_engine.run(
    inputs=atom_arrays,
    out_dir=None,
)
```

I noticed from the documentation that RF3 recommends batch processing to amortize startup costs, which makes a lot of sense. However, I'm not entirely clear on how `RF3InferenceEngine.run()` handles multiple inputs internally.
Specifically, I’d like to understand:
1. Does `RF3InferenceEngine.run()` internally batch multiple `AtomArray` inputs into a single forward pass on the GPU, or are they processed sequentially in a Python loop?
2. Is there any built-in parallelism when multiple `AtomArray` objects are provided (e.g. GPU batch parallelism, internal `DataLoader` workers, or other mechanisms)?
3. Are there recommended settings or configuration options (e.g. batch size, inference batch size, diffusion batch size) to maximize throughput when evaluating many structures at once?
4. If GPU utilization is low even when passing many `AtomArray` inputs, is the recommended approach to increase the batch size, or is multi-process / multi-GPU execution expected to be the main way to scale evaluation?
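For context, here is the workaround I am currently considering if `run()` turns out to process inputs sequentially: keep a single engine alive and feed it fixed-size chunks. The `chunked` helper below is my own, not part of RF3, and the batch size of 32 is an arbitrary placeholder:

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage: reuse one engine instance across chunks so that
# model initialization and CUDA startup costs are paid only once.
# for batch in chunked(atom_arrays, 32):
#     rf3_engine.run(inputs=batch, out_dir=None)
```

If `run()` already batches internally, I would of course drop this and pass everything in one call.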
My goal is to efficiently evaluate many generated structures in a single RF3 run without repeatedly paying model initialization and CUDA startup costs.
Any guidance on best practices here would be greatly appreciated. Thanks again!