
CPU inference performance of RFDETRSegNano (~200 ms @312x312) – expected? #641

@NiccoloBalestrieri

Description

Hi, first of all thanks for releasing RFDETR, great work.

I’m testing RFDETRSegNano for a real-time CPU use case (I want to compare the results with YOLO26), and I wanted to check whether the inference performance I’m seeing is expected, or whether I’m missing a recommended optimization path.

Environment

  • Model: RFDETRSegNano (pretrained)
  • OS: Windows 11
  • CPU: Intel(R) Core(TM) Ultra 7 155U @ 1.70 GHz
  • PyTorch: 2.8.0
  • ONNX Runtime: 1.23.2
  • Execution provider: CPUExecutionProvider
  • Batch size: 1
  • Input resolution: 312 × 312

Export & inference setup

PyTorch export
```python
model = RFDETRSegNano()
model.optimize_for_inference(dtype=torch.float32)
model.export()
```
ONNX inference
```python
sess = rt.InferenceSession(
    "inference_model_optimized.onnx",
    providers=["CPUExecutionProvider"],
)

outputs = sess.run(output_names, {input_name: img})
```

Observed performance

  • Average inference time: ~200 ms
  • Outputs:
    • Boxes: (1, 100, 4)
    • Class scores: (1, 100, 91)
    • Feature / mask output: (1, 100, 78, 78)
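For reference, the average above comes from a simple warmup-plus-loop measurement; a minimal sketch of such a harness (the `sess.run` call is the one from the setup above, shown commented out):

```python
import time
import statistics

def avg_latency_ms(run, warmup=10, iters=50):
    """Mean wall-clock latency in ms of a zero-argument callable."""
    for _ in range(warmup):
        run()  # warm up caches and allocator before measuring
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.mean(samples)

# avg = avg_latency_ms(lambda: sess.run(output_names, {input_name: img}))
```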

FP16 export was also tested, but ONNX Runtime's CPU execution provider runs the model in FP32 anyway.

### Question
Is ~200 ms CPU inference at 312×312 the expected performance for RFDETRSegNano?
If not:

  • are there recommended CPU-specific optimizations?
  • is the third output (78×78) required for pure detection?
  • is OpenVINO / INT8 the intended deployment path for CPU?

Any guidance would be appreciated. Thanks!
