Description
Hi, first of all, thanks for releasing RFDETR, great work.
I'm testing RFDETRSegNano for a CPU real-time use case (I want to compare the results with YOLO26), and I wanted to check whether the inference performance I'm seeing is expected, or whether I'm missing a recommended optimization path.
Environment
- Model: RFDETRSegNano (pretrained)
- OS: Windows 11
- CPU: Intel(R) Core(TM) Ultra 7 155U @ 1.70 GHz
- PyTorch: 2.8.0
- ONNX Runtime: 1.23.2
- Execution provider: CPUExecutionProvider
- Batch size: 1
- Input resolution: 312 × 312
Export & inference setup
PyTorch export
```python
model = RFDETRSegNano()
model.optimize_for_inference(dtype=torch.float32)
model.export()
```
ONNX inference
```python
sess = rt.InferenceSession(
    "inference_model_optimized.onnx",
    providers=["CPUExecutionProvider"]
)
outputs = sess.run(output_names, {input_name: img})
```
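For context, this is roughly how I timed the run above: a small warmup-then-average helper (a generic sketch; `sess`, `output_names`, `input_name`, and `img` are the objects from the snippet above, so the usage line is only indicative):

```python
import time
import statistics

def benchmark(run, warmup=5, iters=50):
    """Time a zero-arg callable: warm up first, then return (mean, stdev) in ms."""
    for _ in range(warmup):
        run()  # warmup iterations, not timed
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run()
        samples.append((time.perf_counter() - t0) * 1e3)  # seconds -> ms
    return statistics.mean(samples), statistics.stdev(samples)

# Usage with the ONNX Runtime session above (assumed to exist):
# mean_ms, std_ms = benchmark(lambda: sess.run(output_names, {input_name: img}))
```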
Observed performance
- Average inference time: ~200 ms
- Outputs:
- Boxes: (1, 100, 4)
- Class scores: (1, 100, 91)
- Feature / mask output: (1, 100, 78, 78)
An FP16 export was also tested, but the ONNX Runtime CPU execution provider still executes in FP32.
### Question
Is ~200 ms CPU inference at 312×312 the expected performance for RFDETRSegNano?
If not:
- are there recommended CPU-specific optimizations?
- is the third output (78×78) required for pure detection?
- is OpenVINO / INT8 the intended deployment path for CPU?
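In case INT8 turns out to be the suggested route, this is the kind of post-training dynamic quantization I had in mind, using `onnxruntime.quantization` (file names are placeholders, and I have not verified the accuracy or speed impact on RFDETRSegNano):

```python
# Hedged sketch: dynamic INT8 quantization of the exported FP32 model.
# Paths are placeholders; impact on RFDETRSegNano is untested.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="inference_model_optimized.onnx",  # exported FP32 model
    model_output="inference_model_int8.onnx",      # quantized output path
    weight_type=QuantType.QInt8,                   # quantize weights to INT8
)
```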
Any guidance would be appreciated. Thanks!