Description
In ONNXRuntime, the OpenVINO EP accepts configuration options (documented here) to set the number of threads and the number of streams, but these are ignored when passed to the EP in the Triton model config, for example:
optimization { execution_accelerators {
  cpu_execution_accelerator : [ {
    name : "openvino"
    parameters { key: "num_of_threads" value: "4" }
    parameters { key: "num_streams" value: "4" }
  } ]
}}
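For comparison, the same options can be passed to the OpenVINO EP directly in ONNX Runtime as provider options. The sketch below is illustrative only; the model path is a placeholder and option support depends on the ORT/OpenVINO build:

import onnxruntime as ort

# Minimal sketch: pass the same OpenVINO EP options directly to ONNX Runtime.
# "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"num_of_threads": "4", "num_streams": "4"}],
)
print(session.get_providers())  # confirm the OpenVINO EP was registered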
The threading configuration for the ONNXRuntime backend is also ignored (as expected):
parameters { key: "intra_op_thread_count" value: { string_value: "4" } }
parameters { key: "inter_op_thread_count" value: { string_value: "2" } }
Triton Information
Last tested with the Triton 24.05 container.
To Reproduce
Serving an ONNX model, we observe the following (a thread-count check is sketched after this list):
- The intra_op_thread_count/inter_op_thread_count settings affect the number of inference threads used when OpenVINO is disabled
- With OpenVINO optimizations enabled, CPU usage jumps to the default/max number of CPU threads
- Attempting to set num_of_threads or num_streams has no effect
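For reference, a rough way to observe the effective thread count while sending requests (assumes psutil is installed and that the server process is named tritonserver):

import psutil

# Rough sketch: print the OS thread count of the tritonserver process;
# it should stay near the configured limits if they were being honoured.
for proc in psutil.process_iter(["name", "num_threads"]):
    if proc.info["name"] == "tritonserver":
        print(proc.pid, proc.info["num_threads"])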
Expected behavior
Expected behaviour would be that the OpenVINO EP ignores intra_op_thread_count and inter_op_thread_count but obeys num_of_threads and num_streams.
Unless I have missed something and the ORT backend with OpenVINO optimizations is meant to read the OpenVINO backend's parameters instead?