
Conversation

@chunghow-qti

Description

This change adds batch-multiplier support to the QNN HTP backend, allowing the runtime batch size to differ from the compiled batch size as long as the runtime batch size is an integer multiple of the compiled batch size.
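For intuition, here is a minimal sketch of that divisibility rule; the function name and error handling are illustrative stand-ins, not the EP's actual code:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: derive the batch multiplier from the two batch sizes.
uint32_t ComputeBatchMultiplier(uint32_t runtime_batch, uint32_t compiled_batch) {
  if (compiled_batch == 0 || runtime_batch == 0 ||
      runtime_batch % compiled_batch != 0) {
    throw std::invalid_argument(
        "runtime batch size must be a positive multiple of the compiled batch size");
  }
  // e.g. compiled batch 2, runtime batch 8 -> the multiplier is 4
  return runtime_batch / compiled_batch;
}
```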

Included in this change:

  • Skip input/output validation in the inference session to accommodate varying batch sizes when the batch multiplier option is used.
  • Verify that disable_cpu_ep_fallback is set and the HTP backend is selected when the batch multiplier option is used, since the batch multiplier is only supported when the entire graph runs on QNN.
  • Ensure that the HTP backend is in use inside QnnModel::ExecuteGraph.
  • Modify the batch dimension of qnn_inputs and qnn_outputs when the batch multiplier is triggered, as sketched below.
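As a rough sketch of that last point, the batch dimension of each graph input and output descriptor could be scaled by the multiplier before execution. `TensorDesc` here is a hypothetical stand-in for the EP's actual QNN tensor wrappers:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the EP's QNN tensor wrappers.
struct TensorDesc {
  std::vector<uint32_t> dims;  // dims[0] is assumed to be the batch dimension
};

// Scale the batch dimension of every input and output so that a graph
// compiled with batch size B executes on batch size B * batch_multiplier.
void ApplyBatchMultiplier(std::vector<TensorDesc>& qnn_inputs,
                          std::vector<TensorDesc>& qnn_outputs,
                          uint32_t batch_multiplier) {
  for (auto& t : qnn_inputs)  t.dims[0] *= batch_multiplier;
  for (auto& t : qnn_outputs) t.dims[0] *= batch_multiplier;
}
```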

Motivation and Context

A Contributor commented on the following diff context:

// log evaluation start to trace logging provider
env.GetTelemetryProvider().LogEvaluationStart(session_id_);

#ifdef USE_QNN

Considering that we're moving towards plugin EPs, we should avoid EP-specific code in the core onnxruntime library. Otherwise, we would need a special build of onnxruntime.dll that works with the plugin QNN EP.
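For illustration, one EP-agnostic shape this could take is a generic capability query instead of a compile-time guard; every name below is hypothetical, not an existing ONNX Runtime API:

```cpp
#include <memory>
#include <vector>

// Hypothetical interface: any EP (built-in or plugin) can report that it
// supports runtime batch sizes that differ from the compiled batch size.
struct ExecutionProviderIface {
  virtual ~ExecutionProviderIface() = default;
  virtual bool AllowsRuntimeBatchResize() const { return false; }
};

// Core session code would then query the capability generically, with no
// EP-specific #ifdef in onnxruntime itself.
bool ShouldSkipBatchValidation(
    const std::vector<std::unique_ptr<ExecutionProviderIface>>& providers) {
  for (const auto& ep : providers) {
    if (ep->AllowsRuntimeBatchResize()) return true;
  }
  return false;
}
```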
