Example code of run_async use for submitting outstanding inference requests #22126
Unanswered
hylandk-movidius asked this question in Performance Q&A
Replies: 1 comment 1 reply
-
Some of your code is showing up as plain text. I am trying to do a similar thing, but with only one active run_async at a time, using dynamic batching. In my case the process exits without explanation; I will share if I get it working. If you have any updates or insights, please share them here. The unit test should be working. In my case I'd replace the …
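For reference, a minimal single-request pattern along the lines of the run_async unit test might look like the sketch below. This is only a sketch: it assumes onnxruntime 1.16+ (where `InferenceSession.run_async` was added), uses the CPU provider and a placeholder `model.onnx` just to stay self-contained, and my understanding is that the callback fires on one of the session's intra-op threads (please verify against the test in the repo).

```python
import threading
import numpy as np
import onnxruntime as ort

# Placeholder model path; any ONNX model with a single input will do.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

done = threading.Event()
results = {}

def on_complete(outputs, user_data, err):
    # run_async delivers either the outputs or a non-empty error string.
    results["outputs"] = outputs
    results["err"] = err
    user_data.set()  # wake the waiting thread

inp = session.get_inputs()[0]
# Substitute real data; symbolic dims are replaced with 1 for this sketch.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
feed = {inp.name: np.random.randn(*shape).astype(np.float32)}
output_names = [o.name for o in session.get_outputs()]

session.run_async(output_names, feed, on_complete, done)
done.wait(timeout=10)
print("err:", results.get("err"), "outputs type:", type(results.get("outputs")))
```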
-
Hi folks -
does anyone have any sample code (in Python or C) that shows how to use the run_async API successfully on a single ONNX session?
My attempts to use it have not resulted in outstanding requests being sent to the accelerator. The only way I can get more than one outstanding inference request is by creating multiple ONNX sessions and submitting inference requests to each one.
The Python script that I am using is below.
Does anyone know how to get this to work on a single ONNX session?
Kevin
```python
import numpy as np
import os
import threading
import time
import onnxruntime as ort

model_path = 'mobilenetv2_035_96.onnx'

# Create session options
session_options = ort.SessionOptions()

OUTSTANDING_REQ = 16
REQ = 32000
SECONDS_OFFSET = 86400

print("------------------------------------------------------------------------")
print(" Creating ONNX inference sessions for each outstanding inference")
sessions = [ort.InferenceSession(model_path,
                                 providers=[('OpenVINOExecutionProvider', {'device_type': 'NPU'})],
                                 sess_options=session_options)
            for _ in range(OUTSTANDING_REQ)]
print("------------------------------------------------------------------------")


class run_async_inf:
    """Per-request state: holds the results and an event to wait on."""

    def __init__(self, int_id, target):
        self.__event = threading.Event()
        self.__outputs = None
        self.__err = ''
        self.__id = int_id
        self.__count = 0
        self.__target = target

    def fill_outputs(self, outputs, err):
        # Invoked from the run_async callback with the results (or an error).
        self.__outputs = outputs
        self.__err = err
        self.__count += 1
        self.__event.set()

    def wait(self, timeout=None):
        return self.__event.wait(timeout)


class run_async_inf_callback:
    def __init__(self, int_id):
        self.__id = int_id

    @staticmethod
    def callback(outputs: np.ndarray, state: run_async_inf, err: str) -> None:
        state.fill_outputs(outputs, err)


def run_async_inference(label, session_array, run_opts):
    print("------------------------------------------------------------------------")
    # Create an inference request and a callback for each outstanding request.
    infer_requests = [run_async_inf(_, REQ // OUTSTANDING_REQ) for _ in range(OUTSTANDING_REQ)]

    # Get model input information
    input_name = session_array[0].get_inputs()[0].name
    input_shape = session_array[0].get_inputs()[0].shape
    input_type = session_array[0].get_inputs()[0].type
    output_names = [o.name for o in session_array[0].get_outputs()]

    # Replace this with real input data matching the model's input shape and type
    dummy_input = np.random.randn(*input_shape).astype(np.float32)

    # Reference - https://onnxruntime.ai/docs/execution-providers/Azure-ExecutionProvider.html
    print("------------------------------------------------------------------------")
    print("onnxruntime version:", ort.__version__)
    print("providers:", session_array[0].get_providers())
    print("------------------------------------------------------------------------")

    # Submit one request to each session, then wait for every callback to fire.
    for i, session in enumerate(session_array):
        session.run_async(output_names, {input_name: dummy_input},
                          run_async_inf_callback.callback, infer_requests[i], run_opts)
    for request in infer_requests:
        request.wait()


# Create RunOptions
run_options = ort.RunOptions()
run_options.log_verbosity_level = 3

run_async_inference('DEFAULT', sessions, run_options)
exit(0)
```
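For clarity, the single-session pattern I would like to get working is sketched below. This is only a sketch of my intent, not working code: it assumes run_async can be called repeatedly on one session before earlier requests complete, and that the OpenVINO EP will overlap those requests on the NPU, which is exactly the behaviour I have not been able to confirm.

```python
import threading
import numpy as np
import onnxruntime as ort

N = 16  # desired number of outstanding requests

# One session, shared by all in-flight requests.
session = ort.InferenceSession(
    'mobilenetv2_035_96.onnx',
    providers=[('OpenVINOExecutionProvider', {'device_type': 'NPU'})])

inp = session.get_inputs()[0]
output_names = [o.name for o in session.get_outputs()]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # guard symbolic dims
feed = {inp.name: np.random.randn(*shape).astype(np.float32)}

events = [threading.Event() for _ in range(N)]

def on_complete(outputs, event, err):
    if err:
        print("run_async error:", err)
    event.set()

# Submit all N requests before waiting on any of them, so that (in theory)
# N inferences are outstanding at once on the single session.
for ev in events:
    session.run_async(output_names, feed, on_complete, ev)
for ev in events:
    ev.wait()
```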