Example code of run_async use for submitting outstanding inference requests #22126
Unanswered
hylandk-movidius asked this question in Performance Q&A
Replies: 1 comment 1 reply
-
Some of your code is showing up as plain text. I am trying to do a similar thing, but with only one active run_async at a time, using dynamic batching. In my case the process exits without explanation; I will share if I get it working. If you have any updates or insights, please share them here. The unit test should be working. In my case I'd replace the …
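For reference, a minimal single-request pattern along the lines of the run_async unit test might look like the sketch below. This is only a sketch: it assumes onnxruntime 1.16+ (where `InferenceSession.run_async` was added), uses the CPU provider and a placeholder `model.onnx` just to stay self-contained, and my understanding is that the callback fires on one of the session's intra-op threads (please verify against the test in the repo).

```python
import threading
import numpy as np
import onnxruntime as ort

# Placeholder model path; any ONNX model with a single input will do.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

done = threading.Event()
results = {}

def on_complete(outputs, user_data, err):
    # run_async delivers either the outputs or a non-empty error string.
    results["outputs"] = outputs
    results["err"] = err
    user_data.set()  # wake the waiting thread

inp = session.get_inputs()[0]
# Substitute real data; symbolic dims are replaced with 1 for this sketch.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
feed = {inp.name: np.random.randn(*shape).astype(np.float32)}
output_names = [o.name for o in session.get_outputs()]

session.run_async(output_names, feed, on_complete, done)
done.wait(timeout=10)
print("err:", results.get("err"), "outputs type:", type(results.get("outputs")))
```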
-
Hi folks -
does anyone have any sample code (in Python or C) that shows how to use the run_async API successfully on a single ONNX session?
My attempts to use it have not resulted in outstanding requests being sent to the accelerator. The only way I can get more than one outstanding inference request is by creating multiple ONNX sessions and submitting inference requests to each one.
The Python script that I am using is below.
Does anyone know how to get this to work on a single ONNX session?
Kevin
```python
import numpy as np
import os
import threading
import time
import onnxruntime as ort

model_path = 'mobilenetv2_035_96.onnx'

# Create session options
session_options = ort.SessionOptions()

OUTSTANDING_REQ = 16
REQ = 32000
SECONDS_OFFSET = 86400

print("------------------------------------------------------------------------")
print(" Creating ONNX inference sessions for each outstanding inference")
sessions = [ort.InferenceSession(model_path,
                                 providers=[('OpenVINOExecutionProvider', {'device_type': 'NPU'})],
                                 sess_options=session_options)
            for _ in range(OUTSTANDING_REQ)]
print("------------------------------------------------------------------------")


class run_async_inf:
    """Per-request state: holds the results and an event to wait on."""

    def __init__(self, int_id, target):
        self.__event = threading.Event()
        self.__outputs = None
        self.__err = ''
        self.__id = int_id
        self.__count = 0
        self.__target = target

    def fill_outputs(self, outputs, err):
        # Invoked from the run_async callback with the results (or an error).
        self.__outputs = outputs
        self.__err = err
        self.__count += 1
        self.__event.set()

    def wait(self, timeout=None):
        return self.__event.wait(timeout)


class run_async_inf_callback:
    def __init__(self, int_id):
        self.__id = int_id

    @staticmethod
    def callback(outputs: np.ndarray, state: run_async_inf, err: str) -> None:
        state.fill_outputs(outputs, err)


def run_async_inference(label, session_array, run_opts):
    print("------------------------------------------------------------------------")
    # Create an inference request and a callback for each outstanding request.
    infer_requests = [run_async_inf(_, REQ // OUTSTANDING_REQ) for _ in range(OUTSTANDING_REQ)]

    # Get model input information
    input_name = session_array[0].get_inputs()[0].name
    input_shape = session_array[0].get_inputs()[0].shape
    input_type = session_array[0].get_inputs()[0].type
    output_names = [o.name for o in session_array[0].get_outputs()]

    # Replace this with real input data matching the model's input shape and type
    dummy_input = np.random.randn(*input_shape).astype(np.float32)

    # Reference - https://onnxruntime.ai/docs/execution-providers/Azure-ExecutionProvider.html
    print("------------------------------------------------------------------------")
    print("onnxruntime version:", ort.__version__)
    print("providers:", session_array[0].get_providers())
    print("------------------------------------------------------------------------")

    # Submit one request to each session, then wait for every callback to fire.
    for i, session in enumerate(session_array):
        session.run_async(output_names, {input_name: dummy_input},
                          run_async_inf_callback.callback, infer_requests[i], run_opts)
    for request in infer_requests:
        request.wait()


# Create RunOptions
run_options = ort.RunOptions()
run_options.log_verbosity_level = 3

run_async_inference('DEFAULT', sessions, run_options)
exit(0)
```
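For clarity, the single-session pattern I would like to get working is sketched below. This is only a sketch of my intent, not working code: it assumes run_async can be called repeatedly on one session before earlier requests complete, and that the OpenVINO EP will overlap those requests on the NPU, which is exactly the behaviour I have not been able to confirm.

```python
import threading
import numpy as np
import onnxruntime as ort

N = 16  # desired number of outstanding requests

# One session, shared by all in-flight requests.
session = ort.InferenceSession(
    'mobilenetv2_035_96.onnx',
    providers=[('OpenVINOExecutionProvider', {'device_type': 'NPU'})])

inp = session.get_inputs()[0]
output_names = [o.name for o in session.get_outputs()]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # guard symbolic dims
feed = {inp.name: np.random.randn(*shape).astype(np.float32)}

events = [threading.Event() for _ in range(N)]

def on_complete(outputs, event, err):
    if err:
        print("run_async error:", err)
    event.set()

# Submit all N requests before waiting on any of them, so that (in theory)
# N inferences are outstanding at once on the single session.
for ev in events:
    session.run_async(output_names, feed, on_complete, ev)
for ev in events:
    ev.wait()
```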