In the C# documentation there is an interesting comment concerning IOBinding:
// This model has input and output of the same shape, so we can easily feed
// output to input using binding, or not using one. The example makes use of
// the binding to demonstrate the circular feeding.
// With the OrtValue API exposed, one can create OrtValues over arbitrary buffers
// and feed them to the model using the OrtValue-based Run APIs. Thus, the
// binding is no longer necessary.
Could you provide an example of doing this in C#, i.e. creating OrtValues over arbitrary buffers and feeding them to the model via the OrtValue-based Run APIs, without a binding?
I am basically trying to reproduce the llama example available here in C#. I use both OrtValue and the IoBinding object, but I am seeing that the DefaultInstance of the allocator is always the CPU, while my model is correctly on the GPU. When I feed the bound output back into the model's input it works, but inference is slower, so I might be doing something wrong.
From the above comment, I think I could instantiate an OrtValue directly on the correct device (GPU 0 for now) without needing IoBinding at all. Could you create or point me to an example of such a thing?
My end goal is to create only one cache per LLM inference, as the "cache" output by an LLM is usually re-used in the next call as part of the input.
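For reference, here is roughly what I imagine it would look like. This is only a rough, untested sketch: the model path, the cache shape, and the "past_cache"/"present_cache" tensor names are made up for illustration, and I am assuming `OrtValue.CreateAllocatedTensorValue` plus the `Run` overload that takes preallocated output OrtValues are the right entry points.

```csharp
// Rough sketch, not tested. Assumes the Microsoft.ML.OnnxRuntime.Gpu package
// and a model whose "past" input and "present" output share one shape.
using Microsoft.ML.OnnxRuntime;

using var sessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(0);
using var session = new InferenceSession("model.onnx", sessionOptions);

// Build a device allocator for GPU 0; OrtAllocator.DefaultInstance is always CPU.
using var cudaMemInfo = new OrtMemoryInfo(OrtMemoryInfo.allocatorCUDA,
    OrtAllocatorType.DeviceAllocator, 0, OrtMemType.Default);
using var cudaAllocator = new OrtAllocator(session, cudaMemInfo);

// Allocate the cache tensors once, directly in GPU memory (shape is illustrative).
long[] cacheShape = { 1, 32, 128, 64 };
using var pastCache = OrtValue.CreateAllocatedTensorValue(
    cudaAllocator, TensorElementType.Float, cacheShape);
using var presentCache = OrtValue.CreateAllocatedTensorValue(
    cudaAllocator, TensorElementType.Float, cacheShape);

using var runOptions = new RunOptions();

// Run with a preallocated device output: the "present" cache stays on the GPU
// and can be fed back as the "past" cache on the next call, no IoBinding needed.
session.Run(runOptions,
    new[] { "past_cache" }, new[] { pastCache },
    new[] { "present_cache" }, new[] { presentCache });

// Next step would swap the roles of the two OrtValues and call Run again
// (circular feeding), so no device-to-host copy ever happens.
```

Is this the intended pattern, or is there a better way to keep the cache on the device between calls?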
Maybe @yuslepukhin can help since he wrote the comment ?
Thank you very much