
[BUG] Performance of PersistentAgentsChatClient.GetStreamingResponseAsync degrades drastically for every additional message (aka run) in agent thread #54326

@shunsaker-macu

Description

Library name and version

Azure.AI.Agents.Persistent 1.2.0-beta.8

Describe the bug

Each subsequent call to PersistentAgentsChatClient.GetStreamingResponseAsync on the same thread takes 500-1000 ms longer than the previous one. The cause is an inefficient line of code that loops through every previous ThreadRun on every call, just to check whether any of them are still running:

await foreach (ThreadRun? run in _client!.Runs.GetRunsAsync(threadId, limit: 1, ListSortOrder.Descending, cancellationToken: cancellationToken).ConfigureAwait(false))

Not only that, it issues a separate API call for each one.

It explicitly sets `limit` to 1 and the order to `Descending`, which does not mean

"Get the single most recent run"

(which would make more sense), but instead, because of the `await foreach`, means

"Loop through every single run ever made on this thread, one at a time, without batching the API calls at all."

We can see that `_client!.Runs.GetRunsAsync` returns `AsyncPageable<ThreadRun>`, and the `AsyncPageable<T>` documentation clearly states:

    /// Enumerate the values in the collection asynchronously.  This may
    /// make multiple service requests.
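Because `AsyncPageable<T>` fetches pages lazily as it is enumerated, and `limit: 1` caps each page at a single item, draining the enumerator issues one HTTP request per historical run. A minimal sketch of what bounding the enumeration would look like, using the same `GetRunsAsync` call quoted above (`client`, `threadId`, and `cancellationToken` are assumed from context, and this is my sketch, not the library source):

```csharp
// Sketch: stop after the first element instead of draining the pageable.
// With limit: 1 and descending order, the first element is the newest run,
// so this issues a single request no matter how many runs the thread has.
ThreadRun? latestRun = null;
await foreach (ThreadRun run in client.Runs.GetRunsAsync(
    threadId, limit: 1, ListSortOrder.Descending, cancellationToken: cancellationToken)
    .ConfigureAwait(false))
{
    latestRun = run;
    break; // without this break, enumeration keeps requesting the next one-item page
}
```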

Expected behavior

I think the originally intended behavior of this line may have been

"Get the single most recent run"

and enumerating every single run may be unintentional. That reading would make more sense to me: get the most recent run, then check whether it is still active and needs to be canceled, or is the special case of a tool response.

If it is indeed necessary to look at every single run, I would expect it not to be forced to do so one at a time! Remove `limit: 1` and let the service use its default page size of 20.

Or, at a minimum: since I have set `ThreadAndRunOptions.TruncationStrategy` to `new Truncation(TruncationStrategy.LastMessages) { LastMessages = 10 }`, add a special case so that I don't have to loop through the entire run history when I am deliberately windowing the context to just the 10 most recent messages anyway.
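Put together, a sketch of the expected behavior described above. The `RunStatus` values and the `CancelRunAsync` call are my assumptions about the library surface, not its actual source:

```csharp
// Sketch: inspect only the most recent run, handle it, then stop.
await foreach (ThreadRun run in client.Runs.GetRunsAsync(
    threadId, limit: 1, ListSortOrder.Descending, cancellationToken: cancellationToken)
    .ConfigureAwait(false))
{
    if (run.Status == RunStatus.Queued || run.Status == RunStatus.InProgress)
    {
        // An active run blocks new runs on the thread, so cancel it first.
        await client.Runs.CancelRunAsync(threadId, run.Id, cancellationToken)
            .ConfigureAwait(false);
    }
    break; // never touch older runs
}
```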

Actual behavior

Really slow: each iteration takes 500-1000 ms longer than the last, and the number of API calls grows linearly with the number of runs on the thread.

Reproduction Steps

Step 1: Create azure-openai agent in Foundry
Step 2:

// Reuse one IChatClient so client construction cannot explain the slowdown.
var client = new PersistentAgentsClient("your connection string", new DefaultAzureCredential());
PersistentAgentThread thread = client.Threads.CreateThread();
IChatClient chatClient = client.AsIChatClient("your agent id");
var message = new ChatMessage(ChatRole.User, "Hi!");
for (int i = 0; i < 25; i++)
{
    var startTime = DateTime.UtcNow;
    await foreach (var update in chatClient.GetStreamingResponseAsync([message], options: new ChatOptions() { ConversationId = thread.Id }))
    {
        //Console.Write(update.Text);
    }
    // Each iteration adds another run to the thread, and the printed duration grows with it.
    Console.WriteLine($"Duration: {DateTime.UtcNow - startTime}");
}

Step 3: :(

Environment

  • net8.0
  • Windows 11 Enterprise
  • Visual Studio 18.0.11201.2

Metadata


    Labels

    customer-reported: Issues that are reported by GitHub users external to the Azure organization.
    needs-triage: Workflow: this is a new issue that needs to be triaged to the appropriate team.
    question: The issue doesn't require a change to the product in order to be resolved.
