
Conversation


@wsxzei wsxzei commented Jan 7, 2026

Summary

Fix misleading error messages in the evaluation table by capturing and displaying the actual error messages from LLM providers instead of generic HTTP error codes.

Problem

When LLM providers return errors (e.g., rate limits, quota exceeded), the evaluation table showed only generic messages like "HTTP 429: Too Many Requests". This confused users because they couldn't tell:

  • Whether the issue was with Agenta's infrastructure
  • Whether it was their API key configuration
  • Whether it was their provider's quota limits
  • What they needed to do to fix it

Solution

Enhanced error handling in invoke_app (api/oss/src/services/llm_apps_service.py) to parse and display the actual provider error messages (a sketch of the prioritization follows the list):

  1. Parse detail.message from the LLM provider's error response as the primary error message
  2. Fallback to detail.error when message is not available
  3. Use generic HTTP error info (status code + message) as last resort
  4. Extract stacktrace from multiple response formats and store both user-friendly message and technical details in PostgreSQL
  5. Improve error message prioritization: provider message > provider error > generic HTTP info
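
For illustration, here is a minimal sketch of that prioritization, assuming the provider's error body has already been decoded into a dict with the detail.message / detail.error shape described above. The helper name extract_error_details and the exact field layout are illustrative, not the literal implementation in llm_apps_service.py:

from typing import Any, Dict, Optional, Tuple

def extract_error_details(
    body: Dict[str, Any], status: int, reason: str
) -> Tuple[str, Optional[str]]:
    """Pick the most useful error message and, if present, a stacktrace.

    Priority: provider message > provider error > generic HTTP info.
    """
    detail = body.get("detail") or {}
    if not isinstance(detail, dict):
        detail = {"message": str(detail)}

    message = (
        detail.get("message")            # 1. provider's own message
        or detail.get("error")           # 2. provider's error field
        or f"HTTP {status}: {reason}"    # 3. generic HTTP fallback
    )

    # The stacktrace can appear under different keys depending on the response format.
    stacktrace = (
        detail.get("stacktrace") or detail.get("traceback") or body.get("stacktrace")
    )
    return message, stacktrace

# Example: a 429 whose body carries the provider's explanation.
msg, trace = extract_error_details(
    {"detail": {"message": "OpenAI rate limit exceeded"}},
    status=429,
    reason="Too Many Requests",
)
assert msg == "OpenAI rate limit exceeded"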

Impact

Before

Before Evaluation Table
Before Evaluation Result Detail

After

After Evaluation Table
After Evaluation Result Detail

Changes Made

  • Users now see actionable error messages like "OpenAI rate limit exceeded" instead of "HTTP 429: Too Many Requests"
  • Technical details (status code, full response, stacktrace) are still available for debugging

Related Issues

Closes #3324

Enhance error handling in invoke_app to display actual provider error
messages instead of generic HTTP error codes. This improves user
experience by showing actionable error information (e.g., "OpenAI
rate limit exceeded" instead of "HTTP 429: Too Many Requests").

Changes:
- Parse response detail.message/stacktrace from LLM provider errors
- Preserve HTTP status code and full error response for debugging
- Add detailed stacktrace extraction from multiple response formats
- Improve error message prioritization (provider message > generic error)

Previously, the evaluation table showed ambiguous error messages that
didn't explain what went wrong or how to fix it. Users can now see
the actual provider error and hover for technical details.

Related issue: Agenta-AI#3324 - [UX bug] Misleading error message in evaluation table

vercel bot commented Jan 7, 2026

Someone is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files) label Jan 7, 2026

CLAassistant commented Jan 7, 2026

CLA assistant check
All committers have signed the CLA.

@dosubot dosubot bot added the Backend and dev experience (Improvement of the experience using the software, for instance better error messaging) labels Jan 7, 2026

wsxzei commented Jan 9, 2026

Hi @ashrafchowdury, I've completed the fix for #3324 and verified it with testing. I see the Vercel preview check is currently failing, so I'm not entirely sure about the correct next steps. Please let me know what specific changes or additions are needed from my side, and I'm happy to update accordingly.

@ashrafchowdury
Contributor

Thanks for the contribution, @wsxzei. I believe this issue should be addressed on the frontend, as it appears to be an error in how the message is parsed. We just need to ensure that the UI displays the actual message correctly.


wsxzei commented Jan 12, 2026

Quoting @ashrafchowdury: "Thanks for the contribution, @wsxzei. I believe this issue should be addressed on the frontend, as it appears to be an error in how the message is parsed. We just need to ensure that the UI displays the actual message correctly."

Thanks for your review, @ashrafchowdury. I appreciate your perspective on this issue. I also initially thought this was a frontend issue, but after tracing the request flow during debugging, I found the problem is actually in the backend error handling. The system has two evaluation modes with different architectures:

Human Evals (frontend handles LLM invocation results)

  • Frontend's runInvocation calls the Completion service /test endpoint directly (in web/oss/src/services/evaluations/invocations/api.ts)
  • Parses the response and stores results in the evaluation_results table via the /preview/evaluations/results API
  • This mode works correctly since the frontend controls the full error parsing

Auto Evals (backend handles LLM invocation results)

  • Frontend triggers the evaluation task via /evaluations/preview/start (called by createEvaluation in web/oss/src/services/evaluations/api/index.ts); the task is then queued to the worker service
  • Worker service handles the entire flow: invoking models → parsing responses → storing results in evaluation_results table
  • Frontend just displays what's stored in the database

The issue occurs in Auto Evals mode. In api/oss/src/services/llm_apps_service.py, the invoke_app function catches HTTP errors but generates generic messages:

except aiohttp.ClientResponseError as e:
    # Falls back from detail.error straight to generic HTTP info;
    # the provider's detail.message is never read.
    error_message = app_response.get("detail", {}).get(
        "error", f"HTTP error {e.status}: {e.message}"
    )

When LLM providers return quota/rate limit errors, the actual error details are in the response body. The current code doesn't extract this provider-specific message, so users see generic text like "HTTP error 429: Too Many Requests" instead of the real error.
Since the error message is written to the evaluation_results table by the backend worker before the frontend sees it, and the raw response isn't persisted, the frontend can't retrieve the original provider error. This needs to be fixed in the backend error handling.
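
To make the proposed backend fix concrete, here is a self-contained sketch of the idea, not the literal invoke_app implementation: read the body before raising, then prefer the provider's detail.message over detail.error and the generic HTTP text. Function and field names are illustrative and may differ from the actual code:

import aiohttp

async def invoke_with_provider_errors(url: str, payload: dict) -> dict:
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as resp:
            # Read the body first so the provider's message survives the exception.
            body = await resp.json(content_type=None)
            try:
                resp.raise_for_status()
            except aiohttp.ClientResponseError as e:
                detail = body.get("detail", {}) if isinstance(body, dict) else {}
                if not isinstance(detail, dict):
                    detail = {"message": str(detail)}
                message = (
                    detail.get("message")                      # provider message
                    or detail.get("error")                     # provider error field
                    or f"HTTP error {e.status}: {e.message}"   # generic fallback
                )
                # Surface the actionable message; keep status and raw body for debugging.
                return {"error": message, "status": e.status, "raw": body}
            return body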

