From 7bd050cc54a7177a1299484c8f3f8c34f77e34fa Mon Sep 17 00:00:00 2001
From: Robusta Runner
Date: Mon, 6 Oct 2025 17:50:51 +0300
Subject: [PATCH 1/2] WIP

---
 holmes/plugins/prompts/_general_instructions.jinja2 | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/holmes/plugins/prompts/_general_instructions.jinja2 b/holmes/plugins/prompts/_general_instructions.jinja2
index 55719cf79..bd4453506 100644
--- a/holmes/plugins/prompts/_general_instructions.jinja2
+++ b/holmes/plugins/prompts/_general_instructions.jinja2
@@ -65,6 +65,15 @@
 * On the first TodoWrite call, mark at least one task as in_progress, and start working on it in parallel.
 * When calling TodoWrite for the first time, mark the tasks you started working on with ‘in_progress’ status.
 
+# Communication during investigation
+
+* When you naturally pause to communicate (between tool batches), include your key findings so far
+* Format: Brief summary of discoveries + what you're checking next
+* Example: "Found database connection pool at 95% with slow queries. Checking payment service dependencies..."
+* Don't add extra communication points - only enhance existing ones with findings
+* Focus on critical discoveries (errors, bottlenecks, root causes) not routine checks
+* Keep updates concise but informative - users should understand the investigation progress without waiting for the final summary
+
 # Tool/function calls
 
 You are able to make tool calls / function calls. Recognise when a tool has already been called and reuse its result.
From 95fcf6c472e0280f6a487fee1ee54ace5d47fb3c Mon Sep 17 00:00:00 2001
From: Robusta Runner
Date: Tue, 7 Oct 2025 22:54:35 +0300
Subject: [PATCH 2/2] add eval results with changes to intermediate output

---
 .../history/results_20251007_172437.md | 308 +++++++++++
 .../development/evaluations/latest-results.md | 514 +++++++++---------
 2 files changed, 565 insertions(+), 257 deletions(-)
 create mode 100644 docs/development/evaluations/history/results_20251007_172437.md

diff --git a/docs/development/evaluations/history/results_20251007_172437.md b/docs/development/evaluations/history/results_20251007_172437.md
new file mode 100644
index 000000000..11ebb55e8
--- /dev/null
+++ b/docs/development/evaluations/history/results_20251007_172437.md
@@ -0,0 +1,308 @@
+# HolmesGPT LLM Evaluation Benchmark Results
+
+**Generated**: 2025-10-07 17:24 UTC
+**Total Duration**: 1h 27m 21s
+**Iterations**: 1
+**Judge (classifier) model**: gpt-4.1
+
+## About this Benchmark
+
+HolmesGPT is continuously evaluated against real-world Kubernetes and cloud troubleshooting scenarios.
+
+If you find scenarios that HolmesGPT does not perform well on, please consider adding them as evals to the benchmark.
+ +## Model Accuracy Comparison + +| Model | Pass | Fail | Skip/Error | Total | Success Rate | +|-------|------|------|------------|-------|--------------| +| gpt-4o | 54 | 40 | 11 | 105 | 🟑 57% (54/94) | +| gpt-4.1 | 70 | 24 | 11 | 105 | 🟑 74% (70/94) | +| gpt-5 | 78 | 15 | 12 | 105 | 🟑 84% (78/93) | +| sonnet-4-20250514 | 80 | 14 | 11 | 105 | 🟑 85% (80/94) | +| sonnet-4-5-20250929 | 89 | 5 | 11 | 105 | 🟑 95% (89/94) | + +## Model Cost Comparison + +| Model | Tests | Avg Cost | Min Cost | Max Cost | Total Cost | +|-------|-------|----------|----------|----------|------------| +| gpt-4o | 93 | $0.15 | $0.03 | $0.57 | $13.61 | +| gpt-4.1 | 93 | $0.14 | $0.03 | $0.82 | $13.29 | +| gpt-5 | 92 | $0.13 | $0.02 | $0.40 | $12.42 | +| sonnet-4-20250514 | 89 | $0.19 | $0.05 | $0.95 | $16.93 | +| sonnet-4-5-20250929 | 93 | $0.17 | $0.05 | $0.67 | $16.25 | + +## Model Latency Comparison + +| Model | Avg (s) | Min (s) | Max (s) | P50 (s) | P95 (s) | +|-------|---------|---------|---------|---------|---------| +| gpt-4o | 44.0 | 8.0 | 640.0 | 36.0 | 73.6 | +| gpt-4.1 | 39.2 | 5.4 | 133.3 | 37.7 | 71.2 | +| gpt-5 | 190.2 | 9.0 | 881.1 | 150.7 | 466.9 | +| sonnet-4-20250514 | 52.2 | 7.8 | 124.7 | 48.4 | 100.3 | +| sonnet-4-5-20250929 | 59.1 | 8.5 | 208.6 | 55.2 | 117.4 | + +⚠️ **Note:** 4 test(s) excluded from latency calculations due to throttling/timeout errors (sonnet-4-20250514: 3, sonnet-4-5-20250929: 1) + +## Performance by Tag + +Success rate by test category and model: + +| Tag | gpt-4o | gpt-4.1 | gpt-5 | sonnet-4-20250514 | sonnet-4-5-20250929 | Warnings | +|-----|-------|-------|-------|-------|-------|----------| +| [chain-of-causation](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522chain-of-causation%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520chain-of-causation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
πŸ”΄ 0% (0/6) | πŸ”΄ 0% (0/6) | 🟑 83% (5/6) | 🟑 83% (5/6) | 🟒 100% (6/6) | ⚠️ 10 skipped | +| [context_window](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522context_window%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520context_window%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 29% (2/7) | 🟑 71% (5/7) | 🟑 71% (5/7) | 🟑 71% (5/7) | 🟑 86% (6/7) | | +| [counting](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522counting%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟒 100% (4/4) | 🟒 100% (4/4) | 🟒 100% (4/4) | 🟒 100% (4/4) | 🟒 100% (4/4) | | +| [database](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522database%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520database%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | πŸ”΄ 0% (0/1) | πŸ”΄ 0% (0/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | ⚠️ 15 skipped | +| [datadog](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522datadog%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520datadog%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 75% (3/4) | 🟑 75% (3/4) | 🟑 75% (3/4) | 🟑 50% (2/4) | 🟑 75% (3/4) | | +| 
[datetime](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522datetime%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520datetime%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 50% (2/4) | 🟑 50% (2/4) | 🟒 100% (4/4) | 🟑 50% (2/4) | 🟒 100% (4/4) | ⚠️ 10 skipped | +| [easy](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522easy%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520easy%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 94% (34/36) | 🟑 94% (34/36) | 🟑 89% (32/36) | 🟑 97% (35/36) | 🟒 100% (36/36) | | +| [hard](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522hard%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520hard%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 7% (1/14) | 🟑 21% (3/14) | 🟑 64% (9/14) | 🟑 79% (11/14) | 🟑 86% (12/14) | ⚠️ 30 skipped | +| [kafka](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522kafka%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520kafka%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | βšͺ️ - | βšͺ️ - | βšͺ️ - | βšͺ️ - | βšͺ️ - | ⚠️ 10 skipped | +| [kubernetes](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522kubernetes%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520kubernetes%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 49% (23/47) | 🟑 74% (35/47) | 🟑 81% (38/47) | 🟑 
91% (43/47) | 🟑 94% (44/47) | ⚠️ 5 skipped | +| [logs](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522logs%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 54% (14/26) | 🟑 69% (18/26) | 🟑 77% (20/26) | 🟑 69% (18/26) | 🟑 85% (22/26) | ⚠️ 35 skipped | +| [medium](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522medium%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520medium%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 43% (19/44) | 🟑 75% (33/44) | 🟑 86% (37/43) | 🟑 77% (34/44) | 🟑 93% (41/44) | ⚠️ 26 skipped | +| [network](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522network%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520network%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 25% (1/4) | 🟑 50% (2/4) | 🟒 100% (4/4) | 🟑 75% (3/4) | 🟒 100% (4/4) | | +| [no-cicd](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522no-cicd%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520no-cicd%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | | +| 
[numerical](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522numerical%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520numerical%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | 🟒 100% (1/1) | | +| [port-forward](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522port-forward%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520port-forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 33% (3/9) | 🟑 56% (5/9) | 🟑 67% (6/9) | 🟑 78% (7/9) | 🟑 67% (6/9) | | +| [prometheus](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522prometheus%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520prometheus%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 50% (2/4) | 🟒 100% (4/4) | 🟑 75% (3/4) | 🟒 100% (4/4) | 🟑 75% (3/4) | | +| [question-answer](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522question-answer%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520question-answer%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟒 100% (4/4) | 🟒 100% (4/4) | 🟒 100% (4/4) | 🟒 100% (4/4) | 🟒 100% (4/4) | | +| [runbooks](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522runbooks%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520runbooks%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 50% 
(3/6) | 🟑 67% (4/6) | 🟒 100% (6/6) | 🟑 67% (4/6) | 🟒 100% (6/6) | ⚠️ 5 skipped | +| [slackbot](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522slackbot%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520slackbot%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | βšͺ️ - | βšͺ️ - | βšͺ️ - | βšͺ️ - | βšͺ️ - | ⚠️ 5 skipped | +| [traces](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522traces%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | πŸ”΄ 0% (0/5) | πŸ”΄ 0% (0/5) | 🟑 80% (4/5) | 🟑 80% (4/5) | 🟒 100% (5/5) | | +| [transparency](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522transparency%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 71% (10/14) | 🟑 79% (11/14) | 🟑 93% (13/14) | 🟑 71% (10/14) | 🟒 100% (14/14) | ⚠️ 5 skipped | +| **Overall** | 🟑 57% (54/94) | 🟑 74% (70/94) | 🟑 84% (78/93) | 🟑 85% (80/94) | 🟑 95% (89/94) | ⚠️ 56 skipped | + +## Raw Results + +Status of all evaluations across models. 
Color coding: + +- 🟒 Passing 100% (stable) +- 🟑 Passing 1-99% +- πŸ”΄ Passing 0% (failing) +- πŸ”§ Mock data failure (missing or invalid test data) +- ⚠️ Setup failure (environment/infrastructure issue) +- ⏱️ Timeout or rate limit error +- ⏭️ Test skipped (e.g., known issue or precondition not met) + +| Eval ID | [gpt-4o](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-4o%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-4o%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [gpt-4.1](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-4.1%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-4.1%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [gpt-5](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-5%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-5%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [sonnet-4-20250514](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522anthropic%252Fclaude-sonnet-4-20250514%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520anthropic%252Fclaude-sonnet-4-20250514%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[sonnet-4-5-20250929](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522anthropic%252Fclaude-sonnet-4-5-20250929%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520anthropic%252Fclaude-sonnet-4-5-20250929%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +|---------|-------|-------|-------|-------|-------| +| [**01_how_many_pods**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/01_how_many_pods/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252201_how_many_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252001_how_many_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**02_what_is_wrong_with_pod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/02_what_is_wrong_with_pod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252202_what_is_wrong_with_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252002_what_is_wrong_with_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**03_what_is_the_command_to_port_forward**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252203_what_is_the_command_to_port_forward%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252003_what_is_the_command_to_port_forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**04_related_k8s_events**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/04_related_k8s_events/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252204_related_k8s_events%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252004_related_k8s_events%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**05_image_version**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/05_image_version/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252205_image_version%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252005_image_version%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**08_sock_shop_frontend**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252208_sock_shop_frontend%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252008_sock_shop_frontend%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**09_crashpod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/09_crashpod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252209_crashpod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252009_crashpod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**100a_historical_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100a_historical_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100a_historical_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**100b_historical_logs_nonstandard_label**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100b_historical_logs_nonstandard_label%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100b_historical_logs_nonstandard_label%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**101_historical_logs_pod_deleted**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522101_historical_logs_pod_deleted%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520101_historical_logs_pod_deleted%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**103_logs_transparency_default_limit**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522103_logs_transparency_default_limit%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520103_logs_transparency_default_limit%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**104a_postgres_root_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104a_postgres_root_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104a_postgres_root_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104a_postgres_root_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**104b_postgres_missing_index_pgstat**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104b_postgres_missing_index_pgstat/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104b_postgres_missing_index_pgstat%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104b_postgres_missing_index_pgstat%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**104c_postgres_minimal_missing_index**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104c_postgres_minimal_missing_index/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104c_postgres_minimal_missing_index%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104c_postgres_minimal_missing_index%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**105_redis_wrong_data_structure**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/105_redis_wrong_data_structure/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522105_redis_wrong_data_structure%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520105_redis_wrong_data_structure%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**107_log_filter_http_status_code**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/107_log_filter_http_status_code/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522107_log_filter_http_status_code%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520107_log_filter_http_status_code%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**108_logs_nearby_lines**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/108_logs_nearby_lines/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522108_logs_nearby_lines%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520108_logs_nearby_lines%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**109_logs_transparency_not_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/109_logs_transparency_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522109_logs_transparency_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520109_logs_transparency_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**10_image_pull_backoff**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/10_image_pull_backoff/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252210_image_pull_backoff%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252010_image_pull_backoff%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**110_k8s_events_image_pull**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/110_k8s_events_image_pull/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522110_k8s_events_image_pull%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520110_k8s_events_image_pull%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**111_disabled_datadog_traces**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_disabled_datadog_traces/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_disabled_datadog_traces%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_disabled_datadog_traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**111_pod_names_contain_service**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_pod_names_contain_service/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_pod_names_contain_service%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_pod_names_contain_service%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**112_find_pvcs_by_uuid**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/112_find_pvcs_by_uuid/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522112_find_pvcs_by_uuid%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520112_find_pvcs_by_uuid%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**114_checkout_latency_tracing_rebuild[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**115_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520115_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**11_init_containers**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/11_init_containers/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252211_init_containers%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252011_init_containers%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**121_new_relic_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**122_new_relic_checkout_latency_tracing_rebuild[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild[0]/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**123_new_relic_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏱️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**12_job_crashing**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/12_job_crashing/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252212_job_crashing%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252012_job_crashing%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [**13a_pending_node_selector_basic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13a_pending_node_selector_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213a_pending_node_selector_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013a_pending_node_selector_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**13b_pending_node_selector_detailed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13b_pending_node_selector_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213b_pending_node_selector_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013b_pending_node_selector_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**14_pending_resources**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/14_pending_resources/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252214_pending_resources%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252014_pending_resources%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**156_kafka_opensearch_latency**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/156_kafka_opensearch_latency/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522156_kafka_opensearch_latency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520156_kafka_opensearch_latency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**159_prometheus_high_cardinality_cpu[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**159_prometheus_high_cardinality_cpu[1]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**159_prometheus_high_cardinality_cpu[2]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[2]/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**15_failed_readiness_probe**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/15_failed_readiness_probe/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252215_failed_readiness_probe%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252015_failed_readiness_probe%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**16_failed_no_toolset_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/16_failed_no_toolset_found/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252216_failed_no_toolset_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252016_failed_no_toolset_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**17_oom_kill**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/17_oom_kill/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252217_oom_kill%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252017_oom_kill%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**19_detect_missing_app_details**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/19_detect_missing_app_details/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252219_detect_missing_app_details%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252019_detect_missing_app_details%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**20_long_log_file_search**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/20_long_log_file_search/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252220_long_log_file_search%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252020_long_log_file_search%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**21_job_fail_curl_no_svc_account**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252221_job_fail_curl_no_svc_account%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252021_job_fail_curl_no_svc_account%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**22_high_latency_dbi_down**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252222_high_latency_dbi_down%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252022_high_latency_dbi_down%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**23_app_error_in_current_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/23_app_error_in_current_logs/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252223_app_error_in_current_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252023_app_error_in_current_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**24_misconfigured_pvc**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24_misconfigured_pvc/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224_misconfigured_pvc%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024_misconfigured_pvc%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**24a_misconfigured_pvc_basic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24a_misconfigured_pvc_basic/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224a_misconfigured_pvc_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024a_misconfigured_pvc_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**24b_misconfigured_pvc_detailed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24b_misconfigured_pvc_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224b_misconfigured_pvc_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024b_misconfigured_pvc_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**25_misconfigured_ingress_class**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252225_misconfigured_ingress_class%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252025_misconfigured_ingress_class%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**26_page_render_times**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/26_page_render_times/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252226_page_render_times%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252026_page_render_times%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**27a_multi_container_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27a_multi_container_logs/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227a_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027a_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**27b_multi_container_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27b_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227b_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027b_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**28_permissions_error**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/28_permissions_error/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252228_permissions_error%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252028_permissions_error%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**33_cpu_metrics_discovery**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252233_cpu_metrics_discovery%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252033_cpu_metrics_discovery%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**39_failed_toolset**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/39_failed_toolset/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252239_failed_toolset%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252039_failed_toolset%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| 
[**41_setup_argo**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/41_setup_argo/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252241_setup_argo%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252041_setup_argo%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) 
| [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**42_dns_issues_result_new_tools_no_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools_no_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_result_new_tools_no_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏱️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**42_dns_issues_steps_new_tools**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_steps_new_tools%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_steps_new_tools%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**43_current_datetime_from_prompt**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_current_datetime_from_prompt/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_current_datetime_from_prompt%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_current_datetime_from_prompt%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**43_slack_deployment_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_slack_deployment_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_slack_deployment_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_slack_deployment_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**44_slack_statefulset_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252244_slack_statefulset_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252044_slack_statefulset_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**45_fetch_deployment_logs_simple**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252245_fetch_deployment_logs_simple%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252045_fetch_deployment_logs_simple%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**48_logs_since_thursday**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252248_logs_since_thursday%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252048_logs_since_thursday%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**50_logs_since_specific_date**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250_logs_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050_logs_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**50a_logs_since_last_specific_month**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250a_logs_since_last_specific_month%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050a_logs_since_last_specific_month%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**51_logs_summarize_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/51_logs_summarize_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252251_logs_summarize_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252051_logs_summarize_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**52_logs_login_issues**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/52_logs_login_issues/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252252_logs_login_issues%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252052_logs_login_issues%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**53_logs_find_term**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/53_logs_find_term/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252253_logs_find_term%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252053_logs_find_term%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +|
[**54_not_truncated_when_getting_pods**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/54_not_truncated_when_getting_pods/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252254_not_truncated_when_getting_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252054_not_truncated_when_getting_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**55_kafka_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252255_kafka_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252055_kafka_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**57_wrong_namespace**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252257_wrong_namespace%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252057_wrong_namespace%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**59_label_based_counting**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/59_label_based_counting/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252259_label_based_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252059_label_based_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**60_count_less_than**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/60_count_less_than/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252260_count_less_than%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252060_count_less_than%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) 
| +| [**61_exact_match_counting**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/61_exact_match_counting/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252261_exact_match_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252061_exact_match_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**62_fetch_error_logs_with_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/62_fetch_error_logs_with_errors/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252262_fetch_error_logs_with_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252062_fetch_error_logs_with_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**63_fetch_error_logs_no_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252263_fetch_error_logs_no_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252063_fetch_error_logs_no_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**64_keda_vs_hpa_confusion**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/64_keda_vs_hpa_confusion/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252264_keda_vs_hpa_confusion%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252064_keda_vs_hpa_confusion%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**65_health_check_followup**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/65_health_check_followup/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252265_health_check_followup%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252065_health_check_followup%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**71_connection_pool_starvation**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252271_connection_pool_starvation%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252071_connection_pool_starvation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**73a_time_window_anomaly**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73a_time_window_anomaly/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273a_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073a_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**73b_time_window_anomaly**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73b_time_window_anomaly/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273b_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073b_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**76_service_discovery_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/76_service_discovery_issue/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252276_service_discovery_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252076_service_discovery_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**77_liveness_probe_misconfiguration**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/77_liveness_probe_misconfiguration/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252277_liveness_probe_misconfiguration%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252077_liveness_probe_misconfiguration%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**78a_missing_cpu_limits**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78a_missing_cpu_limits/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278a_missing_cpu_limits%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078a_missing_cpu_limits%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**78b_cpu_quota_exceeded**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78b_cpu_quota_exceeded/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278b_cpu_quota_exceeded%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078b_cpu_quota_exceeded%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**79_configmap_mount_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/79_configmap_mount_issue/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252279_configmap_mount_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252079_configmap_mount_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**80_pvc_storage_class_mismatch**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/80_pvc_storage_class_mismatch/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252280_pvc_storage_class_mismatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252080_pvc_storage_class_mismatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**81_service_account_permission_denied**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/81_service_account_permission_denied/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252281_service_account_permission_denied%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252081_service_account_permission_denied%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**82_pod_anti_affinity_conflict**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/82_pod_anti_affinity_conflict/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252282_pod_anti_affinity_conflict%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252082_pod_anti_affinity_conflict%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**83_secret_not_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/83_secret_not_found/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252283_secret_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252083_secret_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**84_network_policy_blocking_traffic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/84_network_policy_blocking_traffic/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252284_network_policy_blocking_traffic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252084_network_policy_blocking_traffic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**85_hpa_not_scaling**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/85_hpa_not_scaling/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252285_hpa_not_scaling%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252085_hpa_not_scaling%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**86_configmap_like_but_secret**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/86_configmap_like_but_secret/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252286_configmap_like_but_secret%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252086_configmap_like_but_secret%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**89_runbook_missing_cloudwatch**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/89_runbook_missing_cloudwatch/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252289_runbook_missing_cloudwatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252089_runbook_missing_cloudwatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**90_runbook_basic_selection**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/90_runbook_basic_selection/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252290_runbook_basic_selection%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252090_runbook_basic_selection%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**91f_datadog_logs_historical_pod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/91f_datadog_logs_historical_pod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252291f_datadog_logs_historical_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252091f_datadog_logs_historical_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_calling_datadog[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏱️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_calling_datadog[1]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[1]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_calling_datadog[2]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[2]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_events_since_specific_date**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_events_since_specific_date/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_events_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_events_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**94_runbook_transparency**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/94_runbook_transparency/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252294_runbook_transparency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252094_runbook_transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**96_no_matching_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/96_no_matching_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252296_no_matching_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252096_no_matching_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**97_logs_clarification_needed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/97_logs_clarification_needed/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252297_logs_clarification_needed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252097_logs_clarification_needed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**98_logs_transparency_default_time**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252298_logs_transparency_default_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252098_logs_transparency_default_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**99_logs_transparency_custom_time**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/99_logs_transparency_custom_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252299_logs_transparency_custom_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252099_logs_transparency_custom_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| **SUMMARY** | 🟑 57% (54/94) | 🟑 74% (70/94) | 🟑 84% (78/93) | 🟑 85% (80/94) | 🟑 95% (89/94) | + +## Detailed Raw Results + +| Eval ID | gpt-4o | gpt-4.1 | gpt-5 | sonnet-4-20250514 | sonnet-4-5-20250929 | +|---------|-------|-------|-------|-------|-------| +| [01_how_many_pods](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/01_how_many_pods/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252201_how_many_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252001_how_many_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.7s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.8s / 💰 $0.03 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.9s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.3s / 💰 $0.07 |
+| [02_what_is_wrong_with_pod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/02_what_is_wrong_with_pod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252202_what_is_wrong_with_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252002_what_is_wrong_with_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.0s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.6s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 97.4s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.7s / 💰 $0.20 |
+| 
[03_what_is_the_command_to_port_forward](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252203_what_is_the_command_to_port_forward%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252003_what_is_the_command_to_port_forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.5s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.4s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 91.2s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.1s / 💰 $0.09 |
+| [04_related_k8s_events](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/04_related_k8s_events/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252204_related_k8s_events%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252004_related_k8s_events%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.4s / 💰 $0.10 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.3s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.5s / 💰 $0.04 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.5s / 💰 $0.08 |
+| [05_image_version](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/05_image_version/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252205_image_version%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252005_image_version%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.6s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.1s / 💰 $0.04 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.8s / 💰 $0.05 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.2s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.7s / 💰 $0.09 |
+| [08_sock_shop_frontend](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252208_sock_shop_frontend%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252008_sock_shop_frontend%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [09_crashpod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/09_crashpod/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252209_crashpod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252009_crashpod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.2s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.1s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 125.4s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.2s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.4s / 💰 $0.13 |
+| [100a_historical_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100a_historical_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100a_historical_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.7s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 338.5s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.4s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 208.6s / 💰 $0.28 |
+| [100b_historical_logs_nonstandard_label](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100b_historical_logs_nonstandard_label%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100b_historical_logs_nonstandard_label%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.3s / 💰 $0.11 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.9s / 💰 $0.06 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 393.7s / 💰 $0.31 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 124.7s / 💰 $0.15 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 143.2s / 💰 $0.26 |
+| [101_historical_logs_pod_deleted](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522101_historical_logs_pod_deleted%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520101_historical_logs_pod_deleted%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.4s / 💰 $0.12 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.2s / 💰 $0.05 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 435.8s / 💰 $0.30 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.1s / 💰 $0.20 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.5s / 💰 $0.13 |
+| [103_logs_transparency_default_limit](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522103_logs_transparency_default_limit%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520103_logs_transparency_default_limit%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.4s / 💰 $0.18 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.1s / 💰 $0.43 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 131.9s / 💰 $0.09 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.1s / 💰 $0.40 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.4s / 💰 $0.22 |
+| [104a_postgres_root_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104a_postgres_root_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104a_postgres_root_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104a_postgres_root_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.6s / 💰 $0.17 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 273.0s / 💰 $0.22 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.8s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 82.2s / 💰 $0.23 |
+| [104b_postgres_missing_index_pgstat](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104b_postgres_missing_index_pgstat/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104b_postgres_missing_index_pgstat%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104b_postgres_missing_index_pgstat%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [104c_postgres_minimal_missing_index](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104c_postgres_minimal_missing_index/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104c_postgres_minimal_missing_index%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104c_postgres_minimal_missing_index%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [105_redis_wrong_data_structure](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/105_redis_wrong_data_structure/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522105_redis_wrong_data_structure%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520105_redis_wrong_data_structure%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [107_log_filter_http_status_code](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/107_log_filter_http_status_code/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522107_log_filter_http_status_code%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520107_log_filter_http_status_code%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.6s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.7s / 💰 $0.12 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 881.1s / 💰 $0.37 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.0s / 💰 $0.22 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.4s / 💰 $0.34 | +| [108_logs_nearby_lines](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/108_logs_nearby_lines/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522108_logs_nearby_lines%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520108_logs_nearby_lines%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.6s / 💰 $0.22 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.9s / 💰 $0.17 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 227.1s / 💰 $0.21 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 77.0s / 💰 $0.37 | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.7s / 💰 $0.23 | +| [109_logs_transparency_not_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/109_logs_transparency_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522109_logs_transparency_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520109_logs_transparency_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.5s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.7s / 💰 $0.08 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.1s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.5s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.0s / 💰 $0.10 | +| [10_image_pull_backoff](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/10_image_pull_backoff/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252210_image_pull_backoff%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252010_image_pull_backoff%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.2s / 💰 $0.19 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 153.7s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.9s / 💰 $0.15 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.3s / 💰 $0.12 | +| [110_k8s_events_image_pull](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/110_k8s_events_image_pull/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522110_k8s_events_image_pull%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520110_k8s_events_image_pull%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.6s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.1s / 💰 $0.06 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.4s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.3s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.7s / 💰 $0.15 | +| [111_disabled_datadog_traces](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_disabled_datadog_traces/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_disabled_datadog_traces%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_disabled_datadog_traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.0s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.8s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 356.5s / 💰 $0.28 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.3s / 💰 $0.17 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.1s / 💰 $0.06 | +| [111_pod_names_contain_service](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_pod_names_contain_service/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_pod_names_contain_service%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_pod_names_contain_service%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.9s / 💰 $0.21 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 237.4s / 💰 $0.16 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.2s / 💰 $0.22 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.4s / 💰 $0.67 | +| [112_find_pvcs_by_uuid](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/112_find_pvcs_by_uuid/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522112_find_pvcs_by_uuid%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520112_find_pvcs_by_uuid%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.6s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.5s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 159.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.4s / 💰 $0.11 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.0s / 💰 $0.11 | +| [114_checkout_latency_tracing_rebuild[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.3s / 💰 $0.16 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.3s /
💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 325.8s / 💰 $0.24 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.3s / 💰 $0.38 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 111.1s / 💰 $0.35 | +| [115_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing[0]/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520115_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / 💰 $0.24 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.8s / 💰 $0.78 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 170.0s / 💰 $0.14 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.7s / 💰 $0.35 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 117.4s / 💰 $0.37 |
+| [11_init_containers](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/11_init_containers/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252211_init_containers%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252011_init_containers%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.1s / 💰 $0.11 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / πŸ’° $0.08 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.6s / πŸ’° $0.02 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / πŸ’° $0.10 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.9s / πŸ’° $0.13 | +| [121_new_relic_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing[0]/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.8s / πŸ’° $0.07 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.7s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 446.7s / πŸ’° $0.31 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.3s / 💰 $0.36 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 151.5s / 💰 $0.44 |
+| [122_new_relic_checkout_latency_tracing_rebuild[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.3s / πŸ’° $0.15 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 81.6s / πŸ’° $0.19 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 463.4s / πŸ’° $0.31 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.3s / πŸ’° 
$0.33 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 141.0s / 💰 $0.44 |
+| [123_new_relic_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.1s / 💰 $0.07 | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 16.6s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 249.2s / πŸ’° $0.19 | [⏱️ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 617.6s | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 102.8s / πŸ’° $0.46 | +| 
[12_job_crashing](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/12_job_crashing/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252212_job_crashing%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252012_job_crashing%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.3s / πŸ’° $0.13 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.2s / πŸ’° $0.10 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 137.1s / πŸ’° $0.09 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.2s / 💰 $0.14 |
+| [13a_pending_node_selector_basic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13a_pending_node_selector_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213a_pending_node_selector_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013a_pending_node_selector_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.0s / 💰 $0.14 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.0s / πŸ’° $0.08 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.8s / πŸ’° $0.02 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.1s / πŸ’° $0.14 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.7s / πŸ’° $0.13 | +| 
[13b_pending_node_selector_detailed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13b_pending_node_selector_detailed/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213b_pending_node_selector_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013b_pending_node_selector_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.0s / πŸ’° $0.14 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.1s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 141.0s / πŸ’° $0.12 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.8s / 💰 $0.15 |
+| [14_pending_resources](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/14_pending_resources/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252214_pending_resources%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252014_pending_resources%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.14 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.0s / πŸ’° $0.10 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.5s / πŸ’° $0.02 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.1s / πŸ’° $0.14 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / πŸ’° $0.13 | +| [156_kafka_opensearch_latency](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/156_kafka_opensearch_latency/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522156_kafka_opensearch_latency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520156_kafka_opensearch_latency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [159_prometheus_high_cardinality_cpu[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.1s / 💰 $0.19 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.2s / πŸ’° $0.57 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 261.2s / πŸ’° $0.19 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.2s / πŸ’° $0.23 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.3s / πŸ’° $0.25 | +| 
[159_prometheus_high_cardinality_cpu[1]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[1]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.8s / πŸ’° $0.18 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.9s / πŸ’° $0.14 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 196.1s / πŸ’° $0.13 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.5s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.2s / 💰 $0.21 |
+| [159_prometheus_high_cardinality_cpu[2]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.5s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.8s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 151.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.24 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.3s / 💰 $0.12 |
+| [15_failed_readiness_probe](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/15_failed_readiness_probe/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252215_failed_readiness_probe%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252015_failed_readiness_probe%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.4s / 💰 $0.21 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 175.8s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.0s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.15 |
+| [16_failed_no_toolset_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/16_failed_no_toolset_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252216_failed_no_toolset_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252016_failed_no_toolset_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.9s / 💰 $0.06 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.3s / 💰 $0.03 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.8s / 💰 $0.02 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.5s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.5s / 💰 $0.06 |
+| [17_oom_kill](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/17_oom_kill/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252217_oom_kill%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252017_oom_kill%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.4s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.6s / 💰 $0.11 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 180.5s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.3s / 💰 $0.16 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 239.6s / 💰 $0.18 |
+| [19_detect_missing_app_details](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/19_detect_missing_app_details/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252219_detect_missing_app_details%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252019_detect_missing_app_details%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.6s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.9s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 319.4s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.0s / 💰 $0.20 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.3s / 💰 $0.13 |
+| [20_long_log_file_search](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/20_long_log_file_search/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252220_long_log_file_search%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252020_long_log_file_search%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.0s / 💰 $0.05 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.9s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.3s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.9s / 💰 $0.11 |
+| [21_job_fail_curl_no_svc_account](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252221_job_fail_curl_no_svc_account%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252021_job_fail_curl_no_svc_account%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.0s / 💰 $0.29 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 291.9s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.4s / 💰 $0.17 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.4s / 💰 $0.12 |
+| [22_high_latency_dbi_down](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252222_high_latency_dbi_down%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252022_high_latency_dbi_down%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
+| [23_app_error_in_current_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/23_app_error_in_current_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252223_app_error_in_current_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252023_app_error_in_current_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.5s / 💰 $0.41 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.2s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 310.8s / 💰 $0.40 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.3s | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 70.2s / 💰 $0.23 |
+| [24_misconfigured_pvc](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24_misconfigured_pvc/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224_misconfigured_pvc%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024_misconfigured_pvc%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.7s / 💰 $0.16 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / 💰 $0.10 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.0s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.4s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.15 |
+| [24a_misconfigured_pvc_basic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24a_misconfigured_pvc_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224a_misconfigured_pvc_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024a_misconfigured_pvc_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.1s / 💰 $0.16 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 276.9s / 💰 $0.26 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.9s / 💰 $0.15 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.2s / 💰 $0.16 |
+| [24b_misconfigured_pvc_detailed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24b_misconfigured_pvc_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224b_misconfigured_pvc_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024b_misconfigured_pvc_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.2s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / 💰 $0.10 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.1s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.4s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.7s / 💰 $0.15 |
+| [25_misconfigured_ingress_class](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252225_misconfigured_ingress_class%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252025_misconfigured_ingress_class%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.5s / πŸ’° $0.43 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.7s / πŸ’° $0.09 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 466.9s / πŸ’° $0.26 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 110.3s / πŸ’° $0.37 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.8s / πŸ’° $0.26 | +| [26_page_render_times](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/26_page_render_times/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252226_page_render_times%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252026_page_render_times%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.5s / πŸ’° $0.14 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.7s / πŸ’° $0.10 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 479.6s / πŸ’° $0.31 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.3s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.4s / πŸ’° $0.17 | +| [27a_multi_container_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27a_multi_container_logs/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227a_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027a_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.8s / πŸ’° $0.13 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.0s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.2s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / πŸ’° $0.13 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.8s / πŸ’° $0.12 | +| [27b_multi_container_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27b_multi_container_logs/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227b_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027b_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.0s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.1s / πŸ’° $0.09 | [πŸ”΄ 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.0s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.9s / πŸ’° $0.11 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / πŸ’° $0.11 | +| [28_permissions_error](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/28_permissions_error/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252228_permissions_error%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252028_permissions_error%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.1s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.6s / πŸ’° $0.05 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 150.7s / πŸ’° $0.10 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 18.6s / πŸ’° $0.06 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.6s / πŸ’° $0.06 | +| [33_cpu_metrics_discovery](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252233_cpu_metrics_discovery%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252033_cpu_metrics_discovery%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / πŸ’° $0.13 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.8s / πŸ’° $0.10 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 298.6s / πŸ’° $0.20 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.8s / πŸ’° $0.13 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 106.7s / πŸ’° $0.12 | +| [39_failed_toolset](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/39_failed_toolset/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252239_failed_toolset%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252039_failed_toolset%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.2s / πŸ’° $0.03 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.8s / πŸ’° $0.05 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 281.6s / πŸ’° $0.20 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.6s / πŸ’° $0.11 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.1s / πŸ’° $0.12 | +| [41_setup_argo](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/41_setup_argo/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252241_setup_argo%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252041_setup_argo%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.1s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.2s / πŸ’° $0.05 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 174.7s / πŸ’° $0.11 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 18.0s / πŸ’° $0.05 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 16.9s / πŸ’° $0.05 | +| [42_dns_issues_result_new_tools_no_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools_no_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_result_new_tools_no_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.3s / πŸ’° $0.13 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.3s / πŸ’° $0.14 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 348.3s / πŸ’° $0.22 | [⏱️ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 673.9s | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 97.5s / πŸ’° $0.25 | +| [42_dns_issues_steps_new_tools](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_steps_new_tools%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_steps_new_tools%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.0s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.8s / πŸ’° $0.16 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 346.7s / πŸ’° $0.18 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 93.8s / πŸ’° $0.27 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 92.2s / πŸ’° $0.28 | +| [43_current_datetime_from_prompt](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_current_datetime_from_prompt/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_current_datetime_from_prompt%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_current_datetime_from_prompt%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.8s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.0s / πŸ’° $0.04 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 135.6s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 14.5s / πŸ’° $0.05 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 14.1s / 💰 $0.05 | +| [43_slack_deployment_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_slack_deployment_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_slack_deployment_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_slack_deployment_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [44_slack_statefulset_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252244_slack_statefulset_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252044_slack_statefulset_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [45_fetch_deployment_logs_simple](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252245_fetch_deployment_logs_simple%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252045_fetch_deployment_logs_simple%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.0s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / 💰 $0.07 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 86.2s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.2s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.6s / 💰 $0.10 | +| [48_logs_since_thursday](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252248_logs_since_thursday%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252048_logs_since_thursday%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [50_logs_since_specific_date](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250_logs_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050_logs_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.3s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.6s / 💰 $0.07 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.0s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.8s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.8s / 💰 $0.11 | +| [50a_logs_since_last_specific_month](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250a_logs_since_last_specific_month%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050a_logs_since_last_specific_month%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.1s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.7s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 117.3s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.0s / 💰 $0.11 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.9s / 💰 $0.09 | +| [51_logs_summarize_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/51_logs_summarize_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252251_logs_summarize_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252051_logs_summarize_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.7s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.5s / 💰 $0.04 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.1s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.5s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.1s / 💰 $0.10 | +| [52_logs_login_issues](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/52_logs_login_issues/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252252_logs_login_issues%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252052_logs_login_issues%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.0s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.5s / 💰 $0.63 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 251.3s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.7s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.1s / 💰 $0.23 | +| [53_logs_find_term](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/53_logs_find_term/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252253_logs_find_term%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252053_logs_find_term%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.2s / 💰 $0.19 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.1s / 💰 $0.09 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 70.7s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.2s / 💰 $0.13 | +| [54_not_truncated_when_getting_pods](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/54_not_truncated_when_getting_pods/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252254_not_truncated_when_getting_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252054_not_truncated_when_getting_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / 💰 $0.16 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.4s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 314.9s / 💰 $0.25 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.1s / 💰 $0.10 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / 💰 $0.11 | +| [55_kafka_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252255_kafka_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252055_kafka_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [57_wrong_namespace](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252257_wrong_namespace%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252057_wrong_namespace%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.0s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.1s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 168.4s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / 💰 $0.09 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.09 | +| [59_label_based_counting](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/59_label_based_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252259_label_based_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252059_label_based_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.7s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.9s / 💰 $0.05 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.1s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.2s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.7s / 💰 $0.07 | +| [60_count_less_than](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/60_count_less_than/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252260_count_less_than%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252060_count_less_than%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.6s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.5s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.9s / 💰 $0.04 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.1s / 💰 $0.07 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.7s / 💰 $0.09 | +| [61_exact_match_counting](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/61_exact_match_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252261_exact_match_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252061_exact_match_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.7s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.1s / 💰 $0.05 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.9s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.0s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.2s / 💰 $0.07 | +| [62_fetch_error_logs_with_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/62_fetch_error_logs_with_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252262_fetch_error_logs_with_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252062_fetch_error_logs_with_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 27.1s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.6s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 99.9s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.9s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 114.0s / 💰 $0.08 | +| [63_fetch_error_logs_no_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252263_fetch_error_logs_no_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252063_fetch_error_logs_no_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.0s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.0s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 153.9s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.0s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / 💰 $0.08 | +| [64_keda_vs_hpa_confusion](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/64_keda_vs_hpa_confusion/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252264_keda_vs_hpa_confusion%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252064_keda_vs_hpa_confusion%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.4s / 💰 $0.57 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.5s / 💰 $0.82 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 143.0s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.3s / 💰 $0.20 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.3s / 💰 $0.25 | +| [65_health_check_followup](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/65_health_check_followup/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252265_health_check_followup%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252065_health_check_followup%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.7s / 💰 $0.25 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 146.5s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.4s / 💰 $0.26 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.1s / 💰 $0.30 | +| [71_connection_pool_starvation](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252271_connection_pool_starvation%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252071_connection_pool_starvation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.8s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.1s / 💰 $0.55 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 117.4s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.0s / 💰 $0.82 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.33 | +| [73a_time_window_anomaly](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73a_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273a_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073a_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.7s / 💰 $0.22 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.5s / 💰 $0.56 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 122.5s / 💰 $0.13 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.8s / 💰 $0.72 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.3s / 💰 $0.15 | +| [73b_time_window_anomaly](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73b_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273b_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073b_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.3s / 💰 $0.19 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.2s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 108.9s / 💰 $0.08 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.6s / 💰 $0.78 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.1s / 💰 $0.27 | +| [76_service_discovery_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/76_service_discovery_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252276_service_discovery_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252076_service_discovery_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.0s / 💰 $0.25 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.8s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 168.2s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.0s / 💰 $0.95 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.1s / 💰 $0.13 | +| [77_liveness_probe_misconfiguration](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/77_liveness_probe_misconfiguration/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252277_liveness_probe_misconfiguration%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252077_liveness_probe_misconfiguration%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 640.0s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.2s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 558.4s / 💰 $0.17 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.0s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.2s / 💰 $0.14 | +| [78a_missing_cpu_limits](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78a_missing_cpu_limits/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278a_missing_cpu_limits%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078a_missing_cpu_limits%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.4s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.8s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 119.6s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.7s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.9s / 💰 $0.12 | +| [78b_cpu_quota_exceeded](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78b_cpu_quota_exceeded/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278b_cpu_quota_exceeded%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078b_cpu_quota_exceeded%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.6s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.4s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 136.9s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.7s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.6s / 💰 $0.16 | +| [79_configmap_mount_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/79_configmap_mount_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252279_configmap_mount_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252079_configmap_mount_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.2s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.6s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.7s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.7s / 💰 $0.13 | +| [80_pvc_storage_class_mismatch](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/80_pvc_storage_class_mismatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252280_pvc_storage_class_mismatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252080_pvc_storage_class_mismatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.5s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.9s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 128.6s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.2s / 💰 $0.11 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.5s / 💰 $0.15 | +| [81_service_account_permission_denied](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/81_service_account_permission_denied/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252281_service_account_permission_denied%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252081_service_account_permission_denied%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.3s / 💰 $0.27 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.7s / 💰 $0.77 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 147.4s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.5s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.6s / 💰 $0.29 | +| [82_pod_anti_affinity_conflict](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/82_pod_anti_affinity_conflict/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252282_pod_anti_affinity_conflict%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252082_pod_anti_affinity_conflict%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) 
| [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.3s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.5s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 331.2s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.7s / 💰 $0.15 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.4s / 💰 $0.16 | +| [83_secret_not_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/83_secret_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252283_secret_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252083_secret_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.6s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 108.2s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.0s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.6s / 💰 $0.12 | +| [84_network_policy_blocking_traffic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/84_network_policy_blocking_traffic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252284_network_policy_blocking_traffic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252084_network_policy_blocking_traffic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.8s / 💰 $0.24 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.2s / 💰 $0.43 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 182.7s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.5s / 💰 $0.20 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.4s / 💰 $0.22 | +| [85_hpa_not_scaling](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/85_hpa_not_scaling/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252285_hpa_not_scaling%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252085_hpa_not_scaling%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.3s / 💰 $0.11 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.4s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 168.9s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / 💰 $0.17 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.4s / 💰 $0.23 | +| [86_configmap_like_but_secret](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/86_configmap_like_but_secret/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252286_configmap_like_but_secret%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252086_configmap_like_but_secret%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.5s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.7s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 566.6s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.7s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.3s / 💰 $0.15 | +| [89_runbook_missing_cloudwatch](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/89_runbook_missing_cloudwatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252289_runbook_missing_cloudwatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252089_runbook_missing_cloudwatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.7s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 17.6s / 💰 $0.04 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 241.4s / 💰 $0.16 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 99.2s / 💰 $0.29 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.3s / 💰 $0.08 | +| [90_runbook_basic_selection](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/90_runbook_basic_selection/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252290_runbook_basic_selection%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252090_runbook_basic_selection%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.9s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 133.3s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 386.5s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.6s / 💰 $0.31 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 111.4s / 💰 $0.35 | +| [91f_datadog_logs_historical_pod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/91f_datadog_logs_historical_pod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252291f_datadog_logs_historical_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252091f_datadog_logs_historical_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 8.0s | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 7.7s | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 9.0s | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 7.8s | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 8.5s | +| [93_calling_datadog[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.6s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 10.5s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.9s / πŸ’° $0.07 | [⏱️ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 608.3s | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.0s / πŸ’° $0.15 | +| [93_calling_datadog[1]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[1]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.2s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 27.4s / πŸ’° $0.11 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 78.9s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.1s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 9.6s / πŸ’° $0.15 | +| [93_calling_datadog[2]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[2]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.2s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.1s / πŸ’° $0.11 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.1s / πŸ’° $0.06 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 11.4s / πŸ’° $0.15 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.1s / πŸ’° $0.15 | +| [93_events_since_specific_date](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_events_since_specific_date/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_events_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_events_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 13.0s / πŸ’° $0.09 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.1s / πŸ’° $0.07 | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.1s / πŸ’° $0.11 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 14.8s / πŸ’° $0.10 | +| [94_runbook_transparency](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/94_runbook_transparency/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252294_runbook_transparency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252094_runbook_transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.1s / πŸ’° $0.33 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.1s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 308.9s / πŸ’° $0.21 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 91.4s / πŸ’° $0.25 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 104.5s / πŸ’° $0.21 | +| [96_no_matching_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/96_no_matching_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252296_no_matching_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252096_no_matching_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.1s / πŸ’° $0.23 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.3s / πŸ’° $0.11 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 275.9s / πŸ’° $0.18 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.7s / πŸ’° $0.31 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 86.8s / πŸ’° $0.36 | +| [97_logs_clarification_needed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/97_logs_clarification_needed/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252297_logs_clarification_needed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252097_logs_clarification_needed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 11.1s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.9s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.6s / πŸ’° $0.02 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / πŸ’° $0.18 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.9s / πŸ’° $0.05 | +| [98_logs_transparency_default_time](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252298_logs_transparency_default_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252098_logs_transparency_default_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [99_logs_transparency_custom_time](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/99_logs_transparency_custom_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252299_logs_transparency_custom_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252099_logs_transparency_custom_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.7s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.5s / πŸ’° $0.10 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.7s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / πŸ’° $0.12 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.6s / 💰 $0.12 |
+
+---
+*Results are automatically generated and updated weekly. View full traces and detailed analysis in [Braintrust experiment: local-benchmark-20251007-155241](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241).*
diff --git a/docs/development/evaluations/latest-results.md b/docs/development/evaluations/latest-results.md
index a6317dc3f..11ebb55e8 100644
--- a/docs/development/evaluations/latest-results.md
+++ b/docs/development/evaluations/latest-results.md
@@ -1,12 +1,9 @@
 # HolmesGPT LLM Evaluation Benchmark Results

-**Generated**: 2025-09-30 15:37 UTC
-
-**Total Duration**: 6h 16m 42s
-
-**Iterations**: 5
-
-**Judge (classifier) model**: gpt-4o
+**Generated**: 2025-10-07 17:24 UTC
+**Total Duration**: 1h 27m 21s
+**Iterations**: 1
+**Judge (classifier) model**: gpt-4.1

 ## About this Benchmark

@@ -18,31 +15,33 @@ If you find scenarios that HolmesGPT does not perform well on, please consider a

 | Model | Pass | Fail | Skip/Error | Total | Success Rate |
 |-------|------|------|------------|-------|--------------|
-| gpt-4o | 295 | 174 | 56 | 525 | 🟡 63% (295/469) |
-| gpt-4.1 | 346 | 122 | 57 | 525 | 🟡 74% (346/468) |
-| gpt-5 | 360 | 104 | 61 | 525 | 🟡 78% (360/464) |
-| sonnet-4-20250514 | 419 | 51 | 55 | 525 | 🟡 89% (419/470) |
-| sonnet-4-5-20250929 | 420 | 50 | 55 | 525 | 🟡 89% (420/470) |
+| gpt-4o | 54 | 40 | 11 | 105 | 🟡 57% (54/94) |
+| gpt-4.1 | 70 | 24 | 11 | 105 | 🟡 74% (70/94) |
+| gpt-5 | 78 | 15 | 12 | 105 | 🟡
84% (78/93) |
+| sonnet-4-20250514 | 80 | 14 | 11 | 105 | 🟡 85% (80/94) |
+| sonnet-4-5-20250929 | 89 | 5 | 11 | 105 | 🟡 95% (89/94) |

 ## Model Cost Comparison

 | Model | Tests | Avg Cost | Min Cost | Max Cost | Total Cost |
 |-------|-------|----------|----------|----------|------------|
-| gpt-4o | 468 | $0.14 | $0.01 | $0.85 | $64.90 |
-| gpt-4.1 | 468 | $0.11 | $0.02 | $1.07 | $52.00 |
-| gpt-5 | 464 | $0.13 | $0.02 | $0.58 | $61.76 |
-| sonnet-4-20250514 | 468 | $0.17 | $0.06 | $1.05 | $80.54 |
-| sonnet-4-5-20250929 | 467 | $0.16 | $0.06 | $0.64 | $75.56 |
+| gpt-4o | 93 | $0.15 | $0.03 | $0.57 | $13.61 |
+| gpt-4.1 | 93 | $0.14 | $0.03 | $0.82 | $13.29 |
+| gpt-5 | 92 | $0.13 | $0.02 | $0.40 | $12.42 |
+| sonnet-4-20250514 | 89 | $0.19 | $0.05 | $0.95 | $16.93 |
+| sonnet-4-5-20250929 | 93 | $0.17 | $0.05 | $0.67 | $16.25 |

 ## Model Latency Comparison

 | Model | Avg (s) | Min (s) | Max (s) | P50 (s) | P95 (s) |
 |-------|---------|---------|---------|---------|---------|
-| gpt-4o | 49.0 | 8.9 | 278.2 | 43.5 | 94.7 |
-| gpt-4.1 | 53.8 | 5.2 | 236.8 | 48.2 | 109.3 |
-| gpt-5 | 190.3 | 22.5 | 1136.0 | 158.1 | 442.5 |
-| sonnet-4-20250514 | 89.6 | 10.4 | 879.7 | 64.8 | 231.5 |
-| sonnet-4-5-20250929 | 73.0 | 10.6 | 663.3 | 60.0 | 154.6 |
+| gpt-4o | 44.0 | 8.0 | 640.0 | 36.0 | 73.6 |
+| gpt-4.1 | 39.2 | 5.4 | 133.3 | 37.7 | 71.2 |
+| gpt-5 | 190.2 | 9.0 | 881.1 | 150.7 | 466.9 |
+| sonnet-4-20250514 | 52.2 | 7.8 | 124.7 | 48.4 | 100.3 |
+| sonnet-4-5-20250929 | 59.1 | 8.5 | 208.6 | 55.2 | 117.4 |
+
+⚠️ **Note:** 4 test(s) excluded from latency calculations due to throttling/timeout errors (sonnet-4-20250514: 3, sonnet-4-5-20250929: 1)

 ## Performance by Tag

@@ -50,28 +49,29 @@ Success rate by test category and model:

 | Tag | gpt-4o | gpt-4.1 | gpt-5 | sonnet-4-20250514 | sonnet-4-5-20250929 | Warnings |
 |-----|-------|-------|-------|-------|-------|----------|
-| 
[chain-of-causation](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522chain-of-causation%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520chain-of-causation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | πŸ”΄ 0% (0/30) | 🟑 3% (1/30) | 🟑 40% (12/30) | 🟑 63% (19/30) | 🟑 70% (21/30) | ⚠️ 50 skipped | -| [context_window](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522context_window%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520context_window%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 57% (20/35) | 🟑 77% (27/35) | 🟑 83% (29/35) | 🟑 86% (30/35) | 🟑 77% (27/35) | | -| [counting](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522counting%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟒 100% (20/20) | 🟒 100% (20/20) | 🟑 95% (19/20) | 🟒 100% (20/20) | 🟒 100% (20/20) | | -| [database](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522database%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520database%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | πŸ”΄ 0% (0/5) | 🟑 60% (3/5) | 🟒 100% (5/5) | 🟒 100% (5/5) | 🟒 100% (5/5) | ⚠️ 75 skipped | -| 
[datadog](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522datadog%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520datadog%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 75% (15/20) | 🟑 80% (16/20) | 🟑 95% (18/19) | 🟒 100% (20/20) | 🟒 100% (20/20) | ⚠️ 1 skipped | -| [datetime](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522datetime%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520datetime%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 65% (13/20) | 🟑 65% (13/20) | 🟑 95% (19/20) | 🟑 75% (15/20) | 🟑 85% (17/20) | ⚠️ 50 skipped | -| [easy](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522easy%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520easy%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 97% (175/180) | 🟑 96% (173/180) | 🟑 80% (144/179) | 🟑 97% (174/180) | 🟑 96% (172/180) | ⚠️ 1 skipped | -| [hard](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522hard%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520hard%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 11% (8/70) | 🟑 29% (20/70) | 🟑 57% (40/70) | 🟑 77% (54/70) | 🟑 80% (56/70) | ⚠️ 150 skipped | -| 
[kafka](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522kafka%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520kafka%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | βšͺ️ - | βšͺ️ - | βšͺ️ - | βšͺ️ - | βšͺ️ - | ⚠️ 50 skipped | -| [kubernetes](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522kubernetes%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520kubernetes%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 55% (129/235) | 🟑 71% (168/235) | 🟑 69% (163/235) | 🟑 89% (208/235) | 🟑 87% (205/235) | ⚠️ 25 skipped | -| [logs](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522logs%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 62% (80/130) | 🟑 67% (87/129) | 🟑 77% (100/130) | 🟑 75% (98/130) | 🟑 82% (106/130) | ⚠️ 176 skipped | -| [medium](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522medium%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520medium%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟑 51% (112/219) | 🟑 70% (153/218) | 🟑 82% (176/215) | 🟑 87% (191/220) | 🟑 87% (192/220) | ⚠️ 133 skipped | -| 
[network](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522network%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520network%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 45% (9/20) | 🟡 60% (12/20) | 🟡 85% (17/20) | 🟢 100% (20/20) | 🟢 100% (20/20) | | -| [numerical](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522numerical%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520numerical%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟢 100% (5/5) | 🟢 100% (5/5) | 🟢 100% (5/5) | 🟢 100% (5/5) | 🟢 100% (5/5) | | -| [port-forward](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522port-forward%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520port-forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 29% (13/45) | 🟡 44% (20/45) | 🟡 53% (24/45) | 🟡 49% (22/45) | 🟡 42% (19/45) | | -| [prometheus](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522prometheus%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520prometheus%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 65% (13/20) | 🟡 95% (19/20) | 🟢 100% (20/20) | 🟢 100% (20/20) | 🟡 80% (16/20) | | -| 
[question-answer](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522question-answer%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520question-answer%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟢 100% (20/20) | 🟢 100% (20/20) | 🟡 95% (19/20) | 🟢 100% (20/20) | 🟢 100% (20/20) | | -| [runbooks](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522runbooks%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520runbooks%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 73% (22/30) | 🟡 73% (22/30) | 🟡 93% (28/30) | 🟢 100% (30/30) | 🟡 97% (29/30) | ⚠️ 25 skipped | -| [slackbot](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522slackbot%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520slackbot%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚠️ 25 skipped | -| [traces](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522traces%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🔴 0% (0/25) | 🟡 4% (1/25) | 🟡 40% (10/25) | 🟡 56% (14/25) | 🟡 64% (16/25) | | -| 
[transparency](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522transparency%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 71% (50/70) | 🟡 71% (50/70) | 🟡 84% (59/70) | 🟡 81% (57/70) | 🟡 84% (59/70) | ⚠️ 25 skipped | -| **Overall** | 🟡 63% (295/469) | 🟡 74% (346/468) | 🟡 78% (360/464) | 🟡 89% (419/470) | 🟡 89% (420/470) | ⚠️ 284 skipped | +| [chain-of-causation](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522chain-of-causation%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520chain-of-causation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🔴 0% (0/6) | 🔴 0% (0/6) | 🟡 83% (5/6) | 🟡 83% (5/6) | 🟢 100% (6/6) | ⚠️ 10 skipped | +| [context_window](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522context_window%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520context_window%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 29% (2/7) | 🟡 71% (5/7) | 🟡 71% (5/7) | 🟡 71% (5/7) | 🟡 86% (6/7) | | +| [counting](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522counting%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | | +| 
[database](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522database%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520database%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🔴 0% (0/1) | 🔴 0% (0/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | ⚠️ 15 skipped | +| [datadog](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522datadog%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520datadog%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 75% (3/4) | 🟡 75% (3/4) | 🟡 75% (3/4) | 🟡 50% (2/4) | 🟡 75% (3/4) | | +| [datetime](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522datetime%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520datetime%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 50% (2/4) | 🟡 50% (2/4) | 🟢 100% (4/4) | 🟡 50% (2/4) | 🟢 100% (4/4) | ⚠️ 10 skipped | +| [easy](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522easy%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520easy%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 94% (34/36) | 🟡 94% (34/36) | 🟡 89% (32/36) | 🟡 97% (35/36) | 🟢 100% (36/36) | | +| [hard](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522hard%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520hard%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 7% (1/14) | 🟡 21% (3/14) | 🟡 64% 
(9/14) | 🟡 79% (11/14) | 🟡 86% (12/14) | ⚠️ 30 skipped | +| [kafka](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522kafka%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520kafka%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚠️ 10 skipped | +| [kubernetes](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522kubernetes%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520kubernetes%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 49% (23/47) | 🟡 74% (35/47) | 🟡 81% (38/47) | 🟡 91% (43/47) | 🟡 94% (44/47) | ⚠️ 5 skipped | +| [logs](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522logs%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 54% (14/26) | 🟡 69% (18/26) | 🟡 77% (20/26) | 🟡 69% (18/26) | 🟡 85% (22/26) | ⚠️ 35 skipped | +| [medium](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522medium%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520medium%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 43% (19/44) | 🟡 75% (33/44) | 🟡 86% (37/43) | 🟡 77% (34/44) | 🟡 93% (41/44) | ⚠️ 26 skipped | +| 
[network](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522network%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520network%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 25% (1/4) | 🟡 50% (2/4) | 🟢 100% (4/4) | 🟡 75% (3/4) | 🟢 100% (4/4) | | +| [no-cicd](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522no-cicd%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520no-cicd%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | | +| [numerical](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522numerical%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520numerical%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | | +| [port-forward](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522port-forward%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520port-forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 33% (3/9) | 🟡 56% (5/9) | 🟡 67% (6/9) | 🟡 78% (7/9) | 🟡 67% (6/9) | | +| [prometheus](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522prometheus%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520prometheus%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 50% (2/4) | 🟢 100% (4/4) | 🟡 
75% (3/4) | 🟢 100% (4/4) | 🟡 75% (3/4) | | +| [question-answer](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522question-answer%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520question-answer%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | | +| [runbooks](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522runbooks%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520runbooks%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 50% (3/6) | 🟡 67% (4/6) | 🟢 100% (6/6) | 🟡 67% (4/6) | 🟢 100% (6/6) | ⚠️ 5 skipped | +| [slackbot](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522slackbot%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520slackbot%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚠️ 5 skipped | +| [traces](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522traces%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🔴 0% (0/5) | 🔴 0% (0/5) | 🟡 80% (4/5) | 🟡 80% (4/5) | 🟢 100% (5/5) | | +| 
[transparency](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22tags%2520includes%2520%255B%2522transparency%2522%255D%22%2C%20%22label%22%3A%20%22Tags%2520includes%2520transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 🟡 71% (10/14) | 🟡 79% (11/14) | 🟡 93% (13/14) | 🟡 71% (10/14) | 🟢 100% (14/14) | ⚠️ 5 skipped | +| **Overall** | 🟡 57% (54/94) | 🟡 74% (70/94) | 🟡 84% (78/93) | 🟡 85% (80/94) | 🟡 95% (89/94) | ⚠️ 56 skipped | ## Raw Results @@ -85,224 +85,224 @@ Status of all evaluations across models. Color coding: - ⏱️ Timeout or rate limit error - ⏭️ Test skipped (e.g., known issue or precondition not met) -| Eval ID | [gpt-4o](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-4o%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-4o%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [gpt-4.1](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-4.1%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-4.1%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [gpt-5](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-5%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-5%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[sonnet-4-20250514](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522anthropic%252Fclaude-sonnet-4-20250514%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520anthropic%252Fclaude-sonnet-4-20250514%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [sonnet-4-5-20250929](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522anthropic%252Fclaude-sonnet-4-5-20250929%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520anthropic%252Fclaude-sonnet-4-5-20250929%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| Eval ID | [gpt-4o](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-4o%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-4o%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [gpt-4.1](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-4.1%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-4.1%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [gpt-5](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522gpt-5%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520gpt-5%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[sonnet-4-20250514](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522anthropic%252Fclaude-sonnet-4-20250514%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520anthropic%252Fclaude-sonnet-4-20250514%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [sonnet-4-5-20250929](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.model%2520%253D%2520%2522anthropic%252Fclaude-sonnet-4-5-20250929%2522%22%2C%20%22label%22%3A%20%22metadata.model%2520equals%2520anthropic%252Fclaude-sonnet-4-5-20250929%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | |---------|-------|-------|-------|-------|-------| -| [**01_how_many_pods**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/01_how_many_pods/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252201_how_many_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252001_how_many_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**02_what_is_wrong_with_pod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/02_what_is_wrong_with_pod/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252202_what_is_wrong_with_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252002_what_is_wrong_with_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**03_what_is_the_command_to_port_forward**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252203_what_is_the_command_to_port_forward%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252003_what_is_the_command_to_port_forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**04_related_k8s_events**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/04_related_k8s_events/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252204_related_k8s_events%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252004_related_k8s_events%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**05_image_version**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/05_image_version/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252205_image_version%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252005_image_version%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟑](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**09_crashpod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/09_crashpod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252209_crashpod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252009_crashpod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| 
[**100a_historical_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100a_historical_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100a_historical_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**100b_historical_logs_nonstandard_label**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100b_historical_logs_nonstandard_label%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100b_historical_logs_nonstandard_label%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**101_historical_logs_pod_deleted**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522101_historical_logs_pod_deleted%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520101_historical_logs_pod_deleted%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**103_logs_transparency_default_limit**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522103_logs_transparency_default_limit%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520103_logs_transparency_default_limit%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**104a_postgres_root_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104a_postgres_root_issue/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104a_postgres_root_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104a_postgres_root_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**107_log_filter_http_status_code**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/107_log_filter_http_status_code/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522107_log_filter_http_status_code%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520107_log_filter_http_status_code%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**108_logs_nearby_lines**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/108_logs_nearby_lines/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522108_logs_nearby_lines%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520108_logs_nearby_lines%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**109_logs_transparency_not_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/109_logs_transparency_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522109_logs_transparency_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520109_logs_transparency_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**10_image_pull_backoff**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/10_image_pull_backoff/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252210_image_pull_backoff%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252010_image_pull_backoff%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**110_k8s_events_image_pull**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/110_k8s_events_image_pull/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522110_k8s_events_image_pull%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520110_k8s_events_image_pull%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**111_disabled_datadog_traces**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_disabled_datadog_traces/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_disabled_datadog_traces%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_disabled_datadog_traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**111_pod_names_contain_service**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_pod_names_contain_service/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_pod_names_contain_service%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_pod_names_contain_service%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**112_find_pvcs_by_uuid**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/112_find_pvcs_by_uuid/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522112_find_pvcs_by_uuid%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520112_find_pvcs_by_uuid%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**114_checkout_latency_tracing_rebuild[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**115_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520115_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**11_init_containers**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/11_init_containers/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252211_init_containers%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252011_init_containers%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**121_new_relic_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**122_new_relic_checkout_latency_tracing_rebuild[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| 
[**123_new_relic_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**12_job_crashing**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/12_job_crashing/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252212_job_crashing%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252012_job_crashing%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**13a_pending_node_selector_basic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13a_pending_node_selector_basic/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213a_pending_node_selector_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013a_pending_node_selector_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**13b_pending_node_selector_detailed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13b_pending_node_selector_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213b_pending_node_selector_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013b_pending_node_selector_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**14_pending_resources**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/14_pending_resources/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252214_pending_resources%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252014_pending_resources%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**159_prometheus_high_cardinality_cpu[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**159_prometheus_high_cardinality_cpu[1]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[1]/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**159_prometheus_high_cardinality_cpu[2]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**15_failed_readiness_probe**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/15_failed_readiness_probe/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252215_failed_readiness_probe%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252015_failed_readiness_probe%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**16_failed_no_toolset_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/16_failed_no_toolset_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252216_failed_no_toolset_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252016_failed_no_toolset_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**17_oom_kill**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/17_oom_kill/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252217_oom_kill%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252017_oom_kill%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**19_detect_missing_app_details**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/19_detect_missing_app_details/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252219_detect_missing_app_details%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252019_detect_missing_app_details%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**20_long_log_file_search**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/20_long_log_file_search/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252220_long_log_file_search%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252020_long_log_file_search%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**21_job_fail_curl_no_svc_account**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252221_job_fail_curl_no_svc_account%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252021_job_fail_curl_no_svc_account%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**23_app_error_in_current_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/23_app_error_in_current_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252223_app_error_in_current_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252023_app_error_in_current_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**24_misconfigured_pvc**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24_misconfigured_pvc/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224_misconfigured_pvc%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024_misconfigured_pvc%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**24a_misconfigured_pvc_basic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24a_misconfigured_pvc_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224a_misconfigured_pvc_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024a_misconfigured_pvc_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**24b_misconfigured_pvc_detailed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24b_misconfigured_pvc_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224b_misconfigured_pvc_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024b_misconfigured_pvc_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**25_misconfigured_ingress_class**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252225_misconfigured_ingress_class%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252025_misconfigured_ingress_class%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**26_page_render_times**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/26_page_render_times/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252226_page_render_times%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252026_page_render_times%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**27a_multi_container_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27a_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227a_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027a_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**27b_multi_container_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27b_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227b_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027b_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**28_permissions_error**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/28_permissions_error/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252228_permissions_error%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252028_permissions_error%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**33_cpu_metrics_discovery**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252233_cpu_metrics_discovery%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252033_cpu_metrics_discovery%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**39_failed_toolset**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/39_failed_toolset/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252239_failed_toolset%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252039_failed_toolset%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**41_setup_argo**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/41_setup_argo/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252241_setup_argo%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252041_setup_argo%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| 
[**42_dns_issues_result_new_tools_no_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools_no_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_result_new_tools_no_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**42_dns_issues_steps_new_tools**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_steps_new_tools%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_steps_new_tools%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**43_current_datetime_from_prompt**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_current_datetime_from_prompt/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_current_datetime_from_prompt%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_current_datetime_from_prompt%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**45_fetch_deployment_logs_simple**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252245_fetch_deployment_logs_simple%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252045_fetch_deployment_logs_simple%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**50a_logs_since_last_specific_month**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250a_logs_since_last_specific_month%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050a_logs_since_last_specific_month%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**51_logs_summarize_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/51_logs_summarize_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252251_logs_summarize_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252051_logs_summarize_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**52_logs_login_issues**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/52_logs_login_issues/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252252_logs_login_issues%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252052_logs_login_issues%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**53_logs_find_term**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/53_logs_find_term/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252253_logs_find_term%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252053_logs_find_term%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**54_not_truncated_when_getting_pods**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/54_not_truncated_when_getting_pods/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252254_not_truncated_when_getting_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252054_not_truncated_when_getting_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**57_wrong_namespace**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252257_wrong_namespace%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252057_wrong_namespace%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**59_label_based_counting**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/59_label_based_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252259_label_based_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252059_label_based_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**60_count_less_than**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/60_count_less_than/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252260_count_less_than%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252060_count_less_than%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**61_exact_match_counting**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/61_exact_match_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252261_exact_match_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252061_exact_match_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**62_fetch_error_logs_with_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/62_fetch_error_logs_with_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252262_fetch_error_logs_with_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252062_fetch_error_logs_with_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**63_fetch_error_logs_no_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252263_fetch_error_logs_no_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252063_fetch_error_logs_no_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**64_keda_vs_hpa_confusion**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/64_keda_vs_hpa_confusion/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252264_keda_vs_hpa_confusion%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252064_keda_vs_hpa_confusion%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**65_health_check_followup**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/65_health_check_followup/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252265_health_check_followup%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252065_health_check_followup%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**71_connection_pool_starvation**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252271_connection_pool_starvation%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252071_connection_pool_starvation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**73a_time_window_anomaly**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73a_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273a_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073a_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**73b_time_window_anomaly**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73b_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273b_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073b_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**76_service_discovery_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/76_service_discovery_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252276_service_discovery_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252076_service_discovery_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**77_liveness_probe_misconfiguration**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/77_liveness_probe_misconfiguration/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252277_liveness_probe_misconfiguration%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252077_liveness_probe_misconfiguration%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**78a_missing_cpu_limits**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78a_missing_cpu_limits/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278a_missing_cpu_limits%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078a_missing_cpu_limits%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**78b_cpu_quota_exceeded**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78b_cpu_quota_exceeded/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278b_cpu_quota_exceeded%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078b_cpu_quota_exceeded%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**79_configmap_mount_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/79_configmap_mount_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252279_configmap_mount_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252079_configmap_mount_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**80_pvc_storage_class_mismatch**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/80_pvc_storage_class_mismatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252280_pvc_storage_class_mismatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252080_pvc_storage_class_mismatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**81_service_account_permission_denied**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/81_service_account_permission_denied/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252281_service_account_permission_denied%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252081_service_account_permission_denied%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**82_pod_anti_affinity_conflict**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/82_pod_anti_affinity_conflict/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252282_pod_anti_affinity_conflict%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252082_pod_anti_affinity_conflict%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**83_secret_not_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/83_secret_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252283_secret_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252083_secret_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**84_network_policy_blocking_traffic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/84_network_policy_blocking_traffic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252284_network_policy_blocking_traffic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252084_network_policy_blocking_traffic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**85_hpa_not_scaling**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/85_hpa_not_scaling/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252285_hpa_not_scaling%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252085_hpa_not_scaling%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**86_configmap_like_but_secret**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/86_configmap_like_but_secret/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252286_configmap_like_but_secret%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252086_configmap_like_but_secret%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**89_runbook_missing_cloudwatch**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/89_runbook_missing_cloudwatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252289_runbook_missing_cloudwatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252089_runbook_missing_cloudwatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**90_runbook_basic_selection**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/90_runbook_basic_selection/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252290_runbook_basic_selection%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252090_runbook_basic_selection%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**91f_datadog_logs_historical_pod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/91f_datadog_logs_historical_pod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252291f_datadog_logs_historical_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252091f_datadog_logs_historical_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**93_calling_datadog[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**93_calling_datadog[1]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**94_runbook_transparency**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/94_runbook_transparency/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252294_runbook_transparency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252094_runbook_transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**96_no_matching_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/96_no_matching_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252296_no_matching_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252096_no_matching_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**97_logs_clarification_needed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/97_logs_clarification_needed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252297_logs_clarification_needed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252097_logs_clarification_needed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**99_logs_transparency_custom_time**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/99_logs_transparency_custom_time/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252299_logs_transparency_custom_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252099_logs_transparency_custom_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**50_logs_since_specific_date**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250_logs_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050_logs_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**93_calling_datadog[2]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**93_events_since_specific_date**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_events_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_events_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_events_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**44_slack_statefulset_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252244_slack_statefulset_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252044_slack_statefulset_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**48_logs_since_thursday**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252248_logs_since_thursday%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252048_logs_since_thursday%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”§](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**22_high_latency_dbi_down**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252222_high_latency_dbi_down%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252022_high_latency_dbi_down%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**08_sock_shop_frontend**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252208_sock_shop_frontend%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252008_sock_shop_frontend%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**104b_postgres_missing_index_pgstat**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104b_postgres_missing_index_pgstat/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104b_postgres_missing_index_pgstat%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104b_postgres_missing_index_pgstat%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**104c_postgres_minimal_missing_index**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104c_postgres_minimal_missing_index/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104c_postgres_minimal_missing_index%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104c_postgres_minimal_missing_index%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**105_redis_wrong_data_structure**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/105_redis_wrong_data_structure/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522105_redis_wrong_data_structure%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520105_redis_wrong_data_structure%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**156_kafka_opensearch_latency**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/156_kafka_opensearch_latency/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522156_kafka_opensearch_latency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520156_kafka_opensearch_latency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**43_slack_deployment_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_slack_deployment_logs/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_slack_deployment_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_slack_deployment_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**55_kafka_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252255_kafka_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252055_kafka_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [**98_logs_transparency_default_time**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252298_logs_transparency_default_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252098_logs_transparency_default_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| **SUMMARY** | 🟑 63% (295/469) | 🟑 74% (346/468) | 🟑 78% (360/464) | 🟑 89% (419/470) | 🟑 89% (420/470) | +| [**01_how_many_pods**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/01_how_many_pods/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252201_how_many_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252001_how_many_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**02_what_is_wrong_with_pod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/02_what_is_wrong_with_pod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252202_what_is_wrong_with_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252002_what_is_wrong_with_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**03_what_is_the_command_to_port_forward**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252203_what_is_the_command_to_port_forward%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252003_what_is_the_command_to_port_forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**04_related_k8s_events**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/04_related_k8s_events/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252204_related_k8s_events%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252004_related_k8s_events%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**05_image_version**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/05_image_version/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252205_image_version%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252005_image_version%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**08_sock_shop_frontend**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252208_sock_shop_frontend%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252008_sock_shop_frontend%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**09_crashpod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/09_crashpod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252209_crashpod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252009_crashpod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**100a_historical_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100a_historical_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100a_historical_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**100b_historical_logs_nonstandard_label**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100b_historical_logs_nonstandard_label%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100b_historical_logs_nonstandard_label%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**101_historical_logs_pod_deleted**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522101_historical_logs_pod_deleted%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520101_historical_logs_pod_deleted%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**103_logs_transparency_default_limit**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522103_logs_transparency_default_limit%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520103_logs_transparency_default_limit%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**104a_postgres_root_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104a_postgres_root_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104a_postgres_root_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104a_postgres_root_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**104b_postgres_missing_index_pgstat**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104b_postgres_missing_index_pgstat/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104b_postgres_missing_index_pgstat%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104b_postgres_missing_index_pgstat%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**104c_postgres_minimal_missing_index**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104c_postgres_minimal_missing_index/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104c_postgres_minimal_missing_index%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104c_postgres_minimal_missing_index%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**105_redis_wrong_data_structure**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/105_redis_wrong_data_structure/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522105_redis_wrong_data_structure%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520105_redis_wrong_data_structure%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**107_log_filter_http_status_code**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/107_log_filter_http_status_code/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522107_log_filter_http_status_code%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520107_log_filter_http_status_code%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**108_logs_nearby_lines**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/108_logs_nearby_lines/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522108_logs_nearby_lines%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520108_logs_nearby_lines%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**109_logs_transparency_not_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/109_logs_transparency_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522109_logs_transparency_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520109_logs_transparency_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**10_image_pull_backoff**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/10_image_pull_backoff/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252210_image_pull_backoff%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252010_image_pull_backoff%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**110_k8s_events_image_pull**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/110_k8s_events_image_pull/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522110_k8s_events_image_pull%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520110_k8s_events_image_pull%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**111_disabled_datadog_traces**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_disabled_datadog_traces/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_disabled_datadog_traces%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_disabled_datadog_traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**111_pod_names_contain_service**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_pod_names_contain_service/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_pod_names_contain_service%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_pod_names_contain_service%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**112_find_pvcs_by_uuid**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/112_find_pvcs_by_uuid/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522112_find_pvcs_by_uuid%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520112_find_pvcs_by_uuid%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**114_checkout_latency_tracing_rebuild[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**115_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520115_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**11_init_containers**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/11_init_containers/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252211_init_containers%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252011_init_containers%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**121_new_relic_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**122_new_relic_checkout_latency_tracing_rebuild[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild[0]/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**123_new_relic_checkout_errors_tracing[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏱️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**12_job_crashing**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/12_job_crashing/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252212_job_crashing%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252012_job_crashing%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**13a_pending_node_selector_basic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13a_pending_node_selector_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213a_pending_node_selector_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013a_pending_node_selector_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**13b_pending_node_selector_detailed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13b_pending_node_selector_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213b_pending_node_selector_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013b_pending_node_selector_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**14_pending_resources**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/14_pending_resources/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252214_pending_resources%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252014_pending_resources%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**156_kafka_opensearch_latency**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/156_kafka_opensearch_latency/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522156_kafka_opensearch_latency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520156_kafka_opensearch_latency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**159_prometheus_high_cardinality_cpu[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**159_prometheus_high_cardinality_cpu[1]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**159_prometheus_high_cardinality_cpu[2]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[2]/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**15_failed_readiness_probe**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/15_failed_readiness_probe/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252215_failed_readiness_probe%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252015_failed_readiness_probe%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**16_failed_no_toolset_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/16_failed_no_toolset_found/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252216_failed_no_toolset_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252016_failed_no_toolset_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**17_oom_kill**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/17_oom_kill/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252217_oom_kill%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252017_oom_kill%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**19_detect_missing_app_details**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/19_detect_missing_app_details/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252219_detect_missing_app_details%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252019_detect_missing_app_details%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**20_long_log_file_search**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/20_long_log_file_search/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252220_long_log_file_search%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252020_long_log_file_search%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**21_job_fail_curl_no_svc_account**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252221_job_fail_curl_no_svc_account%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252021_job_fail_curl_no_svc_account%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**22_high_latency_dbi_down**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252222_high_latency_dbi_down%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252022_high_latency_dbi_down%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚠️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**23_app_error_in_current_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/23_app_error_in_current_logs/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252223_app_error_in_current_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252023_app_error_in_current_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**24_misconfigured_pvc**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24_misconfigured_pvc/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224_misconfigured_pvc%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024_misconfigured_pvc%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**24a_misconfigured_pvc_basic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24a_misconfigured_pvc_basic/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224a_misconfigured_pvc_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024a_misconfigured_pvc_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**24b_misconfigured_pvc_detailed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24b_misconfigured_pvc_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224b_misconfigured_pvc_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024b_misconfigured_pvc_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**25_misconfigured_ingress_class**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252225_misconfigured_ingress_class%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252025_misconfigured_ingress_class%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**26_page_render_times**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/26_page_render_times/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252226_page_render_times%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252026_page_render_times%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**27a_multi_container_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27a_multi_container_logs/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227a_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027a_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**27b_multi_container_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27b_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227b_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027b_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**28_permissions_error**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/28_permissions_error/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252228_permissions_error%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252028_permissions_error%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**33_cpu_metrics_discovery**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252233_cpu_metrics_discovery%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252033_cpu_metrics_discovery%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**39_failed_toolset**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/39_failed_toolset/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252239_failed_toolset%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252039_failed_toolset%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +|
[**41_setup_argo**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/41_setup_argo/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252241_setup_argo%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252041_setup_argo%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D)
| [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**42_dns_issues_result_new_tools_no_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools_no_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_result_new_tools_no_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏱️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**42_dns_issues_steps_new_tools**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_steps_new_tools%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_steps_new_tools%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**43_current_datetime_from_prompt**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_current_datetime_from_prompt/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_current_datetime_from_prompt%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_current_datetime_from_prompt%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**43_slack_deployment_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_slack_deployment_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_slack_deployment_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_slack_deployment_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**44_slack_statefulset_logs**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252244_slack_statefulset_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252044_slack_statefulset_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**45_fetch_deployment_logs_simple**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252245_fetch_deployment_logs_simple%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252045_fetch_deployment_logs_simple%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**48_logs_since_thursday**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252248_logs_since_thursday%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252048_logs_since_thursday%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**50_logs_since_specific_date**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250_logs_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050_logs_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) |
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**50a_logs_since_last_specific_month**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250a_logs_since_last_specific_month%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050a_logs_since_last_specific_month%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**51_logs_summarize_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/51_logs_summarize_errors/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252251_logs_summarize_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252051_logs_summarize_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**52_logs_login_issues**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/52_logs_login_issues/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252252_logs_login_issues%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252052_logs_login_issues%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**53_logs_find_term**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/53_logs_find_term/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252253_logs_find_term%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252053_logs_find_term%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| 
[**54_not_truncated_when_getting_pods**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/54_not_truncated_when_getting_pods/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252254_not_truncated_when_getting_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252054_not_truncated_when_getting_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**55_kafka_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252255_kafka_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252055_kafka_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**57_wrong_namespace**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252257_wrong_namespace%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252057_wrong_namespace%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**59_label_based_counting**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/59_label_based_counting/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252259_label_based_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252059_label_based_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**60_count_less_than**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/60_count_less_than/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252260_count_less_than%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252060_count_less_than%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) 
| +| [**61_exact_match_counting**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/61_exact_match_counting/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252261_exact_match_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252061_exact_match_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**62_fetch_error_logs_with_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/62_fetch_error_logs_with_errors/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252262_fetch_error_logs_with_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252062_fetch_error_logs_with_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**63_fetch_error_logs_no_errors**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252263_fetch_error_logs_no_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252063_fetch_error_logs_no_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**64_keda_vs_hpa_confusion**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/64_keda_vs_hpa_confusion/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252264_keda_vs_hpa_confusion%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252064_keda_vs_hpa_confusion%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**65_health_check_followup**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/65_health_check_followup/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252265_health_check_followup%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252065_health_check_followup%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**71_connection_pool_starvation**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252271_connection_pool_starvation%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252071_connection_pool_starvation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**73a_time_window_anomaly**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73a_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273a_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073a_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**73b_time_window_anomaly**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73b_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273b_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073b_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**76_service_discovery_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/76_service_discovery_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252276_service_discovery_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252076_service_discovery_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**77_liveness_probe_misconfiguration**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/77_liveness_probe_misconfiguration/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252277_liveness_probe_misconfiguration%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252077_liveness_probe_misconfiguration%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**78a_missing_cpu_limits**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78a_missing_cpu_limits/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278a_missing_cpu_limits%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078a_missing_cpu_limits%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**78b_cpu_quota_exceeded**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78b_cpu_quota_exceeded/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278b_cpu_quota_exceeded%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078b_cpu_quota_exceeded%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**79_configmap_mount_issue**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/79_configmap_mount_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252279_configmap_mount_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252079_configmap_mount_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**80_pvc_storage_class_mismatch**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/80_pvc_storage_class_mismatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252280_pvc_storage_class_mismatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252080_pvc_storage_class_mismatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**81_service_account_permission_denied**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/81_service_account_permission_denied/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252281_service_account_permission_denied%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252081_service_account_permission_denied%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**82_pod_anti_affinity_conflict**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/82_pod_anti_affinity_conflict/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252282_pod_anti_affinity_conflict%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252082_pod_anti_affinity_conflict%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**83_secret_not_found**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/83_secret_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252283_secret_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252083_secret_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**84_network_policy_blocking_traffic**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/84_network_policy_blocking_traffic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252284_network_policy_blocking_traffic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252084_network_policy_blocking_traffic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**85_hpa_not_scaling**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/85_hpa_not_scaling/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252285_hpa_not_scaling%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252085_hpa_not_scaling%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**86_configmap_like_but_secret**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/86_configmap_like_but_secret/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252286_configmap_like_but_secret%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252086_configmap_like_but_secret%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**89_runbook_missing_cloudwatch**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/89_runbook_missing_cloudwatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252289_runbook_missing_cloudwatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252089_runbook_missing_cloudwatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**90_runbook_basic_selection**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/90_runbook_basic_selection/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252290_runbook_basic_selection%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252090_runbook_basic_selection%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**91f_datadog_logs_historical_pod**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/91f_datadog_logs_historical_pod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252291f_datadog_logs_historical_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252091f_datadog_logs_historical_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_calling_datadog[0]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏱️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_calling_datadog[1]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_calling_datadog[2]**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**93_events_since_specific_date**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_events_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_events_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_events_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔧](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**94_runbook_transparency**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/94_runbook_transparency/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252294_runbook_transparency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252094_runbook_transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**96_no_matching_runbook**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/96_no_matching_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252296_no_matching_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252096_no_matching_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**97_logs_clarification_needed**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/97_logs_clarification_needed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252297_logs_clarification_needed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252097_logs_clarification_needed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**98_logs_transparency_default_time**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252298_logs_transparency_default_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252098_logs_transparency_default_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[⏭️](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [**99_logs_transparency_custom_time**](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/99_logs_transparency_custom_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252299_logs_transparency_custom_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252099_logs_transparency_custom_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | 
[🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| **SUMMARY** | 🟡 57% (54/94) | 🟡 74% (70/94) | 🟡 84% (78/93) | 🟡 85% (80/94) | 🟡 95% (89/94) | ## Detailed Raw Results | Eval ID | gpt-4o | gpt-4.1 | gpt-5 | sonnet-4-20250514 | sonnet-4-5-20250929 | |---------|-------|-------|-------|-------|-------| -| [01_how_many_pods](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/01_how_many_pods/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252201_how_many_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252001_how_many_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.3s / πŸ’° $0.08 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.2s / πŸ’° $0.05 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.4s / πŸ’° $0.04 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.3s / πŸ’° $0.08 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.6s / πŸ’° $0.08 | -| [02_what_is_wrong_with_pod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/02_what_is_wrong_with_pod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252202_what_is_wrong_with_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252002_what_is_wrong_with_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.9s / πŸ’° $0.07 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.7s / πŸ’° $0.06 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 123.9s / πŸ’° $0.10 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.5s / πŸ’° $0.11 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.5s / πŸ’° $0.10 | -| 
[03_what_is_the_command_to_port_forward](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252203_what_is_the_command_to_port_forward%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252003_what_is_the_command_to_port_forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.2s / πŸ’° $0.12 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.7s / πŸ’° $0.11 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.9s / πŸ’° $0.06 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.0s / πŸ’° $0.12 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.9s / πŸ’° $0.09 | -| [04_related_k8s_events](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/04_related_k8s_events/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252204_related_k8s_events%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252004_related_k8s_events%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.4s / πŸ’° $0.11 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.7s / πŸ’° $0.06 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.1s / πŸ’° $0.05 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / πŸ’° $0.09 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.4s / πŸ’° $0.09 | -| [05_image_version](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/05_image_version/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252205_image_version%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252005_image_version%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.6s / πŸ’° $0.10 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.3s / πŸ’° $0.07 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.8s / πŸ’° $0.06 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.0s / πŸ’° $0.09 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.1s / πŸ’° $0.09 | -| [09_crashpod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/09_crashpod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252209_crashpod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252009_crashpod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.0s / πŸ’° $0.13 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.3s / πŸ’° $0.06 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 92.4s / πŸ’° $0.08 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.0s / πŸ’° $0.14 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.8s / πŸ’° $0.14 | -| [100a_historical_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100a_historical_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100a_historical_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.3s / πŸ’° $0.12 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.8s / πŸ’° $0.07 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 500.6s / πŸ’° $0.29 | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 116.2s / πŸ’° $0.27 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 98.0s / πŸ’° $0.19 | -| [100b_historical_logs_nonstandard_label](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100b_historical_logs_nonstandard_label%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100b_historical_logs_nonstandard_label%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.9s / πŸ’° $0.11 | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.5s / πŸ’° $0.07 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 363.3s / πŸ’° $0.22 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 157.0s / πŸ’° $0.18 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 102.9s / πŸ’° $0.17 | -| 
[101_historical_logs_pod_deleted](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522101_historical_logs_pod_deleted%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520101_historical_logs_pod_deleted%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.2s / πŸ’° $0.12 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.8s / πŸ’° $0.08 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 268.6s / πŸ’° $0.16 | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 97.5s / 💰 $0.16 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.9s / 💰 $0.15 | -| [103_logs_transparency_default_limit](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522103_logs_transparency_default_limit%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520103_logs_transparency_default_limit%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.1s / 💰 
$0.15 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 105.8s / 💰 $0.39 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 137.1s / 💰 $0.09 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 81.0s / 💰 $0.41 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.6s / 💰 $0.12 | -| 
[104a_postgres_root_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104a_postgres_root_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104a_postgres_root_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104a_postgres_root_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.3s / 💰 $0.18 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.6s / 💰 $0.35 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 233.2s / 💰 $0.21 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.9s / 💰 $0.19 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 106.0s / 💰 $0.24 | -| [107_log_filter_http_status_code](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/107_log_filter_http_status_code/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522107_log_filter_http_status_code%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520107_log_filter_http_status_code%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.0s / 💰 $0.15 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.4s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 472.2s / 💰 $0.30 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 127.5s / 💰 $0.22 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.3s / 💰 $0.24 | -| 
[108_logs_nearby_lines](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/108_logs_nearby_lines/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522108_logs_nearby_lines%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520108_logs_nearby_lines%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.2s / 💰 $0.17 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.6s / 💰 $0.23 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 345.4s / 💰 $0.26 | [🟡 20% 
(1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 111.3s / 💰 $0.36 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.7s / 💰 $0.22 | -| [109_logs_transparency_not_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/109_logs_transparency_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522109_logs_transparency_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520109_logs_transparency_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.2s / 💰 $0.13 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.5s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 135.7s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.4s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.1s / 💰 $0.10 | -| 
[10_image_pull_backoff](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/10_image_pull_backoff/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252210_image_pull_backoff%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252010_image_pull_backoff%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.3s / 💰 $0.18 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.9s / 💰 $0.10 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 99.9s / 💰 $0.10 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.4s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.0s / 💰 $0.13 | -| [110_k8s_events_image_pull](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/110_k8s_events_image_pull/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522110_k8s_events_image_pull%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520110_k8s_events_image_pull%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / 💰 $0.09 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.1s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.1s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.4s / 💰 $0.10 | -| [111_disabled_datadog_traces](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_disabled_datadog_traces/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_disabled_datadog_traces%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_disabled_datadog_traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / 💰 $0.03 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.6s / 💰 $0.03 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 235.0s / 💰 $0.13 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 87.4s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.8s / 💰 $0.06 | -| [111_pod_names_contain_service](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_pod_names_contain_service/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_pod_names_contain_service%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_pod_names_contain_service%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.3s / 💰 $0.16 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.3s / 💰 $0.10 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 210.5s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 77.3s / 💰 $0.20 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.9s / 💰 $0.16 | -| 
[112_find_pvcs_by_uuid](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/112_find_pvcs_by_uuid/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522112_find_pvcs_by_uuid%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520112_find_pvcs_by_uuid%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.8s / 💰 $0.12 | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.2s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 147.8s / 💰 $0.08 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.5s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 88.6s / 💰 $0.13 | -| [114_checkout_latency_tracing_rebuild[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / 
⏱️ 69.7s / 💰 $0.20 | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.3s / 💰 $0.16 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 377.2s / 💰 $0.34 | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 148.2s / 💰 $0.31 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / 
⏱️ 173.2s / 💰 $0.52 | -| [115_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520115_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 87.3s / 💰 $0.22 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 93.8s / 💰 $0.21 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 265.8s / 💰 $0.20 | [🟡 20% 
(1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 136.2s / πŸ’° $0.30 | [🟑 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 255.3s / πŸ’° $0.51 | -| [11_init_containers](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/11_init_containers/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252211_init_containers%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252011_init_containers%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.3s / πŸ’° $0.11 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.0s / πŸ’° $0.07 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 139.5s / πŸ’° $0.10 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.4s / πŸ’° $0.12 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.8s / πŸ’° $0.11 | -| [121_new_relic_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing[0]/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.5s / πŸ’° $0.10 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.7s / πŸ’° $0.05 | [🟑 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 530.6s / πŸ’° $0.41 | [🟑 40% 
(2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 189.5s / πŸ’° $0.48 | [🟑 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 145.3s / πŸ’° $0.41 | -| [122_new_relic_checkout_latency_tracing_rebuild[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.6s / πŸ’° $0.20 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.4s / πŸ’° $0.25 | [🟑 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 583.9s / πŸ’° $0.36 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 293.4s / πŸ’° 
$0.41 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 156.6s / 💰 $0.39 | -| [123_new_relic_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.9s / 💰 $0.11 | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.6s / πŸ’° $0.06 | [🟑 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 343.2s / πŸ’° $0.31 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 155.5s / πŸ’° $0.44 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 124.7s / πŸ’° $0.37 
| -| [12_job_crashing](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/12_job_crashing/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252212_job_crashing%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252012_job_crashing%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.7s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.9s / 💰 $0.08 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 184.2s / 💰 $0.15 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 92.1s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.8s / 💰 $0.14 | -| [13a_pending_node_selector_basic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13a_pending_node_selector_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213a_pending_node_selector_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013a_pending_node_selector_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.8s / 💰 $0.14 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / πŸ’° $0.10 | [🟑 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 84.3s / πŸ’° $0.04 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 119.2s / πŸ’° $0.14 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.6s / πŸ’° $0.11 | -| 
[13b_pending_node_selector_detailed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13b_pending_node_selector_detailed/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213b_pending_node_selector_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013b_pending_node_selector_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.9s / πŸ’° $0.13 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.4s / πŸ’° $0.09 | [🟑 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 110.8s / πŸ’° $0.08 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.7s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.6s / 💰 $0.14 | -| [14_pending_resources](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/14_pending_resources/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252214_pending_resources%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252014_pending_resources%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.7s / 💰 $0.13 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 70.5s / πŸ’° $0.10 | [🟑 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 70.4s / πŸ’° $0.04 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 114.5s / πŸ’° $0.13 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.2s / πŸ’° $0.13 | -| [159_prometheus_high_cardinality_cpu[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[0]/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.3s / πŸ’° $0.16 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.0s / πŸ’° $0.13 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 231.2s / πŸ’° $0.20 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.9s / 💰 $0.17 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.8s / 💰 $0.16 | -| [159_prometheus_high_cardinality_cpu[1]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 60% 
(3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.7s / πŸ’° $0.20 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.6s / πŸ’° $0.16 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 154.2s / πŸ’° $0.13 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 84.9s / πŸ’° $0.22 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 82.1s / 💰 $0.19 | -| [159_prometheus_high_cardinality_cpu[2]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.1s / 💰 $0.10 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.6s / πŸ’° $0.16 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 130.8s / πŸ’° $0.12 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 155.1s / πŸ’° $0.22 | [🟑 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.2s / πŸ’° $0.19 | -| 
[15_failed_readiness_probe](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/15_failed_readiness_probe/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252215_failed_readiness_probe%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252015_failed_readiness_probe%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.9s / πŸ’° $0.13 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.5s / πŸ’° $0.09 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 141.4s / πŸ’° $0.10 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 88.2s / πŸ’° $0.14 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.1s / πŸ’° $0.14 | -| [16_failed_no_toolset_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/16_failed_no_toolset_found/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252216_failed_no_toolset_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252016_failed_no_toolset_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.5s / πŸ’° $0.09 | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.1s / πŸ’° $0.03 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.5s / πŸ’° $0.02 | [🟑 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.1s / πŸ’° $0.06 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.5s / πŸ’° $0.06 | -| [17_oom_kill](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/17_oom_kill/test_case.yaml) 
[πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252217_oom_kill%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252017_oom_kill%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.6s / πŸ’° $0.13 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.1s / πŸ’° $0.08 | [🟑 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 116.0s / πŸ’° $0.08 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.5s / πŸ’° $0.12 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.3s / πŸ’° $0.12 | -| [19_detect_missing_app_details](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/19_detect_missing_app_details/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252219_detect_missing_app_details%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252019_detect_missing_app_details%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 78.8s / πŸ’° $0.44 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.1s / πŸ’° $0.11 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 267.1s / πŸ’° $0.18 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 102.3s / πŸ’° $0.21 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.1s / πŸ’° $0.16 | -| [20_long_log_file_search](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/20_long_log_file_search/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252220_long_log_file_search%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252020_long_log_file_search%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.0s / πŸ’° $0.11 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.5s / πŸ’° $0.06 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 126.4s / πŸ’° $0.08 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 123.3s / πŸ’° $0.13 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 84.4s / πŸ’° $0.11 | -| [21_job_fail_curl_no_svc_account](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252221_job_fail_curl_no_svc_account%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252021_job_fail_curl_no_svc_account%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.1s / πŸ’° $0.25 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.9s / πŸ’° $0.16 | [🟑 80% 
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 174.0s / πŸ’° $0.13 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.5s / πŸ’° $0.21 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.5s / πŸ’° $0.19 | -| [23_app_error_in_current_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/23_app_error_in_current_logs/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252223_app_error_in_current_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252023_app_error_in_current_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 82.7s / πŸ’° $0.19 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 91.4s / πŸ’° $0.30 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 249.1s / πŸ’° $0.19 | [🟑 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 78.8s / πŸ’° $0.25 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.9s / πŸ’° $0.17 | -| [24_misconfigured_pvc](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24_misconfigured_pvc/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224_misconfigured_pvc%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024_misconfigured_pvc%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.4s / πŸ’° $0.17 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.5s / πŸ’° $0.13 | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.4s / πŸ’° $0.02 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 88.4s / πŸ’° $0.16 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 112.6s / πŸ’° $0.17 | -| [24a_misconfigured_pvc_basic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24a_misconfigured_pvc_basic/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224a_misconfigured_pvc_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024a_misconfigured_pvc_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟑 80% 
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.4s / πŸ’° $0.19 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.4s / πŸ’° $0.10 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.1s / πŸ’° $0.02 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.8s / πŸ’° $0.15 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.9s / πŸ’° $0.16 | -| [24b_misconfigured_pvc_detailed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24b_misconfigured_pvc_detailed/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224b_misconfigured_pvc_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024b_misconfigured_pvc_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.8s / πŸ’° $0.18 | [🟑 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.5s / πŸ’° $0.12 | [🟑 20% 
(1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 93.6s / πŸ’° $0.07 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.7s / πŸ’° $0.17 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 195.9s / πŸ’° $0.17 | -| [25_misconfigured_ingress_class](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252225_misconfigured_ingress_class%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252025_misconfigured_ingress_class%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.5s / πŸ’° $0.13 | [πŸ”΄ 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.6s / πŸ’° $0.14 | [🟑 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 187.9s / πŸ’° $0.10 | [🟒 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 121.1s / πŸ’° $0.26 | [🟒 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.2s / 💰 $0.35 | -| [26_page_render_times](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/26_page_render_times/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252226_page_render_times%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252026_page_render_times%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.5s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / 💰 $0.10 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 347.0s / 💰 $0.26 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.5s / 💰 $0.16 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.4s / 💰 $0.16 | -| [27a_multi_container_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27a_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227a_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027a_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.7s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.5s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 197.6s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.2s / 💰 $0.13 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.2s / 💰 $0.12 | -| [27b_multi_container_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27b_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227b_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027b_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.3s / 💰 $0.14 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.4s / 💰 $0.08 | [🟡 80%
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 124.0s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.0s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.9s / 💰 $0.11 | -| [28_permissions_error](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/28_permissions_error/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252228_permissions_error%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252028_permissions_error%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 60%
(3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.3s / 💰 $0.04 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.9s / 💰 $0.05 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 138.5s / 💰 $0.09 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.5s / 💰 $0.07 | [🔴 0%
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 27.3s / 💰 $0.07 | -| [33_cpu_metrics_discovery](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252233_cpu_metrics_discovery%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252033_cpu_metrics_discovery%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.7s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.9s / 💰 $0.09 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 266.9s / 💰 $0.22 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.5s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.9s / 💰 $0.13 | -| [39_failed_toolset](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/39_failed_toolset/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252239_failed_toolset%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252039_failed_toolset%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 27.2s / 💰 $0.04 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.07 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 251.5s / 💰 $0.19 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 169.5s / 💰 $0.09 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.9s / 💰 $0.11 | -| [41_setup_argo](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/41_setup_argo/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252241_setup_argo%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252041_setup_argo%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.1s / 💰 $0.03 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.2s / 💰 $0.02 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 171.0s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.1s / 💰 $0.06 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.0s / 💰 $0.06 | -| [42_dns_issues_result_new_tools_no_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools_no_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_result_new_tools_no_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 60%
(3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.0s / 💰 $0.22 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.6s / 💰 $0.18 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 291.8s / 💰 $0.23 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 163.9s / 💰 $0.36 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 109.7s / 💰 $0.26 | -| [42_dns_issues_steps_new_tools](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_steps_new_tools%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_steps_new_tools%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.5s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.4s / 💰 $0.14 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 471.2s / 💰 $0.23 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 165.8s / 💰 $0.26 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 157.3s / 💰 $0.31 | -| [43_current_datetime_from_prompt](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_current_datetime_from_prompt/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_current_datetime_from_prompt%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_current_datetime_from_prompt%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.6s / 💰 $0.02 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / 💰 $0.04 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.7s / 💰 $0.03 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.5s / 💰 $0.06 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.4s / 💰 $0.06 | -| [45_fetch_deployment_logs_simple](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252245_fetch_deployment_logs_simple%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252045_fetch_deployment_logs_simple%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.8s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.1s / 💰 $0.07 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.0s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.4s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.7s / 💰 $0.11 | -| [50a_logs_since_last_specific_month](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250a_logs_since_last_specific_month%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050a_logs_since_last_specific_month%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.9s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.0s / 💰 $0.05 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 314.7s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.4s / 💰 $0.10 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.2s / 💰 $0.09 | -| [51_logs_summarize_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/51_logs_summarize_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252251_logs_summarize_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252051_logs_summarize_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.9s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.7s / 💰 $0.06 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 133.0s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 159.3s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.1s / 💰 $0.10 | -| [52_logs_login_issues](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/52_logs_login_issues/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252252_logs_login_issues%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252052_logs_login_issues%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 40%
(2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 84.3s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 78.5s / 💰 $0.38 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 152.1s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.7s / 💰 $0.11 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.7s / 💰 $0.11 | -| [53_logs_find_term](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/53_logs_find_term/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252253_logs_find_term%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252053_logs_find_term%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.7s / 💰 $0.11 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 107.3s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.9s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 53.2s / 💰 $0.13 | -| [54_not_truncated_when_getting_pods](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/54_not_truncated_when_getting_pods/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252254_not_truncated_when_getting_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252054_not_truncated_when_getting_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.6s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.7s / 💰 $0.11 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 196.2s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 142.2s / 💰 $0.15 | [🟡 80%
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.7s / 💰 $0.11 | -| [57_wrong_namespace](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252257_wrong_namespace%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252057_wrong_namespace%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.6s / 💰 $0.10 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.2s / 💰 $0.06 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 145.2s / 💰 $0.08 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 77.4s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 91.7s / 💰 $0.10 | -| [59_label_based_counting](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/59_label_based_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252259_label_based_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252059_label_based_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.8s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.9s / 💰 $0.05 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 77.8s / 💰 $0.03 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.4s / 💰 $0.08 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.9s / 💰 $0.08 | -| [60_count_less_than](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/60_count_less_than/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252260_count_less_than%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252060_count_less_than%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.4s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.3s / 💰 $0.06 | [🟡 80%
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 88.5s / 💰 $0.05 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.1s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.5s / 💰 $0.09 | -| [61_exact_match_counting](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/61_exact_match_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252261_exact_match_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252061_exact_match_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.9s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.0s / 💰 $0.05 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.9s / 💰 $0.04 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.2s / 💰 $0.07 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.0s / 💰 $0.08 | -| [62_fetch_error_logs_with_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/62_fetch_error_logs_with_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252262_fetch_error_logs_with_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252062_fetch_error_logs_with_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.6s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.9s / 💰 $0.07 | [🟡 80%
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 102.9s / 💰 $0.06 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.9s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.3s / 💰 $0.09 | -| [63_fetch_error_logs_no_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252263_fetch_error_logs_no_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252063_fetch_error_logs_no_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.9s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.1s / 💰 $0.07 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 138.8s / 💰 $0.11 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.6s / 💰 $0.09 | [🟡 80%
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.6s / 💰 $0.07 | -| [64_keda_vs_hpa_confusion](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/64_keda_vs_hpa_confusion/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252264_keda_vs_hpa_confusion%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252064_keda_vs_hpa_confusion%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.8s / 💰 $0.42 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.2s / 💰 $0.08 | [🟡 80%
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 191.3s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 112.3s / 💰 $0.20 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 93.1s / 💰 $0.20 | -| [65_health_check_followup](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/65_health_check_followup/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252265_health_check_followup%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252065_health_check_followup%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.3s / 💰 $0.18 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.6s / 💰 $0.22 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 277.0s / 💰 $0.20 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 328.5s / 💰 $0.24 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 94.4s / 💰 $0.27 |
-| [71_connection_pool_starvation](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252271_connection_pool_starvation%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252071_connection_pool_starvation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.1s / 💰 $0.17 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.2s / 💰 $0.10 | [🟡 20% 
(1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 152.5s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.9s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.3s / 💰 $0.17 |
-| [73a_time_window_anomaly](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73a_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273a_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073a_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.7s / 💰 $0.15 | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.6s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 187.9s / 💰 $0.13 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 84.2s / 💰 $0.13 | [🟡 40% 
(2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 81.8s / 💰 $0.15 |
-| [73b_time_window_anomaly](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73b_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273b_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073b_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.0s / 💰 $0.16 | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.9s / 💰 $0.08 | [🟡 80% 
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 165.5s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 189.3s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.7s / 💰 $0.14 |
-| [76_service_discovery_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/76_service_discovery_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252276_service_discovery_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252076_service_discovery_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.5s / 💰 $0.20 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.1s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 205.8s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.4s / 💰 $0.22 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.1s / 💰 $0.16 |
-| [77_liveness_probe_misconfiguration](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/77_liveness_probe_misconfiguration/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252277_liveness_probe_misconfiguration%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252077_liveness_probe_misconfiguration%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 40% (2/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.7s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.6s / 💰 $0.08 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 182.8s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.0s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.0s / 💰 $0.13 |
-| [78a_missing_cpu_limits](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78a_missing_cpu_limits/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278a_missing_cpu_limits%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078a_missing_cpu_limits%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.4s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.7s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 206.0s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.8s / 💰 $0.12 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.1s / 💰 $0.14 |
-| [78b_cpu_quota_exceeded](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78b_cpu_quota_exceeded/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278b_cpu_quota_exceeded%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078b_cpu_quota_exceeded%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.1s / 💰 $0.18 | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.1s / 💰 $0.09 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 152.7s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.5s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.6s / 💰 $0.14 |
-| [79_configmap_mount_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/79_configmap_mount_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252279_configmap_mount_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252079_configmap_mount_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.6s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.1s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 197.3s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.4s / 💰 $0.11 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.9s / 💰 $0.12 |
-| [80_pvc_storage_class_mismatch](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/80_pvc_storage_class_mismatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252280_pvc_storage_class_mismatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252080_pvc_storage_class_mismatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.1s / 💰 $0.12 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.0s / 💰 $0.08 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 191.5s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.4s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.9s / 💰 $0.14 |
-| [81_service_account_permission_denied](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/81_service_account_permission_denied/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252281_service_account_permission_denied%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252081_service_account_permission_denied%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 20% 
(1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.2s / 💰 $0.14 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.2s / 💰 $0.11 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 198.0s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.5s / 💰 $0.21 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.0s / 💰 $0.17 |
-| [82_pod_anti_affinity_conflict](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/82_pod_anti_affinity_conflict/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252282_pod_anti_affinity_conflict%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252082_pod_anti_affinity_conflict%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.6s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.8s / 💰 $0.08 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 173.8s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 77.3s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 108.8s / 💰 $0.14 |
-| [83_secret_not_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/83_secret_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252283_secret_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252083_secret_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.7s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.8s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 185.9s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 81.7s / 💰 $0.11 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.1s / 💰 $0.12 | -| [84_network_policy_blocking_traffic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/84_network_policy_blocking_traffic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252284_network_policy_blocking_traffic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252084_network_policy_blocking_traffic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.4s / 💰 $0.18 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.1s / 💰 $0.14 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 226.9s / 💰 $0.14 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 131.7s / 💰 $0.24 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.7s / 💰 $0.23 | -| [85_hpa_not_scaling](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/85_hpa_not_scaling/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252285_hpa_not_scaling%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252085_hpa_not_scaling%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / 💰 $0.11 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 183.9s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.2s / 💰 $0.16 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.2s / 💰 $0.17 | -| [86_configmap_like_but_secret](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/86_configmap_like_but_secret/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252286_configmap_like_but_secret%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252086_configmap_like_but_secret%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.8s / 💰 $0.18 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.3s / 💰 $0.10 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 227.5s / 💰 $0.17 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.0s / 💰 $0.13 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 158.1s / 💰 $0.15 | -| [89_runbook_missing_cloudwatch](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/89_runbook_missing_cloudwatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252289_runbook_missing_cloudwatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252089_runbook_missing_cloudwatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟡 80% 
(4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.0s / 💰 $0.07 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.9s / 💰 $0.04 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 258.4s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.3s / 💰 $0.11 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.5s / 💰 $0.11 | -| [90_runbook_basic_selection](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/90_runbook_basic_selection/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252290_runbook_basic_selection%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252090_runbook_basic_selection%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.0s / 💰 $0.20 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.2s / 💰 $0.16 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 365.1s / 💰 $0.29 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 216.9s / 💰 $0.49 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 138.6s / 💰 $0.47 | -| [91f_datadog_logs_historical_pod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/91f_datadog_logs_historical_pod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252291f_datadog_logs_historical_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252091f_datadog_logs_historical_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.0s / 💰 $0.16 | [🟡 20% (1/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.1s / 💰 $0.14 | [🟡 80% (4/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 302.2s / 💰 $0.19 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.8s / 💰 $0.15 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.1s / 💰 $0.14 | -| [93_calling_datadog[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.2s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.6s / 💰 $0.07 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.2s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 13.7s / 💰 $0.15 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.5s / 💰 $0.15 | -| [93_calling_datadog[1]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.2s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.9s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.4s / 💰 $0.08 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.4s / 💰 $0.15 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 11.8s / 💰 $0.15 | -| [94_runbook_transparency](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/94_runbook_transparency/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252294_runbook_transparency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252094_runbook_transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.9s / 💰 $0.25 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.7s / 💰 $0.20 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 309.8s / 💰 $0.25 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 116.3s / 💰 $0.23 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 94.6s / 💰 $0.24 | -| [96_no_matching_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/96_no_matching_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252296_no_matching_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252096_no_matching_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.5s / 💰 $0.22 | [🔴 0% (0/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 128.6s / 💰 $0.55 | [🟡 60% (3/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 304.2s / 💰 $0.20 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 203.2s / 💰 $0.57 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 119.7s / 💰 $0.27 | -| [97_logs_clarification_needed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/97_logs_clarification_needed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252297_logs_clarification_needed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252097_logs_clarification_needed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 18.7s / 💰 $0.03 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.9s / 💰 $0.03 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.2s / 💰 $0.02 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.0s / 💰 $0.19 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.8s / 💰 $0.06 | -| [99_logs_transparency_custom_time](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/99_logs_transparency_custom_time/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252299_logs_transparency_custom_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252099_logs_transparency_custom_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.0s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.2s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 99.7s / 💰 $0.07 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.6s / 💰 $0.11 | [🟢 100% 
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.6s / 💰 $0.11 | -| [50_logs_since_specific_date](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250_logs_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050_logs_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.3s / 💰 $0.10 | [🟢 100% (4/4)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.4s / 💰 $0.06 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 105.6s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.3s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.8s / 💰 $0.10 | -| [93_calling_datadog[2]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.6s / 💰 $0.12 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.2s / 💰 $0.08 | [🟢 100% (4/4)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.4s / 💰 $0.09 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 13.1s / 💰 $0.15 | [🟢 100%
(5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 11.9s / 💰 $0.15 | -| [93_events_since_specific_date](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_events_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_events_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_events_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (4/4)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.2s / 💰 $0.10 | [🟢 100% (4/4)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.0s / 💰 $0.06 | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.3s / 💰 $0.10 | [🟢 100% (5/5)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.0s / 💰 $0.10 | -| [44_slack_statefulset_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252244_slack_statefulset_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252044_slack_statefulset_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [48_logs_since_thursday](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252248_logs_since_thursday%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252048_logs_since_thursday%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [22_high_latency_dbi_down](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252222_high_latency_dbi_down%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252022_high_latency_dbi_down%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [08_sock_shop_frontend](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252208_sock_shop_frontend%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252008_sock_shop_frontend%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [104b_postgres_missing_index_pgstat](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104b_postgres_missing_index_pgstat/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104b_postgres_missing_index_pgstat%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104b_postgres_missing_index_pgstat%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [104c_postgres_minimal_missing_index](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104c_postgres_minimal_missing_index/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104c_postgres_minimal_missing_index%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104c_postgres_minimal_missing_index%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [105_redis_wrong_data_structure](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/105_redis_wrong_data_structure/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522105_redis_wrong_data_structure%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520105_redis_wrong_data_structure%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [156_kafka_opensearch_latency](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/156_kafka_opensearch_latency/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522156_kafka_opensearch_latency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520156_kafka_opensearch_latency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [43_slack_deployment_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_slack_deployment_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_slack_deployment_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_slack_deployment_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [55_kafka_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252255_kafka_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252055_kafka_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | -| [98_logs_transparency_default_time](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252298_logs_transparency_default_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252098_logs_transparency_default_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [01_how_many_pods](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/01_how_many_pods/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252201_how_many_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252001_how_many_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.7s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.8s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.9s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252201_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252001_how_many_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.3s / 💰 $0.07 | +| [02_what_is_wrong_with_pod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/02_what_is_wrong_with_pod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252202_what_is_wrong_with_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252002_what_is_wrong_with_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.6s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 97.4s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.10 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252202_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252002_what_is_wrong_with_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.7s / 💰 $0.20 | +| [03_what_is_the_command_to_port_forward](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252203_what_is_the_command_to_port_forward%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252003_what_is_the_command_to_port_forward%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.5s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.4s / 💰 $0.15 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 91.2s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252203_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252003_what_is_the_command_to_port_forward%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.1s / 💰 $0.09 | +| [04_related_k8s_events](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/04_related_k8s_events/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252204_related_k8s_events%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252004_related_k8s_events%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.4s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.3s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.5s / 💰 $0.04 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.0s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252204_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252004_related_k8s_events%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.5s / 💰 $0.08 | +| [05_image_version](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/05_image_version/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252205_image_version%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252005_image_version%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.6s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.1s / 💰 $0.04 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.8s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.2s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252205_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252005_image_version%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.7s / 💰 $0.09 | +| [08_sock_shop_frontend](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252208_sock_shop_frontend%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252008_sock_shop_frontend%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252208_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252008_sock_shop_frontend%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [09_crashpod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/09_crashpod/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252209_crashpod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252009_crashpod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.2s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.1s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 125.4s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.2s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252209_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252009_crashpod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.4s / 💰 $0.13 | +| [100a_historical_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100a_historical_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100a_historical_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.7s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 338.5s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.4s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100a_historical_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 208.6s / 💰 $0.28 | +| [100b_historical_logs_nonstandard_label](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522100b_historical_logs_nonstandard_label%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520100b_historical_logs_nonstandard_label%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.3s / 💰 $0.11 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.9s / 💰 $0.06 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 393.7s / 💰 $0.31 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 124.7s / 💰 $0.15 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520100b_historical_logs_nonstandard_label%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 143.2s / 💰 $0.26 | +| [101_historical_logs_pod_deleted](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522101_historical_logs_pod_deleted%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520101_historical_logs_pod_deleted%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.4s / 💰 $0.12 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.2s / 💰 $0.05 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 435.8s / 💰 $0.30 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.1s / 💰 $0.20 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520101_historical_logs_pod_deleted%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.5s / 💰 $0.13 | +| [103_logs_transparency_default_limit](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522103_logs_transparency_default_limit%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520103_logs_transparency_default_limit%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.4s / 💰 
$0.18 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.1s / 💰 $0.43 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 131.9s / 💰 $0.09 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.1s / 💰 $0.40 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520103_logs_transparency_default_limit%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.4s / 💰 $0.22 | +| 
[104a_postgres_root_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104a_postgres_root_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104a_postgres_root_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104a_postgres_root_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.6s / 💰 $0.17 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 273.0s / 💰 $0.22 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.8s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104a_postgres_root_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 82.2s / 💰 $0.23 | +| [104b_postgres_missing_index_pgstat](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104b_postgres_missing_index_pgstat/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104b_postgres_missing_index_pgstat%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104b_postgres_missing_index_pgstat%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104b_postgres_missing_index_pgstat%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [104c_postgres_minimal_missing_index](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/104c_postgres_minimal_missing_index/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522104c_postgres_minimal_missing_index%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520104c_postgres_minimal_missing_index%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520104c_postgres_minimal_missing_index%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [105_redis_wrong_data_structure](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/105_redis_wrong_data_structure/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522105_redis_wrong_data_structure%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520105_redis_wrong_data_structure%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520105_redis_wrong_data_structure%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [107_log_filter_http_status_code](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/107_log_filter_http_status_code/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522107_log_filter_http_status_code%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520107_log_filter_http_status_code%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.6s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.7s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 881.1s / 💰 $0.37 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.0s / 💰 $0.22 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520107_log_filter_http_status_code%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.4s / 💰 $0.34 | +| [108_logs_nearby_lines](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/108_logs_nearby_lines/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522108_logs_nearby_lines%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520108_logs_nearby_lines%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.6s / 💰 $0.22 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.9s / 💰 $0.17 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 227.1s / 💰 $0.21 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 77.0s / 💰 $0.37 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520108_logs_nearby_lines%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.7s / 💰 $0.23 | +| [109_logs_transparency_not_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/109_logs_transparency_not_found/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522109_logs_transparency_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520109_logs_transparency_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.5s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.7s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.1s / 💰 $0.07 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.5s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520109_logs_transparency_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.0s / 💰 $0.10 | +| [10_image_pull_backoff](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/10_image_pull_backoff/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252210_image_pull_backoff%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252010_image_pull_backoff%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.2s / 💰 $0.19 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 153.7s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.9s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252210_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252010_image_pull_backoff%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.3s / 💰 $0.12 | +| [110_k8s_events_image_pull](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/110_k8s_events_image_pull/test_case.yaml) 
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522110_k8s_events_image_pull%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520110_k8s_events_image_pull%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.6s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.1s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.4s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.3s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520110_k8s_events_image_pull%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.7s / 💰 $0.15 | +| [111_disabled_datadog_traces](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_disabled_datadog_traces/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_disabled_datadog_traces%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_disabled_datadog_traces%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.0s / 💰 $0.03 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.8s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 356.5s / 💰 $0.28 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 100.3s / 💰 $0.17 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_disabled_datadog_traces%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.1s / 💰 $0.06 | +| 
[111_pod_names_contain_service](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/111_pod_names_contain_service/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522111_pod_names_contain_service%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520111_pod_names_contain_service%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.9s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 237.4s / 💰 $0.16 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.2s / 💰 $0.22 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520111_pod_names_contain_service%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.4s / 💰 $0.67 | +| [112_find_pvcs_by_uuid](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/112_find_pvcs_by_uuid/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522112_find_pvcs_by_uuid%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520112_find_pvcs_by_uuid%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.6s / 💰 $0.08 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.5s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 159.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.4s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520112_find_pvcs_by_uuid%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.0s / 💰 $0.11 | +| [114_checkout_latency_tracing_rebuild[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/114_checkout_latency_tracing_rebuild[0]/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.3s / 💰 $0.16 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 325.8s / 💰 $0.24 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.3s / 💰 $0.38 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520114_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 111.1s / 💰 $0.35 | +| [115_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/115_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520115_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / 💰 $0.24 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.8s / 💰 $0.78 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 170.0s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.7s / 💰 $0.35 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520115_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 117.4s / 💰 $0.37 | +| [11_init_containers](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/11_init_containers/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252211_init_containers%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252011_init_containers%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / 💰 $0.08 | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.6s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252211_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252011_init_containers%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.9s / 💰 $0.13 | +| [121_new_relic_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.8s / 💰 $0.07 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.7s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 446.7s / 💰 $0.31 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.3s / 💰 $0.36 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520121_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 151.5s / 💰 $0.44 | +| [122_new_relic_checkout_latency_tracing_rebuild[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.3s / 💰 $0.15 | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 81.6s / 💰 $0.19 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 463.4s / 💰 $0.31 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.3s / 💰 $0.33 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520122_new_relic_checkout_latency_tracing_rebuild%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 141.0s / 💰 $0.44 | +| [123_new_relic_checkout_errors_tracing[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.1s / 💰 $0.07 | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 16.6s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 249.2s / 💰 $0.19 | [⏱️ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 617.6s | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520123_new_relic_checkout_errors_tracing%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 102.8s / 💰 $0.46 | +|
[12_job_crashing](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/12_job_crashing/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252212_job_crashing%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252012_job_crashing%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.2s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 137.1s / 💰 $0.09 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252212_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252012_job_crashing%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.2s / 💰 $0.14 | +| [13a_pending_node_selector_basic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13a_pending_node_selector_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213a_pending_node_selector_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013a_pending_node_selector_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.0s / 💰 $0.14 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.0s / 💰 $0.08 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.8s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.1s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013a_pending_node_selector_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.7s / 💰 $0.13 | +|
[13b_pending_node_selector_detailed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/13b_pending_node_selector_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252213b_pending_node_selector_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252013b_pending_node_selector_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.0s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.1s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 141.0s / 💰 $0.12 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252213b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252013b_pending_node_selector_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.8s / 💰 $0.15 | +| [14_pending_resources](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/14_pending_resources/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252214_pending_resources%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252014_pending_resources%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.8s / 💰 $0.14 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.0s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.5s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.1s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252214_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252014_pending_resources%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / 💰 $0.13 | +| [156_kafka_opensearch_latency](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/156_kafka_opensearch_latency/test_case.yaml)
[🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522156_kafka_opensearch_latency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520156_kafka_opensearch_latency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520156_kafka_opensearch_latency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [159_prometheus_high_cardinality_cpu[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[0]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.1s / 💰 $0.19 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.2s / 💰 $0.57 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 261.2s / 💰 $0.19 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.2s / 💰 $0.23 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.3s / 💰 $0.25 | +| 
[159_prometheus_high_cardinality_cpu[1]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[1]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.8s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.9s / 💰 $0.14 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 196.1s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.5s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.2s / 💰 $0.21 | +| [159_prometheus_high_cardinality_cpu[2]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu[2]/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.5s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.8s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 151.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.24 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%2522159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%2520159_prometheus_high_cardinality_cpu%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.3s / 💰 $0.12 | +| [15_failed_readiness_probe](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/15_failed_readiness_probe/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252215_failed_readiness_probe%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252015_failed_readiness_probe%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.3s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.4s / 💰 $0.21 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 175.8s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.0s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252215_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252015_failed_readiness_probe%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.15 | +| [16_failed_no_toolset_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/16_failed_no_toolset_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252216_failed_no_toolset_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252016_failed_no_toolset_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.9s / 💰 $0.06 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.3s / 💰 $0.03 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.8s / 💰 $0.02 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.5s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252216_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252016_failed_no_toolset_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.5s / 💰 $0.06 | +| [17_oom_kill](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/17_oom_kill/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252217_oom_kill%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252017_oom_kill%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.4s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.6s / 💰 $0.11 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 180.5s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.3s / 💰 $0.16 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252217_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252017_oom_kill%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 239.6s / 💰 $0.18 | +| [19_detect_missing_app_details](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/19_detect_missing_app_details/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252219_detect_missing_app_details%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252019_detect_missing_app_details%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.6s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.9s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 319.4s / 💰 $0.21 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 89.0s / 💰 $0.20 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252219_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252019_detect_missing_app_details%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.3s / 💰 $0.13 | +| [20_long_log_file_search](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/20_long_log_file_search/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252220_long_log_file_search%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252020_long_log_file_search%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.0s / 💰 $0.05 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.9s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.3s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252220_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252020_long_log_file_search%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.9s / 💰 $0.11 | +| [21_job_fail_curl_no_svc_account](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252221_job_fail_curl_no_svc_account%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252021_job_fail_curl_no_svc_account%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.0s / 💰 $0.29 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 291.9s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.4s / 💰 $0.17 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252221_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252021_job_fail_curl_no_svc_account%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.4s / 💰 $0.12 | +| [22_high_latency_dbi_down](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252222_high_latency_dbi_down%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252022_high_latency_dbi_down%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252222_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252022_high_latency_dbi_down%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [23_app_error_in_current_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/23_app_error_in_current_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252223_app_error_in_current_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252023_app_error_in_current_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.5s / 💰 $0.41 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.2s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 310.8s / 💰 $0.40 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.3s | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252223_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252023_app_error_in_current_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 70.2s / 💰 $0.23 | +| [24_misconfigured_pvc](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24_misconfigured_pvc/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224_misconfigured_pvc%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024_misconfigured_pvc%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.7s / 💰 $0.16 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / 💰 $0.10 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.0s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.4s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024_misconfigured_pvc%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.15 | +| [24a_misconfigured_pvc_basic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24a_misconfigured_pvc_basic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224a_misconfigured_pvc_basic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024a_misconfigured_pvc_basic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.1s / 💰 $0.16 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.1s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 276.9s / 💰 $0.26 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 63.9s / 💰 $0.15 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024a_misconfigured_pvc_basic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.2s / 💰 $0.16 | +| [24b_misconfigured_pvc_detailed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/24b_misconfigured_pvc_detailed/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252224b_misconfigured_pvc_detailed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252024b_misconfigured_pvc_detailed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.2s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / 💰 $0.10 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.1s / 💰 $0.02 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.4s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252224b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252024b_misconfigured_pvc_detailed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.7s / 💰 $0.15 | +| [25_misconfigured_ingress_class](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252225_misconfigured_ingress_class%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252025_misconfigured_ingress_class%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 85.5s / 💰 $0.43 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.7s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 466.9s / 💰 $0.26 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 110.3s / 💰 $0.37 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252225_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252025_misconfigured_ingress_class%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.8s / 💰 $0.26 | +| [26_page_render_times](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/26_page_render_times/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252226_page_render_times%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252026_page_render_times%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.5s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.7s / 💰 $0.10 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 479.6s / 💰 $0.31 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.3s / 💰 $0.15 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252226_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252026_page_render_times%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.4s / 💰 $0.17 | +| [27a_multi_container_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27a_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227a_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027a_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.8s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 95.2s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027a_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.8s / 💰 $0.12 | +| [27b_multi_container_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/27b_multi_container_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252227b_multi_container_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252027b_multi_container_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.0s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.1s / 💰 $0.09 | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.0s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.9s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252227b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252027b_multi_container_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / 💰 $0.11 | +| [28_permissions_error](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/28_permissions_error/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252228_permissions_error%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252028_permissions_error%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.1s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.6s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 150.7s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 18.6s / 💰 $0.06 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252228_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252028_permissions_error%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.6s / 💰 $0.06 | +| [33_cpu_metrics_discovery](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252233_cpu_metrics_discovery%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252033_cpu_metrics_discovery%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.8s / 💰 $0.10 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 298.6s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.8s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252233_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252033_cpu_metrics_discovery%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 106.7s / 💰 $0.12 | +| [39_failed_toolset](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/39_failed_toolset/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252239_failed_toolset%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252039_failed_toolset%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.2s / 💰 $0.03 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.8s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 281.6s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.6s / 💰 $0.11 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252239_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252039_failed_toolset%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.1s / 💰 $0.12 | +| [41_setup_argo](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/41_setup_argo/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252241_setup_argo%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252041_setup_argo%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 22.1s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.2s / 💰 $0.05 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 174.7s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 18.0s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252241_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252041_setup_argo%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 16.9s / 💰 $0.05 | +| [42_dns_issues_result_new_tools_no_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools_no_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_result_new_tools_no_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.3s / 💰 $0.13 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.3s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 348.3s / 💰 $0.22 | [⏱️ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 673.9s | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_result_new_tools_no_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 97.5s / πŸ’° $0.25 | +| [42_dns_issues_steps_new_tools](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252242_dns_issues_steps_new_tools%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252042_dns_issues_steps_new_tools%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 49.0s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.8s / πŸ’° $0.16 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 346.7s / πŸ’° $0.18 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 93.8s / πŸ’° $0.27 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252242_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252042_dns_issues_steps_new_tools%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 92.2s / πŸ’° $0.28 | +| [43_current_datetime_from_prompt](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_current_datetime_from_prompt/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_current_datetime_from_prompt%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_current_datetime_from_prompt%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.8s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.0s / πŸ’° $0.04 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 135.6s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 14.5s / πŸ’° $0.05 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_current_datetime_from_prompt%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 14.1s / πŸ’° $0.05 | +| [43_slack_deployment_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/43_slack_deployment_logs/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252243_slack_deployment_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252043_slack_deployment_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252243_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252043_slack_deployment_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [44_slack_statefulset_logs](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252244_slack_statefulset_logs%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252044_slack_statefulset_logs%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252244_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252044_slack_statefulset_logs%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [45_fetch_deployment_logs_simple](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252245_fetch_deployment_logs_simple%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252045_fetch_deployment_logs_simple%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.0s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / 💰 $0.07 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 86.2s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.2s / πŸ’° $0.09 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252245_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252045_fetch_deployment_logs_simple%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.6s / πŸ’° $0.10 | +| [48_logs_since_thursday](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252248_logs_since_thursday%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252048_logs_since_thursday%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252248_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252048_logs_since_thursday%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [50_logs_since_specific_date](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250_logs_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050_logs_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.3s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 21.6s / 💰 $0.07 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.0s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.8s / πŸ’° $0.13 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050_logs_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.8s / πŸ’° $0.11 | +| [50a_logs_since_last_specific_month](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252250a_logs_since_last_specific_month%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252050a_logs_since_last_specific_month%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.1s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.7s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 117.3s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.0s / πŸ’° $0.11 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252250a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252050a_logs_since_last_specific_month%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.9s / πŸ’° $0.09 | +| [51_logs_summarize_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/51_logs_summarize_errors/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252251_logs_summarize_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252051_logs_summarize_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.7s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.5s / πŸ’° $0.04 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 76.1s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.5s / πŸ’° $0.09 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252251_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252051_logs_summarize_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.1s / πŸ’° $0.10 | +| [52_logs_login_issues](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/52_logs_login_issues/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252252_logs_login_issues%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252052_logs_login_issues%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.0s / πŸ’° $0.10 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.5s / πŸ’° $0.63 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 251.3s / πŸ’° $0.20 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.7s / πŸ’° $0.12 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252252_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252052_logs_login_issues%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.1s / πŸ’° $0.23 | +| [53_logs_find_term](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/53_logs_find_term/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252253_logs_find_term%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252053_logs_find_term%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.2s / πŸ’° $0.19 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.1s / πŸ’° $0.09 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 70.7s / πŸ’° $0.06 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.9s / πŸ’° $0.13 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252253_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252053_logs_find_term%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.2s / πŸ’° $0.13 | +| [54_not_truncated_when_getting_pods](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/54_not_truncated_when_getting_pods/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252254_not_truncated_when_getting_pods%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252054_not_truncated_when_getting_pods%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / πŸ’° $0.16 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.4s / πŸ’° $0.10 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 314.9s / πŸ’° $0.25 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.1s / πŸ’° $0.10 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252254_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252054_not_truncated_when_getting_pods%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.0s / 💰 $0.11 | +| [55_kafka_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252255_kafka_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252055_kafka_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [⚪️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252255_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252055_kafka_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [57_wrong_namespace](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252257_wrong_namespace%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252057_wrong_namespace%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.0s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.1s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 168.4s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 40.5s / 💰 $0.09 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252257_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252057_wrong_namespace%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.3s / 💰 $0.09 | +| [59_label_based_counting](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/59_label_based_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252259_label_based_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252059_label_based_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.7s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.9s / 💰 $0.05 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.1s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.2s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252259_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252059_label_based_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.7s / 💰 $0.07 | +| [60_count_less_than](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/60_count_less_than/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252260_count_less_than%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252060_count_less_than%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.6s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.5s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.9s / 💰 $0.04 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 23.1s / 💰 $0.07 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252260_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252060_count_less_than%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.7s / 💰 $0.09 | +| [61_exact_match_counting](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/61_exact_match_counting/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252261_exact_match_counting%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252061_exact_match_counting%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.7s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.1s / 💰 $0.05 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.9s / 💰 $0.03 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 24.0s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252261_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252061_exact_match_counting%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.2s / 💰 $0.07 | +| [62_fetch_error_logs_with_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/62_fetch_error_logs_with_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252262_fetch_error_logs_with_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252062_fetch_error_logs_with_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 27.1s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.6s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 99.9s / 💰 $0.06 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.9s / 💰 $0.08 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252262_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252062_fetch_error_logs_with_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 114.0s / 💰 $0.08 | +| [63_fetch_error_logs_no_errors](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/63_fetch_error_logs_no_errors/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252263_fetch_error_logs_no_errors%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252063_fetch_error_logs_no_errors%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.0s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.0s / 💰 $0.06 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 153.9s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.0s / 💰 $0.07 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252263_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252063_fetch_error_logs_no_errors%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.7s / 💰 $0.08 | +| [64_keda_vs_hpa_confusion](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/64_keda_vs_hpa_confusion/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252264_keda_vs_hpa_confusion%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252064_keda_vs_hpa_confusion%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.4s / 💰 $0.57 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.5s / 💰 $0.82 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 143.0s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.3s / 💰 $0.20 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252264_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252064_keda_vs_hpa_confusion%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.3s / 💰 $0.25 | +| [65_health_check_followup](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/65_health_check_followup/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252265_health_check_followup%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252065_health_check_followup%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 72.7s / 💰 $0.25 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 58.8s / 💰 $0.12 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 146.5s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.4s / 💰 $0.26 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252265_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252065_health_check_followup%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 80.1s / 💰 $0.30 | +| [71_connection_pool_starvation](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252271_connection_pool_starvation%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252071_connection_pool_starvation%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.8s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.1s / 💰 $0.55 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 117.4s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.0s / 💰 $0.82 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252271_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252071_connection_pool_starvation%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 60.3s / 💰 $0.33 | +| [73a_time_window_anomaly](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73a_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273a_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073a_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.7s / 💰 $0.22 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 38.5s / 💰 $0.56 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 122.5s / 💰 $0.13 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.8s / 💰 $0.72 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073a_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.3s / 💰 $0.15 | +| [73b_time_window_anomaly](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/73b_time_window_anomaly/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252273b_time_window_anomaly%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252073b_time_window_anomaly%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0%
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.3s / 💰 $0.19 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 28.2s / 💰 $0.05 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 108.9s / 💰 $0.08 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.6s / 💰 $0.78 | [🟢 100%
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252273b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252073b_time_window_anomaly%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 57.1s / 💰 $0.27 | +| [76_service_discovery_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/76_service_discovery_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252276_service_discovery_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252076_service_discovery_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.0s / 💰 $0.25 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 51.8s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 168.2s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.0s / 💰 $0.95 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252276_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252076_service_discovery_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.1s / 💰 $0.13 | +| [77_liveness_probe_misconfiguration](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/77_liveness_probe_misconfiguration/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252277_liveness_probe_misconfiguration%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252077_liveness_probe_misconfiguration%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 640.0s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 41.2s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 558.4s / 💰 $0.17 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 48.0s / 💰 $0.13 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252277_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252077_liveness_probe_misconfiguration%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.2s / 💰 $0.14 | +| [78a_missing_cpu_limits](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78a_missing_cpu_limits/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278a_missing_cpu_limits%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078a_missing_cpu_limits%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.4s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 68.8s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 119.6s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 45.7s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078a_missing_cpu_limits%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.9s / 💰 $0.12 | +| [78b_cpu_quota_exceeded](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/78b_cpu_quota_exceeded/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252278b_cpu_quota_exceeded%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252078b_cpu_quota_exceeded%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.6s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 46.4s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 136.9s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.7s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252278b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252078b_cpu_quota_exceeded%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.6s / 💰 $0.16 | +| [79_configmap_mount_issue](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/79_configmap_mount_issue/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252279_configmap_mount_issue%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252079_configmap_mount_issue%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 32.2s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 29.6s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.0s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.7s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252279_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252079_configmap_mount_issue%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.7s / 💰 $0.13 | +| [80_pvc_storage_class_mismatch](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/80_pvc_storage_class_mismatch/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252280_pvc_storage_class_mismatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252080_pvc_storage_class_mismatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 39.5s / 💰 $0.10 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.9s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 128.6s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.2s / 💰 $0.11 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252280_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252080_pvc_storage_class_mismatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 59.5s / 💰 $0.15 | +| [81_service_account_permission_denied](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/81_service_account_permission_denied/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252281_service_account_permission_denied%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252081_service_account_permission_denied%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.3s / 💰 $0.27 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.7s / 💰 $0.77 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 147.4s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.5s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252281_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252081_service_account_permission_denied%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 67.6s / 💰 $0.29 | +| [82_pod_anti_affinity_conflict](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/82_pod_anti_affinity_conflict/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252282_pod_anti_affinity_conflict%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252082_pod_anti_affinity_conflict%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) 
| [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.3s / 💰 $0.12 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.5s / 💰 $0.08 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 331.2s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 55.7s / 💰 $0.15 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252282_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252082_pod_anti_affinity_conflict%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 69.4s / 💰 $0.16 | +| [83_secret_not_found](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/83_secret_not_found/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252283_secret_not_found%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252083_secret_not_found%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.9s / 💰 $0.13 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 34.6s / 💰 $0.08 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 108.2s / 💰 $0.09 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 44.0s / 💰 $0.11 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252283_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252083_secret_not_found%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.6s / 💰 $0.12 | +| [84_network_policy_blocking_traffic](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/84_network_policy_blocking_traffic/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252284_network_policy_blocking_traffic%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252084_network_policy_blocking_traffic%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.8s / 💰 $0.24 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.2s / 💰 $0.43 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 182.7s / 💰 $0.20 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 71.5s / 💰 $0.20 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252284_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252084_network_policy_blocking_traffic%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 75.4s / 💰 $0.22 | +| [85_hpa_not_scaling](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/85_hpa_not_scaling/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252285_hpa_not_scaling%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252085_hpa_not_scaling%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.3s / 💰 $0.11 | [🔴 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 61.4s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 168.9s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.0s / 💰 $0.17 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252285_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252085_hpa_not_scaling%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 62.4s / 💰 $0.23 | +| [86_configmap_like_but_secret](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/86_configmap_like_but_secret/test_case.yaml) [🔗](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252286_configmap_like_but_secret%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252086_configmap_like_but_secret%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 30.5s / 💰 $0.10 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 43.7s / 💰 $0.14 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 566.6s / 💰 $0.18 | [🟢 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 56.7s / 💰 $0.12 | [🟢 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252286_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252086_configmap_like_but_secret%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 64.3s / πŸ’° $0.15 | +| [89_runbook_missing_cloudwatch](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/89_runbook_missing_cloudwatch/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252289_runbook_missing_cloudwatch%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252089_runbook_missing_cloudwatch%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 26.7s / πŸ’° $0.08 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 17.6s / πŸ’° $0.04 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 241.4s / πŸ’° $0.16 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 99.2s / πŸ’° $0.29 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252289_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252089_runbook_missing_cloudwatch%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 31.3s / πŸ’° $0.08 | +| [90_runbook_basic_selection](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/90_runbook_basic_selection/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252290_runbook_basic_selection%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252090_runbook_basic_selection%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.9s / πŸ’° $0.18 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 133.3s / πŸ’° $0.14 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 386.5s / πŸ’° $0.21 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 103.6s / πŸ’° $0.31 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252290_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252090_runbook_basic_selection%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 111.4s / πŸ’° $0.35 | +| [91f_datadog_logs_historical_pod](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/91f_datadog_logs_historical_pod/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252291f_datadog_logs_historical_pod%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252091f_datadog_logs_historical_pod%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 8.0s | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 7.7s | [πŸ”΄ 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 9.0s | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 7.8s | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252291f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252091f_datadog_logs_historical_pod%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 8.5s | +| [93_calling_datadog[0]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[0]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B0%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B0%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 73.6s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 10.5s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 50.9s / πŸ’° $0.07 | [⏱️ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 608.3s | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B0%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.0s / πŸ’° $0.15 | +| [93_calling_datadog[1]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[1]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B1%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 74.2s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 27.4s / πŸ’° $0.11 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 78.9s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.1s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B1%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 9.6s / πŸ’° $0.15 | +| [93_calling_datadog[2]](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_calling_datadog[2]/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_calling_datadog%255B2%255D%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_calling_datadog%255B2%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 65.2s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.1s / πŸ’° $0.11 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 33.1s / πŸ’° $0.06 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 11.4s / πŸ’° $0.15 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_calling_datadog%255B2%255D%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 12.1s / πŸ’° $0.15 | +| [93_events_since_specific_date](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/93_events_since_specific_date/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252293_events_since_specific_date%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252093_events_since_specific_date%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 13.0s / πŸ’° $0.09 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 19.1s / πŸ’° $0.07 | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 20.1s / πŸ’° $0.11 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252293_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252093_events_since_specific_date%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 14.8s / πŸ’° $0.10 | +| [94_runbook_transparency](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/94_runbook_transparency/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252294_runbook_transparency%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252094_runbook_transparency%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% 
(0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 54.1s / πŸ’° $0.33 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 47.1s / πŸ’° $0.12 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 308.9s / πŸ’° $0.21 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 91.4s / πŸ’° $0.25 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252294_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252094_runbook_transparency%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 104.5s / πŸ’° $0.21 | +| [96_no_matching_runbook](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/96_no_matching_runbook/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252296_no_matching_runbook%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252096_no_matching_runbook%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 52.1s / πŸ’° $0.23 | [πŸ”΄ 0% (0/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.3s / πŸ’° $0.11 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 275.9s / πŸ’° $0.18 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 96.7s / πŸ’° $0.31 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252296_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252096_no_matching_runbook%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 86.8s / πŸ’° $0.36 | +| [97_logs_clarification_needed](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/97_logs_clarification_needed/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252297_logs_clarification_needed%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252097_logs_clarification_needed%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 11.1s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.9s / πŸ’° $0.03 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 25.6s / πŸ’° $0.02 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 37.7s / πŸ’° $0.18 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252297_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252097_logs_clarification_needed%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 15.9s / πŸ’° $0.05 | +| [98_logs_transparency_default_time](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252298_logs_transparency_default_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252098_logs_transparency_default_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ 
-](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [βšͺ️ -](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252298_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252098_logs_transparency_default_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | +| [99_logs_transparency_custom_time](https://github.com/robusta-dev/holmesgpt/blob/master/tests/llm/fixtures/test_ask_holmes/99_logs_transparency_custom_time/test_case.yaml) [πŸ”—](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22metadata.eval_id%2520%253D%2520%252299_logs_transparency_custom_time%2522%22%2C%20%22label%22%3A%20%22metadata.eval_id%2520equals%252099_logs_transparency_custom_time%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4o%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4o%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 36.7s / πŸ’° $0.15 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-4.1%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-4.1%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 35.5s / πŸ’° $0.10 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Bgpt-5%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Bgpt-5%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 79.7s / πŸ’° $0.07 | [🟒 100% (1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-20250514%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 42.4s / πŸ’° $0.12 | [🟒 100% 
(1/1)](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241?c=&search=%7B%22filter%22%3A%20%5B%7B%22text%22%3A%20%22span_attributes.name%2520%253D%2520%252299_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%2522%22%2C%20%22label%22%3A%20%22Name%2520equals%252099_logs_transparency_custom_time%255Banthropic%252Fclaude-sonnet-4-5-20250929%255D%22%2C%20%22originType%22%3A%20%22form%22%7D%5D%7D) / ⏱️ 66.6s / πŸ’° $0.12 | --- -*Results are automatically generated and updated weekly. View full traces and detailed analysis in [Braintrust experiment: local-benchmark-20250930-092035](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20250930-092035).* +*Results are automatically generated and updated weekly. View full traces and detailed analysis in [Braintrust experiment: local-benchmark-20251007-155241](https://www.braintrust.dev/app/robustadev/p/HolmesGPT/experiments/local-benchmark-20251007-155241).*