
chore(evalhub): update garak provider configmap with supported benchmarks#654

Merged
ruivieira merged 2 commits into trustyai-explainability:main from saichandrapandraju:garak-config-update
Mar 2, 2026
Conversation

@saichandrapandraju (Contributor) commented Mar 2, 2026

This PR updates the evalhub garak provider configmap to include the pre-defined benchmarks supported by the garak evalhub adapter.

Summary by CodeRabbit

  • New Features

    • Expanded benchmark suite replacing a single toxicity test with multiple evaluations (OWASP LLM Top 10, AVID variants, CWE, Quality, Quick Scan) covering security, safety, performance, and ethics.
    • Standardized on a metrics-centric evaluation using a consistent attack_success_rate metric.
  • Refactor

    • Reorganized benchmark taxonomy, categories, and tags; removed legacy benchmark fields for a streamlined configuration.
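
As a sketch, the restructured benchmark section of the configmap might look something like the following. The benchmark ids, the attack_success_rate metric, and the tag vocabulary come from the summary above; the exact field names and nesting are assumptions, not the actual file contents:

```yaml
# Hypothetical excerpt of config/configmaps/evalhub/provider-garak.yaml.
# Only the benchmark ids, metric name, and tags are taken from the PR summary;
# the field layout is an assumption.
benchmarks:
  - id: owasp_llm_top10
    metrics: [attack_success_rate]
    tags: [security, red_team]
  - id: avid_security
    metrics: [attack_success_rate]
    tags: [avid, security]
  - id: avid_ethics
    metrics: [attack_success_rate]
    tags: [avid, ethics]
  - id: quick
    metrics: [attack_success_rate]
    tags: [quick]
```

Each entry standardizes on the single attack_success_rate metric, replacing the per-benchmark legacy fields noted in the summary.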


openshift-ci bot commented Mar 2, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


coderabbitai bot commented Mar 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a884fc9 and 046a147.

📒 Files selected for processing (1)
  • config/configmaps/evalhub/provider-garak.yaml

📝 Walkthrough


Replaces the Garak provider container entrypoint to run a Python module and restructures the evaluation benchmarks in the provider config from a single toxicity benchmark to a taxonomy-aligned suite (OWASP, AVID variants, CWE, quality/quick) with standardized metrics and tags.

Changes

Cohort / File(s): Garak Provider Configuration — config/configmaps/evalhub/provider-garak.yaml
Summary: Changed the entrypoint from running main.py to a module invocation (-m llama_stack_provider_trustyai_garak.evalhub). Replaced the single toxicity benchmark with multiple taxonomy-aligned benchmarks (owasp_llm_top10, avid, avid_security, avid_ethics, avid_performance, quality, cwe, quick). Standardized metrics to attack_success_rate, removed legacy fields (num_few_shot, dataset_size), and updated categories and tags (security, safety, performance, avid, red_team, ethics, cwe, quick).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I swapped one test for many hops and bounds,
OWASP, AVID, CWE in tidy rounds.
Metrics aligned, tags all in view,
A burrow of checks—fresh, precise, and new.
Nibble the config, let the audits pursue! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: updating the garak provider configmap with supported benchmarks, which aligns with the primary objective of the PR.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@saichandrapandraju saichandrapandraju marked this pull request as ready for review March 2, 2026 04:53
@ruivieira ruivieira self-requested a review March 2, 2026 12:11
@ruivieira ruivieira added the kind/enhancement New feature or request label Mar 2, 2026
@ruivieira ruivieira moved this to In Progress in TrustyAI planning Mar 2, 2026
@ruivieira ruivieira moved this from In Progress to In Review in TrustyAI planning Mar 2, 2026

openshift-ci bot commented Mar 2, 2026

@saichandrapandraju: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/trustyai-service-operator-e2e (commit 046a147, required): /test trustyai-service-operator-e2e

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


openshift-ci bot commented Mar 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ruivieira

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm label Mar 2, 2026
@ruivieira ruivieira merged commit 3fd576a into trustyai-explainability:main Mar 2, 2026
7 of 9 checks passed

Labels

kind/enhancement (New feature or request), lgtm

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

2 participants