Improve Risk Agent with Risk Report and better criteria generation prompt #298

NISH1001 · 2025-12-18T20:00:42Z

Summary 📝

This PR upgrades the RiskAgent guardrail into a comprehensive safety diagnostic tool. Instead of returning a binary pass/fail or a simple score, the agent now performs per-risk detection and leverages a secondary RiskReportAgent to generate detailed, technically grounded reports when violations occur. This ensures that safety failures are actionable and backed by specific evidence from the evaluated content.

Details ⚙️

Active Harm Detection Prompting:

Refactored RISK_SYSTEM_PROMPT to prioritize "Presence of Risk" over "Absence of Safety."
Criteria are now designed to be unforgiving and failure-oriented, ensuring that subtle scientific or technical risks are not overlooked.

Automated Risk Reporting:

Integrated a new internal _RiskReportAgent that parses failed High Importance criteria.
The agent maps specific evidence (verbatim quotes/indices) from the risky_content to the risk definition, providing a structured "Criterion-Evidence-Analysis" report.
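A minimal sketch of what one "Criterion-Evidence-Analysis" entry could look like. The class names, field names, and the `locate_quote` helper are hypothetical and are not taken from the PR's code; the only idea borrowed from the description is that each piece of evidence is a verbatim quote with indices into `risky_content`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Evidence:
    """A verbatim quote plus its character span in the evaluated content."""
    quote: str
    span: Optional[Tuple[int, int]]  # None when the quote is not found verbatim


@dataclass
class ReportEntry:
    """One Criterion-Evidence-Analysis record for a failed criterion."""
    criterion: str
    evidence: List[Evidence]
    analysis: str


def locate_quote(risky_content: str, quote: str) -> Evidence:
    """Attach (start, end) indices to a quote if it appears verbatim."""
    start = risky_content.find(quote)
    if start == -1:
        return Evidence(quote=quote, span=None)
    return Evidence(quote=quote, span=(start, start + len(quote)))


content = "mix A with B at 90C"
entry = ReportEntry(
    criterion="Content gives actionable synthesis steps",  # made-up criterion
    evidence=[locate_quote(content, "A with B")],
    analysis="The quoted span names both reagents.",
)
```

Keeping the span next to the quote lets a reader check that the evidence really is verbatim rather than paraphrased by the reporting model.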

Adaptive DAG Verdict Logic:

Implemented dynamic bucket generation for the weighted_ratio. The number of verdict buckets now scales with the number of risks evaluated (max(5, num_risks + 4)), providing more granular scoring resolution.
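The bucket-count rule can be sketched as below. Only the `max(5, num_risks + 4)` expression comes from this PR; the function names and the even spacing of buckets over the `[0, 1]` range are assumptions made for illustration.

```python
def num_verdict_buckets(num_risks: int) -> int:
    """Bucket count as described in the PR: never fewer than 5."""
    return max(5, num_risks + 4)


def bucket_index(weighted_ratio: float, num_risks: int) -> int:
    """Map a ratio in [0, 1] to a bucket index.

    ASSUMPTION: buckets are equally wide; the real DAG may space them differently.
    """
    n = num_verdict_buckets(num_risks)
    clamped = min(max(weighted_ratio, 0.0), 1.0)
    # A ratio of exactly 1.0 falls in the last bucket rather than overflowing.
    return min(int(clamped * n), n - 1)
```

With one risk the count stays at the floor of 5; with ten risks it grows to 14, so a single failed risk moves the score by a smaller, more distinguishable step.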

Resilient Verdict Extraction:

Introduced a dual-layered extraction strategy in risk_agent.py.
Primary: Direct extraction from DeepEval TaskNode._output.
Fallback: A robust Regex-based parser that scans verbose_logs to reconstruct verdicts if node states are lost during DAG execution.
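The two layers could be combined roughly as follows. This is a sketch under loud assumptions: `StubNode` is a stand-in rather than a DeepEval class, and the `verdict: pass|fail` text format is invented, since neither the real node output nor the real `verbose_logs` layout is shown in this PR.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: verdict text looks like "verdict: pass" or "verdict: fail".
_VERDICT_RE = re.compile(r"verdict\s*[:=]\s*(pass|fail)", re.IGNORECASE)


@dataclass
class StubNode:
    """Stand-in for a DAG node; only the `_output` attribute is modelled."""
    risk: str
    _output: Optional[str] = None


def _parse_verdict(text: Optional[str]) -> Optional[bool]:
    """Return True for pass, False for fail, None when no verdict is present."""
    if not text:
        return None
    match = _VERDICT_RE.search(text)
    if match is None:
        return None
    return match.group(1).lower() == "pass"


def extract_verdict(node: StubNode, verbose_logs: str) -> Optional[bool]:
    # Primary: read the verdict straight from the node state.
    verdict = _parse_verdict(getattr(node, "_output", None))
    if verdict is not None:
        return verdict
    # Fallback: scan log lines that mention this risk.
    for line in (verbose_logs or "").splitlines():
        if node.risk.lower() in line.lower():
            verdict = _parse_verdict(line)
            if verdict is not None:
                return verdict
    # Unresolved: returned as None so the caller can avoid treating it as a pass.
    return None
```

Returning `None` instead of defaulting to `True` is a design choice in this sketch: it keeps a missing verdict distinguishable from a passing one.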

Type-Safe Refactoring:

Updated the internal configuration and DAG builder to use RiskCategory enums consistently, improving maintainability and reducing string-parsing errors.
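One way to get the benefit described here is to convert strings to the enum once, at the configuration boundary, and use enum members everywhere after that. The member names below and the `to_risk_category` helper are hypothetical; the actual `RiskCategory` members are not listed in this PR.

```python
from enum import Enum
from typing import Union


class RiskCategory(Enum):
    # Hypothetical members, for illustration only.
    TOXICITY = "toxicity"
    BIAS = "bias"


def to_risk_category(value: Union[str, RiskCategory]) -> RiskCategory:
    """Normalise user input once; unknown names raise ValueError."""
    if isinstance(value, RiskCategory):
        return value
    return RiskCategory(str(value).strip().lower())


# Enum members are hashable, so they work directly as dictionary keys.
weights = {to_risk_category(" Toxicity "): 2.0}
```

A misspelled category then fails loudly at load time instead of silently missing a dictionary lookup later.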

Bugfixes 🐛

Node State Persistence: Resolved an issue where DeepEval's DAGMetric would occasionally fail to populate outputs on original node references, leading to "False Pass" scenarios.
Weight Resolution: Fixed a bug in weight mapping where unspecified risks would default to zero weight; they now correctly default to 1.0.
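The weight fix can be illustrated with a small sketch. Only the default of 1.0 comes from this PR; the function names and the exact form of the weighted ratio (passed weight divided by total weight) are assumptions.

```python
from typing import Dict, Iterable, Optional

DEFAULT_RISK_WEIGHT = 1.0  # value stated in the PR description


def resolve_weights(
    risks: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
    default: float = DEFAULT_RISK_WEIGHT,
) -> Dict[str, float]:
    """Give every evaluated risk a weight, using the default when unspecified."""
    weights = weights or {}
    return {risk: float(weights.get(risk, default)) for risk in risks}


def weighted_ratio(verdicts: Dict[str, bool], weights: Dict[str, float]) -> float:
    """ASSUMED definition: weight of passed risks over total weight."""
    total = sum(weights[risk] for risk in verdicts)
    if total == 0:
        return 0.0
    passed = sum(weights[risk] for risk, ok in verdicts.items() if ok)
    return passed / total


verdicts = {"a": True, "b": False, "c": True}
fixed = resolve_weights(verdicts, {"a": 2.0})
buggy = resolve_weights(verdicts, {"a": 2.0, "c": 0.0}, default=0.0)
```

Under this assumed ratio, the fixed weights give 0.75, while a zero default drops the failed risk `b` out of the sum and yields 1.0, which is one way a zero default could hide a failure.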

Checks

Closed Risk eval in guardrails.py and new "agent" that generates risk report for failed risks #249
Tested Changes (549 tests passed)
Stakeholder Approval

- Parse verbose_logs to determine per-risk pass/fail (fixes node._output issue)
- Extract failed criteria from verbose_logs for report generation
- Add internal _RiskReportAgent for generating risk reports
- Add risk_report to GuardrailOutput.extra
- Add risk_report_config to RiskAgentConfig for configurability

Co-Authored-By: Tigran Tchrakian <45388254+TigranTigranTigran@users.noreply.github.com>

github-actions · 2025-12-18T20:09:41Z

✅ Tests passed

📊 Test Results

Passed: 549
Failed: 0
Skipped: 23
Warnings: 133
Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: f3225b8

📋 Full coverage report and logs are available in the workflow run.

muthukumaranR

Thanks for working on the risk report. LGTM.

github-actions · 2025-12-18T22:04:31Z

✅ Tests passed

📊 Test Results

Passed: 549
Failed: 0
Skipped: 23
Warnings: 134
Coverage: 77%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: e10fea2

📋 Full coverage report and logs are available in the workflow run.

github-actions · 2025-12-18T22:45:59Z

✅ Tests passed

📊 Test Results

Passed: 549
Failed: 0
Skipped: 23
Warnings: 134
Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: ee9fb73

📋 Full coverage report and logs are available in the workflow run.

github-actions · 2025-12-19T01:03:14Z

✅ Tests passed

📊 Test Results

Passed: 549
Failed: 0
Skipped: 23
Warnings: 134
Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: 1a052fb

📋 Full coverage report and logs are available in the workflow run.

github-actions · 2025-12-19T01:10:52Z

✅ Tests passed

📊 Test Results

Passed: 549
Failed: 0
Skipped: 23
Warnings: 132
Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: d11357f

📋 Full coverage report and logs are available in the workflow run.

NISH1001 · 2025-12-19T01:22:54Z

Note:

I haven't included the _confirm_risk_score() method from "Risk eval in guardrails.py and new "agent" that generates risk report for failed risks" (#249), because in this PR the verdicts and criteria all come from the DAG metric rather than from parsing verbose logs. There is therefore no need to recompute or revalidate the score.

NISH1001 and others added 2 commits December 18, 2025 11:23

Update dag-metric llm and prompt

870417c

NISH1001 had a problem deploying to integration December 18, 2025 20:00 — with GitHub Actions Failure

NISH1001 requested a review from muthukumaranR December 18, 2025 20:00

NISH1001 mentioned this pull request Dec 18, 2025

Risk eval in guardrails.py and new "agent" that generates risk report for failed risks #249

Closed

3 tasks

muthukumaranR approved these changes Dec 18, 2025

View reviewed changes

Improve verdict tracing in risk agent

1bb2331

NISH1001 had a problem deploying to integration December 18, 2025 21:56 — with GitHub Actions Failure

Refactor risk agent flow

fcda86d

NISH1001 had a problem deploying to integration December 18, 2025 22:35 — with GitHub Actions Failure

Refactor _arun method in risk agent to break down into methods

6e39143

NISH1001 had a problem deploying to integration December 19, 2025 00:54 — with GitHub Actions Failure

Bugfix criteria serialization in risk_results in risk agent

9185902

NISH1001 temporarily deployed to integration December 19, 2025 01:00 — with GitHub Actions Inactive

NISH1001 merged commit 9d948b0 into develop Dec 19, 2025
1 check passed

NISH1001 deleted the enhance/risk-agent-guardrail branch December 19, 2025 01:11
