@NISH1001 commented Dec 18, 2025

Summary 📝

This PR turns the RiskAgent guardrail into a safety diagnostic tool. Instead of returning only a binary pass/fail or a single score, the agent now evaluates each risk separately and, when violations occur, uses a secondary internal agent (_RiskReportAgent) to generate detailed, technically grounded reports. Each safety failure is therefore actionable and backed by specific evidence from the evaluated content.

Details ⚙️

  1. Active Harm Detection Prompting:
  • Refactored RISK_SYSTEM_PROMPT to prioritize "Presence of Risk" over "Absence of Safety."
  • Criteria are now designed to be unforgiving and failure-oriented, so that subtle scientific or technical risks are less likely to be overlooked.
  2. Automated Risk Reporting:
  • Integrated a new internal _RiskReportAgent that parses failed High Importance criteria.
  • The agent maps specific evidence (verbatim quotes/indices) from the risky_content to the risk definition, producing a structured "Criterion-Evidence-Analysis" report.
  3. Adaptive DAG Verdict Logic:
  • Implemented dynamic bucket generation for the weighted_ratio. The number of verdict buckets now scales with the number of risks evaluated (max(5, num_risks + 4)), giving finer scoring resolution.
  4. Resilient Verdict Extraction:
  • Introduced a two-layer extraction strategy in risk_agent.py.
  • Primary: direct extraction from DeepEval TaskNode._output.
  • Fallback: a regex-based parser that scans verbose_logs to reconstruct verdicts if node states are lost during DAG execution.
  5. Type-Safe Refactoring:
  • Updated the internal configuration and DAG builder to use RiskCategory enums consistently, improving maintainability and reducing string-parsing errors.
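A minimal Python sketch of the adaptive bucket count described in the DAG verdict logic above. The formula max(5, num_risks + 4) comes from the PR; splitting the weighted_ratio range [0, 1] into equal-width buckets, and the helper names num_verdict_buckets, bucket_edges and bucket_for_ratio, are hypothetical illustrations, not the code in risk_agent.py.

```python
def num_verdict_buckets(num_risks: int) -> int:
    """Bucket count stated in the PR: at least 5, growing with the number of risks."""
    if num_risks < 0:
        raise ValueError("num_risks must be non-negative")
    return max(5, num_risks + 4)


def bucket_edges(num_buckets: int) -> list[float]:
    """Assumed equal-width bucket boundaries over [0, 1] (hypothetical)."""
    return [i / num_buckets for i in range(num_buckets + 1)]


def bucket_for_ratio(weighted_ratio: float, num_buckets: int) -> int:
    """Return the 0-based bucket index that a weighted_ratio in [0, 1] falls into."""
    if not 0.0 <= weighted_ratio <= 1.0:
        raise ValueError("weighted_ratio must be in [0, 1]")
    # A ratio of exactly 1.0 belongs to the last bucket instead of overflowing.
    return min(int(weighted_ratio * num_buckets), num_buckets - 1)


if __name__ == "__main__":
    for n in (1, 3, 8):
        buckets = num_verdict_buckets(n)
        print(n, buckets, bucket_for_ratio(0.5, buckets))
```

Under this formula, zero or one risk yields five buckets and every additional risk adds one more, so the score range is divided more finely as more risks are evaluated.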

Bugfixes 🐛

  • Node State Persistence: Resolved an issue where DeepEval's DAGMetric would occasionally fail to populate outputs on the original node references, producing false passes.
  • Weight Resolution: Fixed a bug in weight mapping where unspecified risks would default to zero weight; they now correctly default to 1.0.
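The weight-resolution fix can be illustrated with a small Python sketch that also keys weights by an enum, in the spirit of the RiskCategory refactor. The enum members, the resolve_weights helper and the weighted_pass_ratio formula are hypothetical; the PR only states that unspecified risks now default to 1.0 instead of zero.

```python
from enum import Enum


class RiskCategory(str, Enum):
    # Hypothetical members for illustration; the real enum's members are not listed in the PR.
    TOXICITY = "toxicity"
    BIAS = "bias"
    PRIVACY = "privacy"


DEFAULT_WEIGHT = 1.0  # Unspecified risks count fully instead of being ignored.


def resolve_weights(risks, configured, default=DEFAULT_WEIGHT):
    """Give every evaluated risk a weight, falling back to `default` when unconfigured."""
    return {risk: float(configured.get(risk, default)) for risk in risks}


def weighted_pass_ratio(passed, weights):
    """Illustrative score: share of total weight carried by passing risks (assumed formula)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(w for risk, w in weights.items() if passed[risk]) / total


if __name__ == "__main__":
    risks = list(RiskCategory)
    configured = {RiskCategory.TOXICITY: 2.0}
    passed = {RiskCategory.TOXICITY: True, RiskCategory.BIAS: False, RiskCategory.PRIVACY: True}

    fixed = resolve_weights(risks, configured)             # BIAS and PRIVACY get 1.0
    buggy = resolve_weights(risks, configured, default=0)  # old behavior: they get 0.0
    print(weighted_pass_ratio(passed, fixed))  # the BIAS failure lowers the score
    print(weighted_pass_ratio(passed, buggy))  # the BIAS failure is invisible
```

With a zero default, a failing risk that was never given an explicit weight contributes nothing to the score, which is exactly the silent-pass behavior the fix removes.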


NISH1001 and others added 2 commits December 18, 2025 11:23
- Parse verbose_logs to determine per-risk pass/fail (fixes node._output issue)
- Extract failed criteria from verbose_logs for report generation
- Add internal _RiskReportAgent for generating risk reports
- Add risk_report to GuardrailOutput.extra
- Add risk_report_config to RiskAgentConfig for configurability

Co-Authored-By: Tigran Tchrakian <45388254+TigranTigranTigran@users.noreply.github.com>
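The two-layer verdict extraction (node output first, then verbose_logs) might look roughly like the following Python sketch. FakeTaskNode is a hypothetical stand-in for a DeepEval task node that only mimics the _output attribute named in the PR; the "[risk] verdict: pass|fail" log format, the regex, and the choice to treat a missing verdict as a failure are all assumptions, since the actual layout of DeepEval's verbose logs is not described here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical log line format such as "[toxicity] verdict: fail".
# The real layout of DeepEval's verbose_logs is an assumption here.
VERDICT_PATTERN = re.compile(
    r"\[(?P<risk>[\w-]+)\]\s*verdict:\s*(?P<verdict>pass|fail)", re.IGNORECASE
)


@dataclass
class FakeTaskNode:
    """Hypothetical stand-in for a DAG task node; only mimics the `_output` attribute."""
    label: str
    _output: Optional[str] = None


def parse_verdicts_from_logs(verbose_logs: str) -> dict[str, bool]:
    """Fallback layer: rebuild {risk: passed} from log text; later lines win."""
    verdicts: dict[str, bool] = {}
    for match in VERDICT_PATTERN.finditer(verbose_logs):
        verdicts[match.group("risk").lower()] = match.group("verdict").lower() == "pass"
    return verdicts


def extract_verdicts(nodes, verbose_logs: str) -> dict[str, bool]:
    """Primary layer reads node outputs; the log parser fills in missing ones."""
    log_verdicts: Optional[dict[str, bool]] = None  # parsed lazily, at most once
    results: dict[str, bool] = {}
    for node in nodes:
        key = node.label.lower()
        output = getattr(node, "_output", None)
        if output is not None:
            results[key] = str(output).strip().lower() == "pass"
            continue
        if log_verdicts is None:
            log_verdicts = parse_verdicts_from_logs(verbose_logs)
        # Assumption: a verdict found in neither source counts as a failure,
        # so a lost node state cannot silently become a pass.
        results[key] = log_verdicts.get(key, False)
    return results


if __name__ == "__main__":
    logs = "[toxicity] verdict: PASS\n[bias] verdict: fail\n"
    nodes = [
        FakeTaskNode("toxicity"),              # output lost: recovered from logs
        FakeTaskNode("bias", _output="pass"),  # output present: logs not consulted
        FakeTaskNode("privacy"),               # in neither source: treated as fail
    ]
    print(extract_verdicts(nodes, logs))
```

Defaulting an unresolved verdict to a failure errs toward flagging content, which is consistent with the PR's goal of eliminating "False Pass" scenarios.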
@github-actions

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 133
  • Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: f3225b8

📋 Full coverage report and logs are available in the workflow run.

@muthukumaranR left a comment

Thanks for working on the risk report. LGTM.

@github-actions

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 134
  • Coverage: 77%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: e10fea2

📋 Full coverage report and logs are available in the workflow run.

@github-actions

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 134
  • Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: ee9fb73

📋 Full coverage report and logs are available in the workflow run.

@github-actions

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 134
  • Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: 1a052fb

📋 Full coverage report and logs are available in the workflow run.

@github-actions

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 132
  • Coverage: 78%

Branch: enhance/risk-agent-guardrail
PR: #298
Commit: d11357f

📋 Full coverage report and logs are available in the workflow run.

@NISH1001 merged commit 9d948b0 into develop on Dec 19, 2025
1 check passed
@NISH1001 deleted the enhance/risk-agent-guardrail branch on December 19, 2025 at 01:11
@NISH1001

Note:

3 participants