Improve Risk Agent with Risk Report and better criteria generation prompt #298
- Parse verbose_logs to determine per-risk pass/fail (fixes node._output issue)
- Extract failed criteria from verbose_logs for report generation
- Add internal _RiskReportAgent for generating risk reports
- Add risk_report to GuardrailOutput.extra
- Add risk_report_config to RiskAgentConfig for configurability

Co-Authored-By: Tigran Tchrakian <45388254+TigranTigranTigran@users.noreply.github.com>
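The per-risk pass/fail step and the `risk_report` entry in `extra` described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CriterionResult`, `GuardrailOutputSketch`, and `summarize_risks` are hypothetical names, the sketch assumes a risk passes only when all of its criteria pass, and it starts from already-parsed records because the `verbose_logs` format is not shown here. The report itself is represented by a plain dict of failed criteria rather than by a call to `_RiskReportAgent`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CriterionResult:
    """Hypothetical record for one evaluated criterion (not a DeepEval type)."""
    risk: str
    criterion: str
    passed: bool


@dataclass
class GuardrailOutputSketch:
    """Hypothetical stand-in for GuardrailOutput; only `passed` and `extra` are modeled."""
    passed: bool
    extra: dict[str, Any] = field(default_factory=dict)


def summarize_risks(results: list[CriterionResult]) -> GuardrailOutputSketch:
    """Aggregate criterion-level results into per-risk verdicts."""
    per_risk: dict[str, bool] = {}
    failed: dict[str, list[str]] = {}
    for r in results:
        # Assumption: a risk passes only if every criterion recorded for it passed.
        per_risk[r.risk] = per_risk.get(r.risk, True) and r.passed
        if not r.passed:
            failed.setdefault(r.risk, []).append(r.criterion)

    output = GuardrailOutputSketch(passed=all(per_risk.values()))
    output.extra["per_risk"] = per_risk
    if failed:
        # Placeholder for the generated report: only the failed criteria are kept.
        output.extra["risk_report"] = {"failed_criteria": failed}
    return output


results = [
    CriterionResult("toxicity", "No slurs", True),
    CriterionResult("pii", "No email addresses", False),
]
out = summarize_risks(results)
print(out.passed, out.extra.get("risk_report"))
```

Attaching the report only when something failed keeps `extra` unchanged for clean inputs, which is one reasonable reading of "generate reports when violations occur".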
✅ Tests passed 📊 Test Results
Branch: 📋 Full coverage report and logs are available in the workflow run.
muthukumaranR
left a comment
Thanks for working on the risk report. Lgtm
Summary 📝
This PR upgrades the `RiskAgent` guardrail into a comprehensive safety diagnostic tool. Instead of returning a binary pass/fail or a simple score, the agent now performs per-risk detection and leverages a secondary `RiskReportAgent` to generate detailed, technically grounded reports when violations occur. This ensures that safety failures are actionable and backed by specific evidence from the evaluated content.

Details ⚙️
- `RISK_SYSTEM_PROMPT` to prioritize "Presence of Risk" over "Absence of Safety."
- `_RiskReportAgent` that parses failed High Importance criteria.
- `risky_content` to the risk definition, providing a structured "Criterion-Evidence-Analysis" report.
- `weighted_ratio`. The number of verdict buckets now scales with the number of risks evaluated (`max(5, num_risks + 4)`), providing more granular scoring resolution.
- `risk_agent.py`.
- `DeepEvalTaskNode._output`.
- `verbose_logs` to reconstruct verdicts if node states are lost during DAG execution.
- `RiskCategory` enums consistently, improving maintainability and reducing string-parsing errors.

Bugfixes 🐛
- DeepEval's `DAGMetric` would occasionally fail to populate outputs on original node references, leading to "False Pass" scenarios.
- `1.0`.

Checks