Blog Post Submission
Post Type
- Deep Dive
- How-To
- Use Case
- Tips / Best Practices
- Features
Topics
- GenAI
- Advanced
- Deployment
- Core
Title
Deterministic Safety Scoring in MLflow: Integrating Guardrails AI Validators
Abstract
MLflow serves approximately 29 million downloads per month (PyPI Stats, Feb 2026). This post documents the Guardrails AI integration I built for MLflow, which shipped in MLflow 3.10.0.
Building on the scorer pattern designed by @smoorjani (DeepEval/RAGAS), I extended it to support a new category: deterministic validators that don't require LLM calls.
- Deterministic evaluation - no LLM calls required, repeatable outcomes for compliance
- Cost and latency efficiency - no token costs, millisecond execution
- Available validators - ToxicLanguage, NSFWText, DetectPII, DetectJailbreak, SecretsPresent, GibberishText
- Hybrid patterns - combining deterministic and LLM-based evaluation in a single pipeline
Guardrails AI has 5,000+ GitHub stars; this integration brings its validators directly into MLflow's evaluation workflow.
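To make the "deterministic, no LLM calls" point concrete, here is a minimal standalone sketch of what a rule-based validator looks like. This is illustrative only: the function name, return shape, and regexes are hypothetical and are not the shipped Guardrails AI or MLflow API — the actual integration wraps Guardrails validators (e.g. DetectPII) as MLflow scorers.

```python
import re

# Hypothetical sketch of a DetectPII-style deterministic check.
# Not the real Guardrails/MLflow API; names are illustrative.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def detect_pii(output: str) -> dict:
    """Score a model output for PII. No LLM call is made, so the same
    input always yields the same verdict (repeatable for compliance)
    and execution cost is a few regex scans, not tokens."""
    hits = EMAIL_RE.findall(output) + SSN_RE.findall(output)
    return {
        "value": "fail" if hits else "pass",
        "rationale": f"{len(hits)} PII match(es) found",
    }

print(detect_pii("Contact me at jane@example.com"))
# → {'value': 'fail', 'rationale': '1 PII match(es) found'}
```

Because the check is pure Python, it runs in milliseconds and its verdicts can be replayed byte-for-byte in an audit, which is the core contrast with LLM-judge scorers.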
Target Length
~1500 words (medium-length deep dive)
Related Artifacts
- PR: #20038 (Guardrails AI integration, +761 lines)
- Release: MLflow 3.10.0
- Original pattern: DeepEval/RAGAS by @smoorjani
Provenance
- Original scorer pattern: @smoorjani
- Guardrails integration implementation: Debu Sinha (@debu-sinha)
- Code review: @smoorjani, @B-Step62
Consent Acknowledgment
- Guardrails AI maintainer @zayd-simjee has acknowledged the integration
Additional Context
This post differentiates itself from "yet another LLM judge" by focusing on deterministic validation - a distinct category that required adapting the scorer pattern for non-LLM outputs.