Goal: Transform MLOps artifacts into Regulatory Evidence with a GovOps layer.
Prerequisite: Level 1 (The Engineer)
Context: Continuing with "The Project" (Loan Credit Scoring).
In Level 1, you fixed the bias locally. But your manager rejects the fix because they can't see the proof. Emails with screenshots are not compliance.
In GovOps (Assurance over MLOps), we don't treat compliance as a separate manual step. Instead, we use your existing MLOps infrastructure (MLflow, WandB) as an Evidence Buffer that automatically harvests the proof of safety during the training process.
In a professional pipeline, assurance is a layer that wraps your training. Every time you train a model, you verify its compliance.
Your experiment tracker now tracks two types of performance: Accuracy (Operational) and Compliance (Regulatory).
💡 Full Code: You can find the complete, ready-to-run script for this level here: 03_mlops_integration.py
=== "MLflow"
    ```python
    import mlflow
    import venturalitica as vl
    from venturalitica.quickstart import load_sample
    from dataclasses import asdict
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    mlflow.set_tracking_uri("sqlite:///mlflow.db")
    mlflow.set_experiment("loan-credit-scoring")

    # 0. Data Preparation
    df = load_sample("loan")
    X = df.select_dtypes(include=['number']).drop(columns=['class'])
    y = df['class']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 1. Start the GovOps Session (implicitly captures the Audit Trace)
    with mlflow.start_run(), vl.monitor("train_v1"):
        # 2. Pre-training Data Audit (Article 10)
        vl.enforce(
            data=df,
            target="class",
            gender="Attribute9",
            policy="data_policy.oscal.yaml"
        )

        # 3. Train your model
        model = LogisticRegression()
        model.fit(X_train, y_train)

        # 4. Post-training Model Audit (Article 15: Human Oversight)
        # Download model_policy.oscal.yaml: https://github.com/venturalitica/venturalitica-sdk-samples/blob/main/scenarios/loan-credit-scoring/policies/loan/model_policy.oscal.yaml
        results = vl.enforce(
            data=X_test.assign(prediction=model.predict(X_test)),
            target="prediction",              # 🧠 Checking Model Behavior
            gender="Attribute9",
            policy="model_policy.oscal.yaml"  # 🗝️ New policy for Model Assurance
        )

        # 5. Log everything to the Evidence Buffer
        passed = all(r.passed for r in results)
        mlflow.log_metric("val_accuracy", model.score(X_test, y_test))
        mlflow.log_metric("compliance_score", 1.0 if passed else 0.0)
        mlflow.log_dict({"results": [asdict(r) for r in results]}, "compliance_results.json")

        if not passed:
            # 🛑 CRITICAL: Block the pipeline if the model is unethical
            raise ValueError("Model failed ISO 42001 compliance check. See audit trace.")
    ```
> **Note**: `vl.monitor()` now captures **Multimodal Evidence**: hardware/carbon metrics AND the logical execution trace (AST code story).
=== "Weights & Biases"
    ```python
    import wandb
    import venturalitica as vl
    from venturalitica.quickstart import load_sample
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    wandb.init(project="loan-credit-scoring")

    # 0. Data Preparation
    df = load_sample("loan")
    X = df.select_dtypes(include=['number']).drop(columns=['class'])
    y = df['class']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 1. Open a Monitor Context
    with vl.monitor("wandb_sync"):
        # Pre-training Audit (Article 10)
        vl.enforce(data=df, target="class", gender="Attribute9", policy="data_policy.oscal.yaml")

        # 2. Train and Audit
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)

        # Post-training Audit (Article 15)
        test_df = X_test.copy()
        test_df["class"] = y_test
        test_df["prediction"] = model.predict(X_test)

        # Download model_policy.oscal.yaml: https://github.com/venturalitica/venturalitica-sdk-samples/blob/main/scenarios/loan-credit-scoring/policies/loan/model_policy.oscal.yaml
        audit = vl.enforce(
            data=test_df,
            target="class",
            prediction="prediction",
            gender="Attribute9",
            policy="model_policy.oscal.yaml"
        )

        # 3. Log Compliance Artifacts
        artifact = wandb.Artifact('compliance-bundle', type='evidence')
        artifact.add_file(".venturalitica/results.json")
        wandb.log_artifact(artifact)

        passed = all(r.passed for r in audit)
        wandb.log({"accuracy": model.score(X_test, y_test), "compliance": 1.0 if passed else 0.0})

        if not passed:
            raise ValueError("Model rejected by GovOps policy.")
    ```
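Both tabs reduce the audit to the same gating pattern: collect per-control results, derive a single compliance score, and fail fast if any control failed. Here is a minimal stdlib sketch of that pattern; the `ControlResult` dataclass is a hypothetical stand-in for a `vl.enforce()` result (the snippets above only rely on its `passed` attribute), not the SDK's actual type:

```python
from dataclasses import dataclass

@dataclass
class ControlResult:
    # Hypothetical stand-in for a single vl.enforce() result.
    control_id: str
    passed: bool

def compliance_score(results: list) -> float:
    """1.0 only if every control passed, else 0.0 -- the metric logged above."""
    return 1.0 if all(r.passed for r in results) else 0.0

def gate(results: list) -> None:
    """Raise, exactly like the tabs above, so the pipeline marks the run as failed."""
    if compliance_score(results) == 0.0:
        failed = [r.control_id for r in results if not r.passed]
        raise ValueError(f"Compliance gate failed: {failed}")

results = [ControlResult("art10-balance", True), ControlResult("art15-fairness", True)]
gate(results)  # all controls passed, so nothing is raised
```

The key design choice is binary scoring: a run is either fully compliant (1.0) or blocked (0.0), so there is no "mostly compliant" model in production.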
Now that the code has run, let's verify what we shipped.
- **Run the UI**: `pip install venturalitica[dashboard]` (required for the UI), then `venturalitica ui`.
- **Log Check**: Verify that `.venturalitica/results.json` exists (this is the default output of `enforce()`).
- **Navigate to "Policy Status"**: Confirm your "Risk Treatment" (the adjusted threshold) is recorded.
Key Insight: "The report looks professional, and I didn't write a single word of it."
Professional GovOps requires a separation of concerns. You are now managing two distinct assurance layers:
- **Level 1 (Article 10)**: Checked the **Raw Data** against `data_policy.oscal.yaml`. The goal was to prove the dataset itself was fair before wasting energy on training.
- **Level 2 (Article 15)**: Checks the **Model Behavior** against `model_policy.oscal.yaml`. The goal is to prove the AI makes fair decisions in a "Glass Box" execution.
| Stage | Variable Mapping | Policy File | Mandatory Requirement |
|---|---|---|---|
| Data Audit | `target="class"` | `data_policy.oscal.yaml` | Article 10 (Data Assurance) |
| Model Audit | `target="prediction"` | `model_policy.oscal.yaml` | Article 15 (Human Oversight) |
This decoupling is the core of the Handshake. Even if the Law (`> 0.5`) stays the same, the subject of the law changes from Data to Math.
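To make the decoupling concrete, here is a self-contained sketch that applies one fairness rule to both subjects. The disparate-impact ratio and the `0.5` threshold are illustrative assumptions (the real rule lives in the OSCAL policy files); the point is that the identical Law, applied to raw labels versus model predictions, can pass the Data audit and still fail the Model audit:

```python
def selection_rate_ratio(outcomes, groups, positive=1):
    """Disparate-impact ratio: min group selection rate / max group selection rate."""
    rates = {}
    for g in set(groups):
        group_outcomes = [o for o, grp in zip(outcomes, groups) if grp == g]
        rates[g] = sum(1 for o in group_outcomes if o == positive) / len(group_outcomes)
    return min(rates.values()) / max(rates.values())

THRESHOLD = 0.5  # the "Law" stays the same for both audits (illustrative value)

groups      = ["male", "female", "male", "female", "male", "female"]
labels      = [1, 1, 1, 0, 1, 1]  # raw data: the subject of Article 10
predictions = [1, 0, 1, 0, 1, 0]  # model output: the subject of Article 15

data_ok  = selection_rate_ratio(labels, groups) > THRESHOLD       # fair dataset
model_ok = selection_rate_ratio(predictions, groups) > THRESHOLD  # biased model
```

Here the dataset passes (ratio ≈ 0.67) while the model fails (ratio = 0.0, since it approves no female applicants), which is exactly why both audits are mandatory.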
If `compliance_score == 0`, the build fails.
GitLab CI / GitHub Actions can now block a deployment based on ethics, just like they block on syntax errors.
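The CI hook can be as small as a script that reads the evidence file and exits non-zero on failure. A hedged sketch follows; the flat `[{"passed": ...}]` schema assumed for `results.json` is an illustration based on the `asdict(r)` dump above, so check it against your actual file:

```python
import json
import tempfile
from pathlib import Path

def ci_gate(results_path) -> int:
    """Return a process exit code: 0 if every control passed, 1 otherwise."""
    path = Path(results_path)
    if not path.exists():
        print(f"No evidence found at {path}; failing closed.")
        return 1
    results = json.loads(path.read_text())
    failed = [r for r in results if not r.get("passed", False)]
    if failed:
        print(f"{len(failed)} control(s) failed; blocking deployment.")
        return 1
    print("All controls passed.")
    return 0

# Demo against a temporary evidence file; in a real pipeline you would point
# this at .venturalitica/results.json and call sys.exit(ci_gate(...)).
with tempfile.TemporaryDirectory() as tmp:
    evidence = Path(tmp) / "results.json"
    evidence.write_text(json.dumps([{"control": "art15-fairness", "passed": False}]))
    exit_code = ci_gate(evidence)  # → 1: the deployment is blocked
```

Note that a missing evidence file also returns 1 ("fail closed"): a run that produced no proof is treated the same as a run that failed its audit.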
- **GovOps is Native**: Assurance isn't an extra step; it's a context manager (`vl.monitor`) around your training.
- **Telemetry is Evidence**: RAM, CO2, and trace results are not just metrics; they fulfill Article 15 oversight.
- **Unified Trace**: `vl.monitor()` captures everything from hardware usage to AST code analysis in a single `.json` file.
- **Zero Friction**: The Data Scientist continues to use MLflow/WandB while the SDK harvests the evidence.
- **API Reference**: `enforce()` and `monitor()` signatures
- **Policy Authoring**: how to write OSCAL policies
- **Probes Reference**: what `monitor()` captures automatically
- **Column Binding**: how `gender="Attribute9"` works
