
Full Lifecycle: Zero to Annex IV

A single-page walkthrough of the entire Venturalitica compliance lifecycle using the Loan Credit Scoring scenario. Each step is copy-paste ready.

Time: ~15 minutes. Prerequisites: Python 3.9+, pip install venturalitica.


Overview

Step 1        Step 2           Step 3          Step 4          Step 5              Step 6
Install  -->  Write Policy --> Audit Data -->  Audit Model --> Collect Evidence -> Generate Report
pip install   OSCAL YAML       enforce()       enforce()       monitor()           Dashboard Annex IV

EU AI Act mapping:

| Step | Article | Purpose |
| --- | --- | --- |
| Write Policy | Art 9 (Risk Management) | Define controls for identified risks |
| Audit Data | Art 10 (Data Governance) | Verify training data quality and fairness |
| Audit Model | Art 15 (Accuracy & Robustness) | Verify model behavior post-training |
| Generate Report | Art 11 / Annex IV (Technical Documentation) | Produce regulatory documentation |

Step 1: Install and Verify

pip install venturalitica

Quick smoke test:

import venturalitica as vl
results = vl.quickstart("loan")
# Expected: 2 PASS, 1 FAIL (age disparity)

Step 2: Write Policy Files

Create two OSCAL policy files in your project directory.

data_policy.oscal.yaml -- Pre-Training Controls

assessment-plan:
  metadata:
    title: Credit Risk Assessment Policy (German Credit)
    version: "1.1"
  control-implementations:
    - description: Credit Scoring Fairness Controls
      implemented-requirements:

        - control-id: credit-data-imbalance
          description: "Minority class >= 20% of dataset"
          props:
            - name: metric_key
              value: class_imbalance
            - name: threshold
              value: "0.2"
            - name: operator
              value: gt
            - name: "input:target"
              value: target

        - control-id: credit-data-bias
          description: "Gender disparate impact follows Four-Fifths Rule"
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.8"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
            - name: "input:dimension"
              value: gender

        - control-id: credit-age-disparate
          description: "Age disparate impact ratio > 0.5"
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.50"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
            - name: "input:dimension"
              value: age

model_policy.oscal.yaml -- Post-Training Controls

assessment-plan:
  metadata:
    title: "Article 15: Model Accuracy and Fairness"
    version: "1.0"
  control-implementations:
    - description: Model Assurance Controls
      implemented-requirements:

        - control-id: model-accuracy
          description: "Model accuracy >= 70%"
          props:
            - name: metric_key
              value: accuracy_score
            - name: threshold
              value: "0.70"
            - name: operator
              value: gte
            - name: "input:target"
              value: target
            - name: "input:prediction"
              value: prediction

        - control-id: model-gender-fairness
          description: "Demographic parity difference < 0.10"
          props:
            - name: metric_key
              value: demographic_parity_diff
            - name: threshold
              value: "0.10"
            - name: operator
              value: lt
            - name: "input:target"
              value: target
            - name: "input:prediction"
              value: prediction
            - name: "input:dimension"
              value: gender

See Policy Authoring Guide for the full format reference.
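For intuition, each control above reduces to a single comparison: a measured metric against a threshold under an operator. The following is an illustrative sketch of that evaluation logic (not the library's actual implementation; the operator names mirror the gt/gte/lt values used in the policies above):

```python
# Illustrative sketch only -- shows how a control's metric_key/threshold/
# operator props collapse into one boolean check.
OPERATORS = {
    "gt":  lambda actual, threshold: actual > threshold,
    "gte": lambda actual, threshold: actual >= threshold,
    "lt":  lambda actual, threshold: actual < threshold,
}

def check_control(actual_value, threshold, operator):
    """Return True when the measured metric satisfies the control.

    Thresholds arrive as strings (OSCAL prop values), so cast first.
    """
    return OPERATORS[operator](actual_value, float(threshold))

# Mirrors two of the expected Step 3 results:
print(check_control(0.429, "0.2", "gt"))   # imbalance control passes
print(check_control(0.286, "0.50", "gt"))  # age control fails
```

Note that thresholds are quoted in the YAML precisely because OSCAL props are strings; the cast to float happens at evaluation time.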


Step 3: Audit the Training Data (Article 10)

import venturalitica as vl
from venturalitica.quickstart import load_sample

# Load the German Credit dataset
df = load_sample("loan")

# Audit data quality and fairness BEFORE training
data_results = vl.enforce(
    data=df,
    target="class",
    gender="Attribute9",       # "Personal status and sex"
    age="Attribute13",         # "Age in years"
    policy="data_policy.oscal.yaml"
)

for r in data_results:
    status = "PASS" if r.passed else "FAIL"
    print(f"  {r.control_id:<25} {r.actual_value:.3f}  {r.operator} {r.threshold}  {status}")

Expected output:

  credit-data-imbalance     0.429  gt 0.2   PASS
  credit-data-bias          0.818  gt 0.8   PASS
  credit-age-disparate      0.286  gt 0.5   FAIL

The age disparity control fails. In a real project you would address this before training, for example by rebalancing the dataset or collecting more representative data. For this walkthrough we continue anyway, in order to demonstrate the full flow.
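As a sanity check on figures like the age result above, disparate impact can be computed by hand. The sketch below uses one common convention (lowest favorable-outcome rate divided by the highest across groups); the library's exact formula may differ, so treat this as illustrative:

```python
import pandas as pd

def disparate_impact(df, target, dimension, favorable):
    """Ratio of the lowest to the highest favorable-outcome rate
    across groups (min/max convention; illustrative only)."""
    rates = (
        df.groupby(dimension)[target]
          .apply(lambda s: (s == favorable).mean())
    )
    return rates.min() / rates.max()

# Toy data: the "young" group receives the favorable outcome less often.
toy = pd.DataFrame({
    "age_group": ["young"] * 4 + ["old"] * 4,
    "class":     ["bad", "bad", "bad", "good",
                  "good", "good", "good", "bad"],
})
print(round(disparate_impact(toy, "class", "age_group", "good"), 3))  # 0.333
```

A ratio of 0.333 would fail the age control above (threshold 0.50, operator gt), just as the real dataset's 0.286 does.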


Step 4: Train and Audit the Model (Article 15)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Prepare features (numeric only for simplicity)
X = df.select_dtypes(include=["number"]).drop(columns=["class"])
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on test set
predictions = model.predict(X_test)
test_df = X_test.copy()
test_df["class"] = y_test
test_df["prediction"] = predictions

# Audit model behavior
model_results = vl.enforce(
    data=test_df,
    target="class",
    prediction="prediction",
    gender="Attribute9",
    policy="model_policy.oscal.yaml"
)

for r in model_results:
    status = "PASS" if r.passed else "FAIL"
    print(f"  {r.control_id:<25} {r.actual_value:.3f}  {r.operator} {r.threshold}  {status}")
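The demographic-parity control can likewise be cross-checked by hand. This sketch uses one common definition (the gap between the highest and lowest positive-prediction rates across groups); the library's formula may differ:

```python
import pandas as pd

def demographic_parity_diff(df, prediction, dimension, favorable):
    """Gap between the highest and lowest favorable-prediction rates
    across groups (illustrative definition only)."""
    rates = (
        df.groupby(dimension)[prediction]
          .apply(lambda s: (s == favorable).mean())
    )
    return rates.max() - rates.min()

# Toy predictions: "f" is predicted "good" 75% of the time, "m" 50%.
preds = pd.DataFrame({
    "gender":     ["f", "f", "f", "f", "m", "m", "m", "m"],
    "prediction": ["good", "good", "good", "bad",
                   "good", "good", "bad", "bad"],
})
print(demographic_parity_diff(preds, "prediction", "gender", "good"))  # 0.25
```

A gap of 0.25 would fail the model-gender-fairness control above (threshold 0.10, operator lt).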

Step 5: Wrap with Evidence Collection

In production, wrap the entire pipeline in vl.monitor() to capture evidence automatically:

import venturalitica as vl
from venturalitica.quickstart import load_sample
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = load_sample("loan")

with vl.monitor("loan_full_audit"):
    # --- Article 10: Data Audit ---
    data_results = vl.enforce(
        data=df,
        target="class",
        gender="Attribute9",
        age="Attribute13",
        policy="data_policy.oscal.yaml"
    )

    # --- Train ---
    X = df.select_dtypes(include=["number"]).drop(columns=["class"])
    y = df["class"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # --- Article 15: Model Audit ---
    test_df = X_test.copy()
    test_df["class"] = y_test
    test_df["prediction"] = model.predict(X_test)

    model_results = vl.enforce(
        data=test_df,
        target="class",
        prediction="prediction",
        gender="Attribute9",
        policy="model_policy.oscal.yaml"
    )

# Evidence is now saved in .venturalitica/
# - trace_loan_full_audit.json  (execution trace)
# - results.json                (compliance results)

The monitor() context manager automatically captures:

  • Hardware probe: CPU, RAM, GPU info
  • Carbon probe: Energy consumption estimate
  • BOM probe: Software bill of materials (installed packages)
  • Artifact probe: SHA-256 hashes of data and policy files
  • Trace probe: AST analysis of executed code

See Probes Reference for details on each probe.
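The artifact probe's digests are plain SHA-256 hashes, so (assuming it hashes the raw file bytes) they can be reproduced with the standard library, which is handy for verifying that a policy file has not drifted between runs:

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Stream a file through SHA-256 in chunks and return the hex digest
    (the kind of digest recorded for data and policy files)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example: compare against the hash stored in the evidence trace.
# print(sha256_of_file("data_policy.oscal.yaml"))
```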


Step 6: Visualize and Generate Annex IV

Launch the Dashboard

pip install venturalitica[dashboard]   # Required for the UI
venturalitica ui

The Dashboard presents results across 4 phases:

| Phase | What You See |
| --- | --- |
| 1. System Identity | Project name, version, AI system classification |
| 2. Risk Policy | Your OSCAL controls with pass/fail status |
| 3. Verify & Evaluate | Metric values, charts, evidence hashes |
| 4. Technical Report | Annex IV document generator |

Generate Annex IV

  1. Navigate to Phase 4 in the Dashboard
  2. Select your LLM provider (Mistral API, Ollama, or ALIA)
  3. Click Generate Annex IV
  4. The system reads your trace files and drafts a regulatory document

Output: Annex_IV.md -- a structured document citing your actual metric values as proof of compliance.

Convert to PDF:

pip install mdpdf
mdpdf Annex_IV.md

Files Produced

After running the full lifecycle, your project contains:

my-project/
  data_policy.oscal.yaml          # Step 2: Data governance controls
  model_policy.oscal.yaml         # Step 2: Model assurance controls
  .venturalitica/
    results.json                  # Step 3-4: Compliance results
    trace_loan_full_audit.json    # Step 5: Full execution trace
    latest -> runs/20260218_...   # Symlink to latest run
  Annex_IV.md                    # Step 6: Generated documentation
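These artifacts can feed downstream automation, for example a CI gate that fails the build when any control fails. A minimal sketch, assuming results.json is a JSON list of objects with a "passed" key (adjust to the schema your version of the library actually writes):

```python
import json
from pathlib import Path

def summarize_results(path=".venturalitica/results.json"):
    """Count passing controls in a saved results file.

    Assumes a JSON list of objects each carrying a "passed" key;
    this schema is an assumption, not a documented contract.
    """
    results = json.loads(Path(path).read_text())
    passed = sum(1 for r in results if r.get("passed"))
    return passed, len(results)

# Example (after a run):
# passed, total = summarize_results()
# print(f"{passed}/{total} controls passed")
```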

What's Next