| 1 | +# Fine-Tuning Governance Specification |
| 2 | + |
| 3 | +This document captures the governance schema and readiness checks implemented in `frai-core` for the upcoming `frai-finetune` toolkit. The goal is to make fine-tuning workflows auditable, bias-aware, and production-safe by default. |
| 4 | + |
| 5 | +## Schema Overview |
| 6 | + |
| 7 | +The governance plan is represented as a JSON object with the following top-level sections: |
| 8 | + |
| 9 | +| Section | Purpose | Key Fields | |
| 10 | +| --- | --- | --- | |
| 11 | +| `dataset` | Capture provenance, consent, and quality of the fine-tuning corpus. | `name`, `useCases`, `sources[]`, `sensitivity`, `retention`, `qualityChecks` | |
| 12 | +| `training` | Describe the base model, objectives, and safety hooks. | `objective`, `baseModel`, `targetPersona`, `callbacks[]`, `safety.guardrails[]` | |
| 13 | +| `evaluation` | Define regression suites and launch gates. | `datasets[]`, `metrics[]`, `gateCriteria.minimumSuccessRate`, `gateCriteria.riskTolerance` | |
| 14 | +| `approvals` | Track stakeholders and sign-off status. | `stakeholders[].role`, `stakeholders[].name`, `stakeholders[].status` | |
| 15 | +| `monitoring` | Prepare live telemetry and rollback plan. | `owner`, `metrics[]`, `telemetry`, `rollbackPlan.trigger`, `rollbackPlan.steps[]` | |
| 16 | +| `audit` | Archive artefacts for compliance. | `artefacts[]`, `notes` | |
| 17 | + |
| 18 | +### Enumerations |
| 19 | +- `dataset.sensitivity.level`: `none`, `internal`, `confidential`, `restricted` |
| 20 | +- `evaluation.gateCriteria.riskTolerance`: `low`, `medium`, `high`, `critical` |
| 21 | +- `approvals.stakeholders[].status`: `pending`, `approved`, `rejected` |
| 22 | + |
| 23 | +## Validation Rules |
| 24 | + |
| 25 | +`frai-core/src/finetune/index.js` exposes `validateGovernancePlan(plan)`, which returns `{ valid, errors }`. Key checks include: |
| 26 | +- Required sections must be present and use the prescribed shape. |
| 27 | +- Dataset sources require `name`, `type`, `access`, and `owner`. |
| 28 | +- Evaluation metrics require `name`, `direction`, and numeric `threshold`. |
| 29 | +- Monitoring rollback steps must be non-empty strings. |
| 30 | +- Stakeholder approvals must use an allowed status value. |
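The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the `frai-core` implementation: the function name `sketchValidatePlan`, the error message wording, and the choice to collect every error rather than stop at the first are all assumptions. Only the `{ valid, errors }` return shape and the rules themselves come from this document.

```javascript
// Illustrative sketch only; not the actual validateGovernancePlan from frai-core.
// Section names and status values are taken from the schema tables in this spec.
const REQUIRED_SECTIONS = ["dataset", "training", "evaluation", "approvals", "monitoring", "audit"];
const ALLOWED_STATUSES = ["pending", "approved", "rejected"];

const isObject = (value) => value !== null && typeof value === "object" && !Array.isArray(value);
const isNonEmptyString = (value) => typeof value === "string" && value.trim().length > 0;
const asArray = (value) => (Array.isArray(value) ? value : []);

// Hypothetical helper: returns { valid, errors } like the documented function.
function sketchValidatePlan(plan) {
  if (!isObject(plan)) {
    return { valid: false, errors: ["plan: must be an object"] };
  }
  const errors = [];

  // Required sections must be present (shape checks are reduced to "is an object" here).
  for (const section of REQUIRED_SECTIONS) {
    if (!isObject(plan[section])) errors.push(`${section}: section is required`);
  }

  // Dataset sources require name, type, access, and owner.
  asArray(plan.dataset?.sources).forEach((source, i) => {
    for (const field of ["name", "type", "access", "owner"]) {
      if (!isNonEmptyString(source?.[field])) {
        errors.push(`dataset.sources[${i}].${field}: required`);
      }
    }
  });

  // Evaluation metrics require name, direction, and a numeric threshold.
  asArray(plan.evaluation?.metrics).forEach((metric, i) => {
    if (!isNonEmptyString(metric?.name)) errors.push(`evaluation.metrics[${i}].name: required`);
    if (!isNonEmptyString(metric?.direction)) errors.push(`evaluation.metrics[${i}].direction: required`);
    if (!Number.isFinite(metric?.threshold)) errors.push(`evaluation.metrics[${i}].threshold: must be a number`);
  });

  // Rollback steps must be non-empty strings.
  asArray(plan.monitoring?.rollbackPlan?.steps).forEach((step, i) => {
    if (!isNonEmptyString(step)) errors.push(`monitoring.rollbackPlan.steps[${i}]: must be a non-empty string`);
  });

  // Stakeholder approvals must use an allowed status value.
  asArray(plan.approvals?.stakeholders).forEach((stakeholder, i) => {
    if (!ALLOWED_STATUSES.includes(stakeholder?.status)) {
      errors.push(`approvals.stakeholders[${i}].status: must be one of ${ALLOWED_STATUSES.join(", ")}`);
    }
  });

  return { valid: errors.length === 0, errors };
}
```

Collecting all errors in one pass, rather than failing fast, lets a plan author fix every problem before re-running validation; whether the real function behaves this way is not stated here.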
| 31 | + |
| 32 | +## Readiness Scoring |
| 33 | + |
| 34 | +`calculateReadiness(plan)` evaluates a plan across five checkpoints: |
| 35 | + |
| 36 | +1. `dataset` – dataset use cases, sources, and quality checks populated. |
| 37 | +2. `evaluation` – metrics with thresholds and gate criteria. |
| 38 | +3. `approvals` – every stakeholder marked as `approved`. |
| 39 | +4. `monitoring` – named owner plus metrics and rollback steps. |
| 40 | +5. `audit` – artefact references stored. |
| 41 | + |
| 42 | +The function returns `{ status, score, checkpoints }` where: |
| 43 | +- `status` is `ready`, `pending`, or `blocked` (`blocked` if approvals or evaluation are incomplete). |
| 44 | +- `score` is the proportion of satisfied checkpoints. |
| 45 | +- `checkpoints` details completion state per checkpoint. |
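The scoring described above can be approximated with the sketch below. It is not the `frai-core` implementation: `sketchReadiness` is a hypothetical name, the exact predicate behind each checkpoint is a guess based on the one-line descriptions in this section, and representing `checkpoints` as a map of booleans is an assumption. An empty stakeholder list is also assumed to leave `approvals` unsatisfied.

```javascript
// Illustrative sketch only; the real calculateReadiness may use different predicates.
const hasItems = (value) => Array.isArray(value) && value.length > 0;
const hasText = (value) => typeof value === "string" && value.trim().length > 0;

// Hypothetical helper mirroring the documented { status, score, checkpoints } shape.
function sketchReadiness(plan) {
  const { dataset, evaluation, approvals, monitoring, audit } = plan ?? {};

  const checkpoints = {
    // Use cases, sources, and quality checks populated.
    dataset:
      hasItems(dataset?.useCases) &&
      hasItems(dataset?.sources) &&
      Object.keys(dataset?.qualityChecks ?? {}).length > 0,
    // Metrics with numeric thresholds plus gate criteria (assumed: a numeric minimumSuccessRate).
    evaluation:
      hasItems(evaluation?.metrics) &&
      evaluation.metrics.every((metric) => Number.isFinite(metric?.threshold)) &&
      Number.isFinite(evaluation?.gateCriteria?.minimumSuccessRate),
    // Every stakeholder marked as approved (assumed: at least one stakeholder is listed).
    approvals:
      hasItems(approvals?.stakeholders) &&
      approvals.stakeholders.every((stakeholder) => stakeholder?.status === "approved"),
    // Named owner plus metrics and rollback steps.
    monitoring:
      hasText(monitoring?.owner) &&
      hasItems(monitoring?.metrics) &&
      hasItems(monitoring?.rollbackPlan?.steps),
    // Artefact references stored.
    audit: hasItems(audit?.artefacts),
  };

  const results = Object.values(checkpoints);
  const score = results.filter(Boolean).length / results.length;

  // blocked takes precedence when approvals or evaluation are incomplete.
  let status = "pending";
  if (!checkpoints.approvals || !checkpoints.evaluation) {
    status = "blocked";
  } else if (results.every(Boolean)) {
    status = "ready";
  }

  return { status, score, checkpoints };
}
```

Under these assumptions, a plan that satisfies four of the five checkpoints scores 4/5 = 0.8; it is `pending` if the missing checkpoint is `dataset`, `monitoring`, or `audit`, and `blocked` if it is `approvals` or `evaluation`.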
| 46 | + |
| 47 | +`summarizeGovernance(plan)` provides a human-readable summary suitable for CLI output or reports. |
| 48 | + |
| 49 | +## Example Configuration |
| 50 | + |
| 51 | +```json |
| 52 | +{ |
| 53 | + "dataset": { |
| 54 | + "name": "Customer Support Conversations", |
| 55 | + "description": "Anonymised support dialogues from Q1 2024.", |
| 56 | + "useCases": ["Assist agents in answering live chats"], |
| 57 | + "sources": [ |
| 58 | + { |
| 59 | + "name": "Zendesk export", |
| 60 | + "type": "csv", |
| 61 | + "access": "s3://frai-data/chat-trimmed.csv", |
| 62 | + "owner": "Support Ops" |
| 63 | + } |
| 64 | + ], |
| 65 | + "sensitivity": { |
| 66 | + "level": "confidential", |
| 67 | + "piiPresent": false, |
| 68 | + "mitigation": "PII scrubbed using internal tool" |
| 69 | + }, |
| 70 | + "retention": { |
| 71 | + "policy": "Delete after 12 months", |
| 72 | + "reviewAt": "2025-06-01" |
| 73 | + }, |
| 74 | + "qualityChecks": { |
| 75 | + "biasAssessment": "No skew observed across demographics", |
| 76 | + "dataBalanceSummary": "Balanced between positive/negative sentiment", |
| 77 | + "manualReview": "Sampled 200 records for offensive content" |
| 78 | + } |
| 79 | + }, |
| 80 | + "training": { |
| 81 | + "objective": "Improve suggestion latency by 20%", |
| 82 | + "baseModel": "gpt-3.5-turbo", |
| 83 | + "targetPersona": "Customer support agent", |
| 84 | + "callbacks": ["biasAuditHook", "guardrailValidationHook"], |
| 85 | + "safety": { |
| 86 | + "guardrails": ["toxicity-filter", "pii-detector"], |
| 87 | + "escalationContacts": ["responsible-ai@frai.dev"] |
| 88 | + } |
| 89 | + }, |
| 90 | + "evaluation": { |
| 91 | + "datasets": [ |
| 92 | + { |
| 93 | + "name": "Zendesk hold-outs", |
| 94 | + "description": "10% holdout set for regression checks" |
| 95 | + } |
| 96 | + ], |
| 97 | + "metrics": [ |
| 98 | + { |
| 99 | + "name": "Exact match", |
| 100 | + "direction": "increase", |
| 101 | + "threshold": 0.66 |
| 102 | + }, |
| 103 | + { |
| 104 | + "name": "Toxicity rate", |
| 105 | + "direction": "decrease", |
| 106 | + "threshold": 0.02 |
| 107 | + } |
| 108 | + ], |
| 109 | + "gateCriteria": { |
| 110 | + "minimumSuccessRate": 0.65, |
| 111 | + "riskTolerance": "medium" |
| 112 | + } |
| 113 | + }, |
| 114 | + "approvals": { |
| 115 | + "stakeholders": [ |
| 116 | + { |
| 117 | + "role": "Responsible AI", |
| 118 | + "name": "Avery Rai", |
| 119 | + "status": "approved" |
| 120 | + }, |
| 121 | + { |
| 122 | + "role": "Security", |
| 123 | + "name": "Chris Sec", |
| 124 | + "status": "approved" |
| 125 | + } |
| 126 | + ] |
| 127 | + }, |
| 128 | + "monitoring": { |
| 129 | + "owner": "oncall-support", |
| 130 | + "metrics": [ |
| 131 | + { |
| 132 | + "name": "Latency p95", |
| 133 | + "alertCondition": "above 1.5s for 10m", |
| 134 | + "owner": "ml-oncall" |
| 135 | + } |
| 136 | + ], |
| 137 | + "telemetry": { |
| 138 | + "storage": "datadog::frai-finetune", |
| 139 | + "retentionDays": 90 |
| 140 | + }, |
| 141 | + "rollbackPlan": { |
| 142 | + "trigger": "Any sev1 incident or approval revocation", |
| 143 | + "steps": [ |
| 144 | + "Disable fine-tuned model", |
| 145 | + "Revert to base model", |
| 146 | + "Notify stakeholders" |
| 147 | + ] |
| 148 | + } |
| 149 | + }, |
| 150 | + "audit": { |
| 151 | + "artefacts": ["s3://frai-artifacts/fine-tune/run-42"], |
| 152 | + "notes": "Review scheduled with governance board on 2024-05-12." |
| 153 | + } |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +## Next Steps |
| 158 | +- Expose CLI helpers to initialise a governance template and validate JSON files. |
| 159 | +- Integrate readiness scoring into `frai eval` and future dashboard surfaces. |
| 160 | +- Link artefact storage to `frai-core` document generation for unified audit bundles. |