Issue: #189 — Canonical Scoring Engine
Status: Active
Last updated: 2026-03-15
The Canonical Scoring Engine is a versioned, auditable, multi-country scoring layer that centralises all health score computation. Every product's Unhealthiness Score (1–100) is now:
- Versioned — tracked against a model version registry
- Auditable — every score change logged with before/after values
- Parameterised — factor weights and ceilings are JSONB config, not hard-coded
- Multi-country ready — country-specific overrides without code duplication
- Experimentable — shadow scoring for A/B testing new models
┌──────────────────────────────────────────────────────────────────┐
│ SCORING ENGINE v2 │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌─────────────────┐ │
│ │ Version │ │ Country │ │ Entry Points │ │
│ │ Registry │ │ Profiles │ │ │ │
│ │ │ │ │ │ compute_score() │ │
│ │ v3.3 (active) │ │ PL (baseline) │ │ score_category()│ │
│ │ v4.0 (shadow) │ │ DE (overrides) │ │ rescore_batch() │ │
│ │ v3.2 (retired) │ │ CZ (overrides) │ │ │ │
│ └───────┬────────┘ └───────┬────────┘ └────────┬────────┘ │
│ └──────────┬────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ v3.3 fast path: compute_unhealthiness_v33() │ │
│ │ vN+ config path: _compute_from_config(product, config) │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ ┌───────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ AUDIT LOG │ │ SHADOW MODE │ │ DISTRIBUTION │ │
│ │ score_audit_ │ │ score_shadow │ │ _snapshots │ │
│ │ log │ │ _results │ │ │ │
│ └──────────────┘ └──────────────┘ └───────────────┘ │
└──────────────────────────────────────────────────────────────────┘
| Table | Purpose | RLS |
|---|---|---|
scoring_model_versions |
Version registry with JSONB config | service_role write, public read |
score_audit_log |
Immutable trail of every score change | service_role insert, authenticated read |
score_shadow_results |
Shadow scoring for A/B experiments | service_role all, authenticated read |
score_distribution_snapshots |
Daily distribution for drift detection | service_role all, authenticated read |
| Column | Type | Purpose |
|---|---|---|
score_model_version |
text DEFAULT 'v3.3' |
Which model produced the score |
scored_at |
timestamptz DEFAULT now() |
When the score was last computed |
| Function | Purpose | Access |
|---|---|---|
compute_score(product_id, version?, country?, mode?) |
Canonical single-product scorer | authenticated + service_role |
rescore_batch(version?, country?, category?, mode?, batch_size?) |
Batch re-scoring | service_role |
validate_country_profile(version, country) |
Weight/ceiling validation | authenticated + service_role |
capture_score_distribution() |
Snapshot current distributions | service_role |
detect_score_drift(threshold_pct?) |
Compare today vs yesterday | service_role |
_compute_from_config(product_id, config) |
Config-driven scoring (future) | service_role |
_explain_from_config(product_id, config) |
Config-driven breakdown (future) | service_role |
| RPC | Purpose |
|---|---|
admin_scoring_versions() |
List all versions with product counts |
admin_activate_scoring_version(version) |
Promote version to active (retires previous) |
admin_rescore_batch(version?, country?, category?, mode?, batch_size?) |
Safe rescore wrapper |
admin_score_drift_report(threshold_pct?) |
Drift analysis results |
| RPC | Purpose |
|---|---|
api_score_history(product_id, limit?) |
Score change audit trail for a product |
The active model's JSONB config mirrors compute_unhealthiness_v33():
| # | Factor | Weight | Ceiling | Unit |
|---|---|---|---|---|
| 1 | saturated_fat | 0.17 | 10.0 | g/100g |
| 2 | sugars | 0.17 | 27.0 | g/100g |
| 3 | salt | 0.17 | 3.0 | g/100g |
| 4 | calories | 0.10 | 600 | kcal/100g |
| 5 | trans_fat | 0.11 | 2.0 | g/100g |
| 6 | additives | 0.07 | 10 | count |
| 7 | prep_method | 0.08 | — | categorical |
| 8 | controversies | 0.08 | — | categorical |
| 9 | ingredient_concern | 0.05 | 100 | score |
| # | Factor | Weight | Ceiling | Unit | Reference |
|---|---|---|---|---|---|
| 10 | nutrient_density | 0.08 | 100 (combined) | protein_g + fibre_g | EFSA 2017 protein DRV, WHO 2015 fibre, see SCORING_METHODOLOGY.md §2.8 |
Penalty weights sum to 1.00. Bonus subtracted after penalty sum. Clamped to [1, 100]. Maximum bonus: 8 points.
Modes:
'apply'— computes score, persists to products table, triggers audit'dry_run'— computes score, returns result without persisting'shadow'— computes score, writes toscore_shadow_resultsonly
Returns: JSONB with product_id, score, previous_score, version, country, mode, breakdown, changed.
v3.3 fast path: For v3.3, delegates directly to compute_unhealthiness_v33() — guaranteeing bit-perfect backward compatibility.
Pipeline entry point. Called by every pipeline's 04_scoring.sql. Now also sets score_model_version and scored_at during batch scoring. The audit trigger captures all changes.
Batch re-scoring. Iterates products matching filters, calls compute_score() per product. Returns summary with total_processed, scores_changed.
Every UPDATE on products.unhealthiness_score fires the score_change_audit trigger, which inserts into score_audit_log:
| Column | Source |
|---|---|
product_id |
NEW.product_id |
field_name |
'unhealthiness_score' |
old_value |
OLD.unhealthiness_score |
new_value |
NEW.unhealthiness_score |
model_version |
NEW.score_model_version |
country |
NEW.country |
trigger_type |
current_setting('app.score_trigger') — set by the caller |
Trigger types:
'score_category'— pipeline batch scoring'compute_score'— single-product ad-hoc scoring'pipeline'— fallback if no context set
draft ───► active ───► retired
│ ▲
└── shadow ────────────┘
- draft: Under development, cannot score production products
- active: Exactly one at any time (enforced by EXCLUDE constraint)
- shadow: Running in parallel for A/B comparison
- retired: Read-only archive
Use admin_activate_scoring_version('v4.0') to promote — it automatically retires the current active version.
Country overrides are stored as JSONB on scoring_model_versions.country_overrides:
{
"DE": {
"factor_overrides": [
{"name": "sugars", "weight": 0.18, "ceiling": 22.0},
{"name": "salt", "weight": 0.18, "ceiling": 2.0}
]
}
}When compute_score(123, NULL, 'DE') is called, the DE overrides merge over the base config.
Validate profiles with:
SELECT validate_country_profile('v3.3', 'DE');
-- Returns: { valid: true, total_weight: 1.0, factor_count: 9, ... }-
Capture snapshots (run daily):
SELECT capture_score_distribution(); -
Detect drift (compare today vs yesterday):
SELECT * FROM detect_score_drift(10.0); -- Returns: country, category, metric, prev_val, curr_val, drift_pct
-
Admin report:
SELECT admin_score_drift_report(10.0);
The score explanation API (api_score_explanation) now includes:
| Field | Type | Example |
|---|---|---|
model_version |
string | null |
"v3.2" |
scored_at |
string | null |
"2026-02-25T14:30:00Z" |
The ScoreBreakdownPanel displays a model version badge and freshness timestamp when available.
The frontend presents the TryVit Score — an inverted view of the internal unhealthiness score:
TryVit Score = 100 − unhealthiness_score
This is a display-only transformation. The scoring engine, database columns, audit trail, drift detection, and all regression anchors continue to operate on the unhealthiness scale (1–100, lower = better). The TryVit Score (1–100, higher = healthier) is computed at render time by toTryVitScore() in the frontend.
Key invariant: A scoring engine change (new weights, new factor, new version) affects unhealthiness_score directly. The TryVit Score changes are an automatic consequence of the inversion — no separate calibration or version tracking is needed for the display layer.
See SCORING_METHODOLOGY.md §2.8 for the full consumer band table and scientific rationale.
Materialized view: mv_scoring_distribution — pre-aggregated scoring band distribution per country and category.
| Column | Type | Description |
|---|---|---|
country |
text | Country code (PL, DE) |
category |
text | Product category |
band |
text | Green / Yellow / Orange / Red / Dark Red |
product_count |
bigint | Products in this band |
pct_of_category |
numeric | Percentage of category in this band |
avg_score |
numeric | Mean unhealthiness score |
min_score |
integer | Lowest score in band |
max_score |
integer | Highest score in band |
stddev_score |
numeric | Score standard deviation |
Refresh: Included in refresh_all_materialized_views(). Checked by mv_staleness_check().
QA suite: QA__scoring_distribution.sql — 12 non-blocking checks monitoring band concentration, empty bands, cross-country divergence, category dominance, and score spread. Thresholds calibrated for ~2,600 products; will tighten at 10K scale.
| Operation | Target | Method |
|---|---|---|
score_category() batch |
<5s per category | Batch UPDATE (unchanged) |
compute_score() single |
<10ms | Direct function call |
rescore_batch() 1000 products |
<30s | Loop via compute_score |
| Audit trigger overhead | <1ms/row | Single INSERT, no cascading |
The v3.3 fast path ensures zero performance regression for pipeline scoring.
compute_score(), admin RPCs:SECURITY DEFINER SET search_path = public- Audit log: INSERT-only for service_role, read-only for authenticated
- Version management: service_role only (admin RPCs)
- Shadow results: service_role write, authenticated read
17 QA SQL tests in db/qa/QA__scoring_engine.sql:
| Test | Validates |
|---|---|
| T01 | v3.3 is active |
| T02 | Single active version constraint |
| T03 | v3.3 config: 10 factors (9 penalty + 1 bonus), weights sum to 1.0 |
| T04 | score_model_version populated for all scored products |
| T05 | scored_at populated for all scored products |
| T06 | Audit log table exists |
| T07–T08 | Core functions exist |
| T09 | Country profile validation works |
| T10–T11 | Shadow results and snapshots tables exist |
| T12 | api_score_explanation includes model_version and scored_at |
| T13 | admin_scoring_versions returns array |
| T14 | api_score_history returns correct shape |
| T15 | detect_score_drift is callable |
| T16 | Audit trigger is installed |
| T17 | Grants are correct |
All regression anchor products in copilot-instructions.md §8.19 use unhealthiness scores (1–100, lower = better). These anchors validate the scoring formula and must remain stable (±2 tolerance) across scoring engine changes.
Consumer display tests (TryVit Score bands, percentile badges) are separate from formula regression tests. Frontend tests in frontend/src/lib/score-utils.test.ts validate the band mapping and inversion logic independently.
Both scoring and search ranking formulas are governed under a single formula registry system — the v_formula_registry view unifies scoring_model_versions and search_ranking_config into one read-only surface.
| Column | Type | Source (scoring) | Source (search) |
|---|---|---|---|
domain |
text | 'scoring' |
'search' |
version |
text | smv.version |
src.version |
formula_name |
text | 'compute_unhealthiness' |
src.config_name |
status |
text | smv.status |
active/inactive |
weights_config |
jsonb | smv.config |
src.weights |
fingerprint |
text | SHA-256 of config | SHA-256 of weights |
change_reason |
text | smv.description |
src.change_reason |
is_active |
boolean | status = 'active' |
src.active |
Query:
SELECT * FROM v_formula_registry WHERE is_active;Every formula's JSONB config is SHA-256 hashed on INSERT/UPDATE via auto-fingerprint triggers. Two drift detection functions exist:
| Function | Checks | Use Case |
|---|---|---|
check_formula_drift() |
JSONB fingerprint vs recomputed hash | Config-level drift (someone edited weights without trigger) |
check_function_source_drift() |
pg_proc source hash vs registered hash | Code-level drift (someone edited function body without updating registry) |
Run both as CI/QA checks:
-- Should return 0 rows with status = 'drift_detected'
SELECT * FROM check_formula_drift() WHERE status != 'match';
SELECT * FROM check_function_source_drift() WHERE status != 'match';The formula_source_hashes table stores expected pg_proc.prosrc SHA-256 hashes for:
compute_unhealthiness_v33— core scoring functionexplain_score_v33— score breakdowncompute_score— canonical entry point_compute_from_config— config-driven engine_explain_from_config— config-driven breakdownsearch_rank— search ranking function
After any intentional function modification, update the hash:
UPDATE formula_source_hashes
SET expected_hash = encode(sha256(p.prosrc::bytea), 'hex'),
updated_at = now()
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
AND p.proname = formula_source_hashes.function_name
AND formula_source_hashes.function_name = 'compute_unhealthiness_v32';Changing scoring or search weights is a controlled operation with mandatory documentation. This process applies to any modification of factor weights, ceilings, categorical maps, or ranking signals.
- Changing any weight in
compute_unhealthiness_v33()orscoring_model_versions.config - Changing any weight in
search_ranking_config.weights - Adding or removing a scoring factor
- Modifying a ceiling value or categorical map entry
- Adding a new search ranking signal
| Step | Action | Tool |
|---|---|---|
| 1. Propose | Create GitHub issue with new weights, scientific rationale, and expected impact | GitHub Issue |
| 2. Simulate | Run compute_score(product_id, 'vX.Y', NULL, 'dry_run') on 100 anchor products |
SQL |
| 3. Compare | Generate distribution comparison between current and proposed weights | detect_score_drift() |
| 4. Review | Weight change must be recorded (even if self-approved — the audit trail matters) | GitHub PR review |
| 5. Register | Insert new version in scoring_model_versions / search_ranking_config |
Migration SQL |
| 6. Activate | Set new version as active: admin_activate_scoring_version('vX.Y') |
Admin RPC |
| 7. Monitor | Run check_formula_drift() + watch score distribution for 48h |
QA + Dashboard |
## Weight Change Checklist
- [ ] Scientific rationale documented (EFSA/WHO citation)
- [ ] New version registered in formula_registry
- [ ] 100-product dry-run comparison completed
- [ ] Mean score shift within +/-5 points (or justified if larger)
- [ ] Anchor product scores verified (+/-2 tolerance)
- [ ] check_formula_drift() returns 'match' for all formulas
- [ ] check_function_source_drift() returns 'match' for all functions
- [ ] formula_source_hashes updated if function body changed
- [ ] SCORING_METHODOLOGY.md updated with new weights
- [ ] CHANGELOG.md updated| Domain | Formula | Version | Weights Sum | Status |
|---|---|---|---|---|
| Scoring | compute_unhealthiness |
v3.3 | 1.00 (9 penalty) + 0.08 (bonus) | Active |
| Search | default |
v1.0.0 | 1.00 (5 signals) | Active |