Scoring Engine Architecture

Issue: #189 — Canonical Scoring Engine
Status: Active
Last updated: 2026-03-15

1. Overview

The Canonical Scoring Engine is a versioned, auditable, multi-country scoring layer that centralises all health score computation. Every product's Unhealthiness Score (1–100) is now:

Versioned — tracked against a model version registry
Auditable — every score change logged with before/after values
Parameterised — factor weights and ceilings are JSONB config, not hard-coded
Multi-country ready — country-specific overrides without code duplication
Experimentable — shadow scoring for A/B testing new models

2. Architecture Diagram

┌──────────────────────────────────────────────────────────────────┐
│                     SCORING ENGINE v2                             │
│                                                                   │
│  ┌────────────────┐  ┌────────────────┐  ┌─────────────────┐    │
│  │ Version         │  │ Country        │  │ Entry Points    │    │
│  │ Registry        │  │ Profiles       │  │                 │    │
│  │                 │  │                │  │ compute_score() │    │
│  │ v3.3 (active)   │  │ PL (baseline)  │  │ score_category()│    │
│  │ v4.0 (shadow)   │  │ DE (overrides) │  │ rescore_batch() │    │
│  │ v3.2 (retired)  │  │ CZ (overrides) │  │                 │    │
│  └───────┬────────┘  └───────┬────────┘  └────────┬────────┘    │
│          └──────────┬────────┘                     │             │
│                     ▼                              │             │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  v3.3 fast path: compute_unhealthiness_v33()             │   │
│  │  vN+ config path: _compute_from_config(product, config)  │   │
│  └──────────────────────────┬───────────────────────────────┘   │
│                     ┌───────┼───────────────┐                    │
│                     ▼       ▼               ▼                    │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐         │
│  │ AUDIT LOG    │  │ SHADOW MODE  │  │ DISTRIBUTION   │         │
│  │ score_audit_ │  │ score_shadow │  │ _snapshots     │         │
│  │ log          │  │ _results     │  │                │         │
│  └──────────────┘  └──────────────┘  └───────────────┘         │
└──────────────────────────────────────────────────────────────────┘

3. Database Objects

Tables

Table	Purpose	RLS
`scoring_model_versions`	Version registry with JSONB config	service_role write, public read
`score_audit_log`	Immutable trail of every score change	service_role insert, authenticated read
`score_shadow_results`	Shadow scoring for A/B experiments	service_role all, authenticated read
`score_distribution_snapshots`	Daily distribution for drift detection	service_role all, authenticated read

Columns Added to `products`

Column	Type	Purpose
`score_model_version`	`text DEFAULT 'v3.3'`	Which model produced the score
`scored_at`	`timestamptz DEFAULT now()`	When the score was last computed

Functions

Function	Purpose	Access
`compute_score(product_id, version?, country?, mode?)`	Canonical single-product scorer	authenticated + service_role
`rescore_batch(version?, country?, category?, mode?, batch_size?)`	Batch re-scoring	service_role
`validate_country_profile(version, country)`	Weight/ceiling validation	authenticated + service_role
`capture_score_distribution()`	Snapshot current distributions	service_role
`detect_score_drift(threshold_pct?)`	Compare today vs yesterday	service_role
`_compute_from_config(product_id, config)`	Config-driven scoring (future)	service_role
`_explain_from_config(product_id, config)`	Config-driven breakdown (future)	service_role

Admin RPCs

RPC	Purpose
`admin_scoring_versions()`	List all versions with product counts
`admin_activate_scoring_version(version)`	Promote version to active (retires previous)
`admin_rescore_batch(version?, country?, category?, mode?, batch_size?)`	Safe rescore wrapper
`admin_score_drift_report(threshold_pct?)`	Drift analysis results

Public RPCs

RPC	Purpose
`api_score_history(product_id, limit?)`	Score change audit trail for a product

4. Scoring Model v3.3 Config

The active model's JSONB config mirrors compute_unhealthiness_v33():

Penalty Factors (9 factors — weights sum to 1.00)

#	Factor	Weight	Ceiling	Unit
1	saturated_fat	0.17	10.0	g/100g
2	sugars	0.17	27.0	g/100g
3	salt	0.17	3.0	g/100g
4	calories	0.10	600	kcal/100g
5	trans_fat	0.11	2.0	g/100g
6	additives	0.07	10	count
7	prep_method	0.08	—	categorical
8	controversies	0.08	—	categorical
9	ingredient_concern	0.05	100	score

Bonus Factor (v3.3 — subtracted from penalty sum)

#	Factor	Weight	Ceiling	Unit	Reference
10	nutrient_density	0.08	100 (combined)	protein_g + fibre_g	EFSA 2017 protein DRV, WHO 2015 fibre, see `SCORING_METHODOLOGY.md §2.8`

Penalty weights sum to 1.00. Bonus subtracted after penalty sum. Clamped to [1, 100]. Maximum bonus: 8 points.

5. Entry Points

compute_score(product_id, version, country, mode)

Modes:

'apply' — computes score, persists to products table, triggers audit
'dry_run' — computes score, returns result without persisting
'shadow' — computes score, writes to score_shadow_results only

Returns: JSONB with product_id, score, previous_score, version, country, mode, breakdown, changed.

v3.3 fast path: For v3.3, delegates directly to compute_unhealthiness_v33() — guaranteeing bit-perfect backward compatibility.

score_category(category, data_completeness, country)

Pipeline entry point. Called by every pipeline's 04_scoring.sql. Now also sets score_model_version and scored_at during batch scoring. The audit trigger captures all changes.

rescore_batch(version, country, category, mode, batch_size)

Batch re-scoring. Iterates products matching filters, calls compute_score() per product. Returns summary with total_processed, scores_changed.

6. Audit Trail

Every UPDATE on products.unhealthiness_score fires the score_change_audit trigger, which inserts into score_audit_log:

Column	Source
`product_id`	`NEW.product_id`
`field_name`	`'unhealthiness_score'`
`old_value`	`OLD.unhealthiness_score`
`new_value`	`NEW.unhealthiness_score`
`model_version`	`NEW.score_model_version`
`country`	`NEW.country`
`trigger_type`	`current_setting('app.score_trigger')` — set by the caller

Trigger types:

'score_category' — pipeline batch scoring
'compute_score' — single-product ad-hoc scoring
'pipeline' — fallback if no context set

7. Version Lifecycle

draft ───► active ───► retired
  │                      ▲
  └── shadow ────────────┘

draft: Under development, cannot score production products
active: Exactly one at any time (enforced by EXCLUDE constraint)
shadow: Running in parallel for A/B comparison
retired: Read-only archive

Use admin_activate_scoring_version('v4.0') to promote — it automatically retires the current active version.

8. Multi-Country Support

Country overrides are stored as JSONB on scoring_model_versions.country_overrides:

{
  "DE": {
    "factor_overrides": [
      {"name": "sugars", "weight": 0.18, "ceiling": 22.0},
      {"name": "salt", "weight": 0.18, "ceiling": 2.0}
    ]
  }
}

When compute_score(123, NULL, 'DE') is called, the DE overrides merge over the base config.

Validate profiles with:

SELECT validate_country_profile('v3.3', 'DE');
-- Returns: { valid: true, total_weight: 1.0, factor_count: 9, ... }

9. Drift Detection

Capture snapshots (run daily):
```
SELECT capture_score_distribution();
```

Detect drift (compare today vs yesterday):

SELECT * FROM detect_score_drift(10.0);
-- Returns: country, category, metric, prev_val, curr_val, drift_pct

Admin report:
```
SELECT admin_score_drift_report(10.0);
```

10. Frontend Integration

The score explanation API (api_score_explanation) now includes:

Field	Type	Example
`model_version`	`string \| null`	`"v3.2"`
`scored_at`	`string \| null`	`"2026-02-25T14:30:00Z"`

The ScoreBreakdownPanel displays a model version badge and freshness timestamp when available.

10.1 TryVit Score (Consumer Display Layer)

The frontend presents the TryVit Score — an inverted view of the internal unhealthiness score:

TryVit Score = 100 − unhealthiness_score

This is a display-only transformation. The scoring engine, database columns, audit trail, drift detection, and all regression anchors continue to operate on the unhealthiness scale (1–100, lower = better). The TryVit Score (1–100, higher = healthier) is computed at render time by toTryVitScore() in the frontend.

Key invariant: A scoring engine change (new weights, new factor, new version) affects unhealthiness_score directly. The TryVit Score changes are an automatic consequence of the inversion — no separate calibration or version tracking is needed for the display layer.

See SCORING_METHODOLOGY.md §2.8 for the full consumer band table and scientific rationale.

11. Distribution Monitoring

Materialized view: mv_scoring_distribution — pre-aggregated scoring band distribution per country and category.

Column	Type	Description
`country`	text	Country code (PL, DE)
`category`	text	Product category
`band`	text	Green / Yellow / Orange / Red / Dark Red
`product_count`	bigint	Products in this band
`pct_of_category`	numeric	Percentage of category in this band
`avg_score`	numeric	Mean unhealthiness score
`min_score`	integer	Lowest score in band
`max_score`	integer	Highest score in band
`stddev_score`	numeric	Score standard deviation

Refresh: Included in refresh_all_materialized_views(). Checked by mv_staleness_check().

QA suite: QA__scoring_distribution.sql — 12 non-blocking checks monitoring band concentration, empty bands, cross-country divergence, category dominance, and score spread. Thresholds calibrated for ~2,600 products; will tighten at 10K scale.

12. Performance

Operation	Target	Method
`score_category()` batch	<5s per category	Batch UPDATE (unchanged)
`compute_score()` single	<10ms	Direct function call
`rescore_batch()` 1000 products	<30s	Loop via compute_score
Audit trigger overhead	<1ms/row	Single INSERT, no cascading

The v3.3 fast path ensures zero performance regression for pipeline scoring.

13. Security

compute_score(), admin RPCs: SECURITY DEFINER SET search_path = public
Audit log: INSERT-only for service_role, read-only for authenticated
Version management: service_role only (admin RPCs)
Shadow results: service_role write, authenticated read

14. QA Tests

17 QA SQL tests in db/qa/QA__scoring_engine.sql:

Test	Validates
T01	v3.3 is active
T02	Single active version constraint
T03	v3.3 config: 10 factors (9 penalty + 1 bonus), weights sum to 1.0
T04	`score_model_version` populated for all scored products
T05	`scored_at` populated for all scored products
T06	Audit log table exists
T07–T08	Core functions exist
T09	Country profile validation works
T10–T11	Shadow results and snapshots tables exist
T12	`api_score_explanation` includes model_version and scored_at
T13	`admin_scoring_versions` returns array
T14	`api_score_history` returns correct shape
T15	`detect_score_drift` is callable
T16	Audit trigger is installed
T17	Grants are correct

Regression Anchors

All regression anchor products in copilot-instructions.md §8.19 use unhealthiness scores (1–100, lower = better). These anchors validate the scoring formula and must remain stable (±2 tolerance) across scoring engine changes.

Consumer display tests (TryVit Score bands, percentile badges) are separate from formula regression tests. Frontend tests in frontend/src/lib/score-utils.test.ts validate the band mapping and inversion logic independently.

15. Unified Formula Registry (#198)

Both scoring and search ranking formulas are governed under a single formula registry system — the v_formula_registry view unifies scoring_model_versions and search_ranking_config into one read-only surface.

Registry View — `v_formula_registry`

Column	Type	Source (scoring)	Source (search)
`domain`	text	`'scoring'`	`'search'`
`version`	text	`smv.version`	`src.version`
`formula_name`	text	`'compute_unhealthiness'`	`src.config_name`
`status`	text	`smv.status`	`active`/`inactive`
`weights_config`	jsonb	`smv.config`	`src.weights`
`fingerprint`	text	SHA-256 of config	SHA-256 of weights
`change_reason`	text	`smv.description`	`src.change_reason`
`is_active`	boolean	`status = 'active'`	`src.active`

Query:

SELECT * FROM v_formula_registry WHERE is_active;

Fingerprint-Based Drift Detection

Every formula's JSONB config is SHA-256 hashed on INSERT/UPDATE via auto-fingerprint triggers. Two drift detection functions exist:

Function	Checks	Use Case
`check_formula_drift()`	JSONB fingerprint vs recomputed hash	Config-level drift (someone edited weights without trigger)
`check_function_source_drift()`	pg_proc source hash vs registered hash	Code-level drift (someone edited function body without updating registry)

Run both as CI/QA checks:

-- Should return 0 rows with status = 'drift_detected'
SELECT * FROM check_formula_drift() WHERE status != 'match';
SELECT * FROM check_function_source_drift() WHERE status != 'match';

Registered Function Source Hashes

The formula_source_hashes table stores expected pg_proc.prosrc SHA-256 hashes for:

compute_unhealthiness_v33 — core scoring function
explain_score_v33 — score breakdown
compute_score — canonical entry point
_compute_from_config — config-driven engine
_explain_from_config — config-driven breakdown
search_rank — search ranking function

After any intentional function modification, update the hash:

UPDATE formula_source_hashes
SET    expected_hash = encode(sha256(p.prosrc::bytea), 'hex'),
       updated_at    = now()
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
WHERE  n.nspname = 'public'
  AND  p.proname = formula_source_hashes.function_name
  AND  formula_source_hashes.function_name = 'compute_unhealthiness_v32';

16. Weight Change Governance

Changing scoring or search weights is a controlled operation with mandatory documentation. This process applies to any modification of factor weights, ceilings, categorical maps, or ranking signals.

When to Use This Process

Changing any weight in compute_unhealthiness_v33() or scoring_model_versions.config
Changing any weight in search_ranking_config.weights
Adding or removing a scoring factor
Modifying a ceiling value or categorical map entry
Adding a new search ranking signal

7-Step Weight Change Protocol

Step	Action	Tool
1. Propose	Create GitHub issue with new weights, scientific rationale, and expected impact	GitHub Issue
2. Simulate	Run `compute_score(product_id, 'vX.Y', NULL, 'dry_run')` on 100 anchor products	SQL
3. Compare	Generate distribution comparison between current and proposed weights	`detect_score_drift()`
4. Review	Weight change must be recorded (even if self-approved — the audit trail matters)	GitHub PR review
5. Register	Insert new version in `scoring_model_versions` / `search_ranking_config`	Migration SQL
6. Activate	Set new version as active: `admin_activate_scoring_version('vX.Y')`	Admin RPC
7. Monitor	Run `check_formula_drift()` + watch score distribution for 48h	QA + Dashboard

Weight Change Checklist (for PR description)

## Weight Change Checklist
- [ ] Scientific rationale documented (EFSA/WHO citation)
- [ ] New version registered in formula_registry
- [ ] 100-product dry-run comparison completed
- [ ] Mean score shift within +/-5 points (or justified if larger)
- [ ] Anchor product scores verified (+/-2 tolerance)
- [ ] check_formula_drift() returns 'match' for all formulas
- [ ] check_function_source_drift() returns 'match' for all functions
- [ ] formula_source_hashes updated if function body changed
- [ ] SCORING_METHODOLOGY.md updated with new weights
- [ ] CHANGELOG.md updated

Current Active Formulas

Domain	Formula	Version	Weights Sum	Status
Scoring	`compute_unhealthiness`	v3.3	1.00 (9 penalty) + 0.08 (bonus)	Active
Search	`default`	v1.0.0	1.00 (5 signals)	Active

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scoring Engine Architecture

1. Overview

2. Architecture Diagram

3. Database Objects

Tables

Columns Added to `products`

Functions

Admin RPCs

Public RPCs

4. Scoring Model v3.3 Config

Penalty Factors (9 factors — weights sum to 1.00)

Bonus Factor (v3.3 — subtracted from penalty sum)

5. Entry Points

compute_score(product_id, version, country, mode)

score_category(category, data_completeness, country)

rescore_batch(version, country, category, mode, batch_size)

6. Audit Trail

7. Version Lifecycle

8. Multi-Country Support

9. Drift Detection

10. Frontend Integration

10.1 TryVit Score (Consumer Display Layer)

11. Distribution Monitoring

12. Performance

13. Security

14. QA Tests

Regression Anchors

15. Unified Formula Registry (#198)

Registry View — `v_formula_registry`

Fingerprint-Based Drift Detection

Registered Function Source Hashes

16. Weight Change Governance

When to Use This Process

7-Step Weight Change Protocol

Weight Change Checklist (for PR description)

Current Active Formulas

FilesExpand file tree

SCORING_ENGINE.md

Latest commit

History

SCORING_ENGINE.md

File metadata and controls

Scoring Engine Architecture

1. Overview

2. Architecture Diagram

3. Database Objects

Tables

Columns Added to products

Functions

Admin RPCs

Public RPCs

4. Scoring Model v3.3 Config

Penalty Factors (9 factors — weights sum to 1.00)

Bonus Factor (v3.3 — subtracted from penalty sum)

5. Entry Points

compute_score(product_id, version, country, mode)

score_category(category, data_completeness, country)

rescore_batch(version, country, category, mode, batch_size)

6. Audit Trail

7. Version Lifecycle

8. Multi-Country Support

9. Drift Detection

10. Frontend Integration

10.1 TryVit Score (Consumer Display Layer)

11. Distribution Monitoring

12. Performance

13. Security

14. QA Tests

Regression Anchors

15. Unified Formula Registry (#198)

Registry View — v_formula_registry

Fingerprint-Based Drift Detection

Registered Function Source Hashes

16. Weight Change Governance

When to Use This Process

7-Step Weight Change Protocol

Weight Change Checklist (for PR description)

Current Active Formulas

Columns Added to `products`

Registry View — `v_formula_registry`