feat(crossposting): persist ranker decision log for retro analysis by ohld · Pull Request #211 · ffmemes/ff-backend

ohld · 2026-04-28T20:13:19Z

Summary

Adds crossposting_decision_log table that records, per ranker call, the top-5 candidates with full per-multiplier score breakdown. Closes the forensic gap left by Phase 2 (#210): once the source-quality CTE inputs roll out of the 30-day mature window, we still know why each meme was picked.

What's logged per call

{
  decided_at, channel, picked_meme_id, score_version, median_signal,
  candidate_pool_size,                     -- # memes that passed all filters
  candidates: [                            -- top-5 by ORDER BY
    {
      rank, meme_id, source_id,
      nlikes, ndislikes, raw_impr_rank, age_days, nmemes_sent,
      invited_count, caption_present,
      src_signal, src_quality_mult,        -- per-source rolling metric + clamped multiplier
      lr_factor, impr_factor, age_factor,  -- existing multipliers (Python-mirrored)
      caption_factor, sent_factor, invited_boost,
      final_score
    },
    ...
  ]
}
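The payload shape above can be checked mechanically. The following is a minimal sketch of such a check, not the schema test shipped in this PR; `schema_problems` and the two key sets are hypothetical names, with the keys copied from the listing above.

```python
from typing import Any, Dict, List

# Key names copied from the payload listing; everything else is illustrative.
TOP_LEVEL_KEYS = {
    "decided_at", "channel", "picked_meme_id", "score_version",
    "median_signal", "candidate_pool_size", "candidates",
}
CANDIDATE_KEYS = {
    "rank", "meme_id", "source_id", "nlikes", "ndislikes", "raw_impr_rank",
    "age_days", "nmemes_sent", "invited_count", "caption_present",
    "src_signal", "src_quality_mult", "lr_factor", "impr_factor",
    "age_factor", "caption_factor", "sent_factor", "invited_boost",
    "final_score",
}


def schema_problems(entry: Dict[str, Any], max_candidates: int = 5) -> List[str]:
    """Return human-readable problems; an empty list means the entry looks well-formed."""
    problems = [f"missing top-level key: {k}" for k in sorted(TOP_LEVEL_KEYS - set(entry))]
    candidates = entry.get("candidates") or []
    if len(candidates) > max_candidates:
        problems.append(f"{len(candidates)} candidates logged, expected at most {max_candidates}")
    pool = entry.get("candidate_pool_size")
    if pool is not None and pool < len(candidates):
        problems.append("candidate_pool_size is smaller than the number of logged candidates")
    for i, cand in enumerate(candidates):
        for k in sorted(CANDIDATE_KEYS - set(cand)):
            problems.append(f"candidate {i}: missing {k}")
    return problems
```

The check deliberately ignores value types and whether rank starts at 0 or 1, since the listing does not specify either.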

Why

Phase 2 ranker has 7 multiplier components. After 30+ days, the source-quality CTE inputs roll out and we can't retroactively reconstruct what the ranker saw. Without decision logging, the only retro we can do is "did v2 beat v1?" — not "which multiplier dominated?"

This unlocks queries like:

Distribution of clamp activations: how often does src_quality_mult hit 0.5/2.0?
invited_count value: did the boost change picks vs rank-2?
Pool size: correlated with channel reach? With cron timing?

Implementation

src/database.py: new crossposting_decision_log table (id, decided_at, channel, picked_meme_id FK, score_version, median_signal, candidate_pool_size, candidates JSONB) + composite index on (channel, decided_at).
alembic/versions/2026-04-28_add_crossposting_decision_log_table.py: clean migration (single head verified).
src/crossposting/service.py:
- get_next_meme_for_tgchannelru/en refactored to return (picked_meme, decision_log) tuple. Both None if no candidates pass filters.
- SQL extended to expose raw meme_stats fields + src_signal + median_signal + COUNT(*) OVER () for pool size — ORDER BY identical to before so picking semantics are unchanged.
- Top-5 returned via LIMIT :limit (default 5).
- _compute_score_breakdown(row, channel) mirrors the SQL ORDER BY in Python — the table at the top of the function maps per-channel constants (impr_penalty, age_threshold).
- log_ranker_decision() writes the JSONB row.
src/flows/crossposting/meme.py: handlers unpack the tuple, call log_ranker_decision inside try/except (a log miss is acceptable; a Prefect retry republishing the album is not — same safety pattern as the recent feat(crossposting): source-quality ranker + diversity cap (Phase 2) #210 fix).
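The handler pattern described in the last item can be sketched roughly as follows. This is an illustration under assumptions, not the code in src/flows/crossposting/meme.py: `post_next_meme` and its three injected callables are hypothetical stand-ins for the ranker, the publish step, and log_ranker_decision, and the log-before-publish order follows the review comment further down.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical signature: the ranker returns (picked_meme, decision_log).
Ranker = Callable[[], Awaitable[Tuple[Optional[dict], Optional[dict]]]]


async def post_next_meme(
    get_next_meme: Ranker,
    publish: Callable[[dict], Awaitable[None]],
    log_decision: Callable[[dict], Awaitable[None]],
) -> Optional[dict]:
    picked, decision = await get_next_meme()
    if picked is None:
        return None  # no candidate passed the filters, nothing to post or log

    if decision is not None:
        try:
            await log_decision(decision)
        except Exception:
            # A lost log row is acceptable; an exception here would trigger
            # a flow retry, so it is swallowed and only reported.
            logger.exception("decision log write failed; continuing")

    await publish(picked)  # publish errors still propagate as before
    return picked
```

Only the logging call is guarded, so a failure in the publish step keeps its existing retry behavior.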

Verification

15/15 tests pass (docker compose exec app pytest tests/test_crossposting_meme.py):
- 9 existing _clean_caption tests
- 4 ranker tests (existing, updated for tuple return + assert decision is populated)
- 2 new tests:
  - test_ranker_decision_log_records_top5 — full schema check, top-5 captured, pool size correct
  - test_ranker_decision_log_does_not_propagate_db_errors — invalid args raise (caller wraps in try/except)
ruff check + format clean
alembic single head verified locally

Cost

~10 INSERTs/day (5 RU + 5 EN crons), ~2KB JSON each = 20KB/day = 7MB/year. One additional SQL roundtrip per crosspost (negligible).
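The estimate above works out as follows, taking 1 MB = 1000 KB. The inputs are the approximate figures from this section, and index and JSONB storage overhead are not included.

```python
INSERTS_PER_DAY = 10   # 5 RU + 5 EN cron runs
ROW_SIZE_KB = 2        # approximate JSON payload per row

daily_kb = INSERTS_PER_DAY * ROW_SIZE_KB   # 20 KB/day
yearly_mb = daily_kb * 365 / 1000          # 7.3 MB/year, rounded to ~7MB above

print(f"{daily_kb} KB/day, {yearly_mb:.1f} MB/year")
```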

Read patterns (post-deploy retro queries)

-- How often does src_quality_mult hit the clamp ceiling/floor?
SELECT channel,
  COUNT(*) FILTER (WHERE (candidates->0->>'src_quality_mult')::float >= 1.99) AS at_ceiling,
  COUNT(*) FILTER (WHERE (candidates->0->>'src_quality_mult')::float <= 0.51) AS at_floor,
  COUNT(*) AS total
FROM crossposting_decision_log
WHERE decided_at > NOW() - INTERVAL '14 days'
GROUP BY channel;

-- Did invited_count boost change picks?
SELECT channel,
  AVG((candidates->0->>'invited_count')::int) AS picked_invites,
  AVG((candidates->1->>'invited_count')::int) AS rank2_invites
FROM crossposting_decision_log
WHERE decided_at > NOW() - INTERVAL '14 days'
GROUP BY channel;
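The second query compares averages only. A per-row counterfactual answers the question more directly, and can be sketched in Python once the rows are loaded. This sketch assumes that final_score is a plain product of the logged factors, so that dividing by invited_boost removes the boost; that assumption is not confirmed by the PR text, and `pick_changed_by_boost` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def pick_changed_by_boost(candidates: List[Dict[str, Any]]) -> bool:
    """True if a different candidate would rank first without invited_boost.

    Assumes candidates[0] is the picked meme and that
    final_score = (other factors) * invited_boost.
    """
    if len(candidates) < 2:
        return False

    def unboosted(cand: Dict[str, Any]) -> float:
        boost = float(cand["invited_boost"])
        score = float(cand["final_score"])
        return score / boost if boost else score

    # max() returns the first maximal element, so ties keep the original pick.
    best = max(candidates, key=unboosted)
    return best["meme_id"] != candidates[0]["meme_id"]
```

Because only the top-5 are logged, a candidate outside that window that would have won without the boost is invisible to this check.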

Test plan

All 15 unit + integration tests pass
alembic upgrade head succeeds; downgrade succeeds (cleanup test fixture)
Migration single head check
Ranker still picks the same meme as before (ORDER BY unchanged)
Post-merge: verify first decision log row lands cleanly on next RU/EN cron
Post-merge: Sentry + prod logs clean for 30min after first cron

Predecessor

#210 — Phase 2 source-quality ranker (merged 2026-04-28)

Adds a new table crossposting_decision_log that records, for every ranker call, the top-5 candidates with full per-multiplier score breakdown (lr/impr/age/caption/sent/src_quality/invited_boost/final), the median source-signal at decision time, and the total candidate pool size.

This unlocks forensic algo retros after the source-quality CTE inputs roll out of the 30-day mature window:
- distribution of clamp activations (was source_quality_mult binding?)
- invited_count contribution vs other multipliers
- pool size correlated with channel reach

Refactors get_next_meme_for_tgchannelru/en to return (picked_meme, decision_log) tuples; flow handlers log the decision inside a try/except that mirrors the post-send safety pattern (a log-miss is acceptable, a Prefect retry republishing the album is not).

Cost: ~10 INSERTs/day, ~2KB JSON each, ~7MB/year.

ohld · 2026-04-28T20:22:19Z

STAFF ENGINEER REVIEW: APPROVED — PR #211 (feat/ranker-decision-log) reviewed against production.

Structural pass (Claude /review): clean.

SQL: _RU_QUERY/_EN_QUERY use text() with parameterized :limit, channel names hardcoded — no injection surface.
Conditional side effects: log_ranker_decision wrapped in try/except inside post_meme_to_tgchannelru/en. Logging failure does NOT propagate — Prefect won't republish the meme on a logger failure. Correct trust boundary.
Decimal/float coercion: float(src_signal) / float(median_signal) handles asyncpg Decimal; the if median_signal: zero-falsy check mirrors SQL's NULLIF(..., 0). Correct.
Migration: down_revision a1b4c7d0e3f6 (editorial_posts) → revision 78054f923898. Single linear head, downgrade drops table+index cleanly. picked_meme_id FK is ondelete=SET NULL, nullable. Safe.
Import cleanup: fetch_one import removed, no orphan callers remain (the vkgroupru handler is a pass stub).
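The Decimal/float coercion point above can be illustrated with a small sketch of the clamped multiplier. The 0.5 and 2.0 bounds come from the PR description; the function signature and the neutral 1.0 fallback for a missing or zero median are assumptions, since the review only says that the falsy check mirrors NULLIF(..., 0).

```python
from decimal import Decimal
from typing import Optional, Union

Number = Union[int, float, Decimal]


def src_quality_mult(
    src_signal: Optional[Number],
    median_signal: Optional[Number],
    lo: float = 0.5,
    hi: float = 2.0,
    neutral: float = 1.0,  # assumed fallback, not confirmed by the PR
) -> float:
    # `not median_signal` is true for None, 0, 0.0 and Decimal("0"),
    # which is the Python-side counterpart of NULLIF(median, 0).
    if src_signal is None or not median_signal:
        return neutral
    ratio = float(src_signal) / float(median_signal)  # Decimal -> float
    return max(lo, min(hi, ratio))
```

For example, `src_quality_mult(Decimal("3"), Decimal("1"))` is clamped to 2.0, and a zero median yields the neutral value instead of raising ZeroDivisionError.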

Adversarial pass (Codex): 2 P2 findings, no P1.

[P2] Empty ranker runs not logged — when no candidates pass filters, no decision_log row. Can't distinguish 'ranker ran with pool=0' from 'cron didn't run / logging broke'. Worth fixing in a follow-up.
[P2] Decision logged before publish succeeds — if send_new_message_with_meme fails, Prefect retries and crossposting_decision_log accumulates duplicate rows for the same slot. Retro queries that assume 1 row = 1 published meme will skew on transient failures.
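Until the second finding is fixed at the source, retro analysis could collapse probable retry duplicates on the read side. The sketch below assumes that a retried run picks the same meme again within a short window; the one-hour window and the `dedupe_retries` helper are illustrative choices, not part of this PR.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def dedupe_retries(
    rows: List[Dict[str, Any]],
    window: timedelta = timedelta(hours=1),
) -> List[Dict[str, Any]]:
    """Keep only the latest row of each run of same-channel, same-pick rows.

    Rows that share channel and picked_meme_id and follow each other within
    `window` are treated as retries of a single publishing slot.
    """
    kept: List[Dict[str, Any]] = []
    for row in sorted(rows, key=lambda r: (r["channel"], r["decided_at"])):
        prev = kept[-1] if kept else None
        if (
            prev is not None
            and prev["channel"] == row["channel"]
            and prev["picked_meme_id"] == row["picked_meme_id"]
            and row["decided_at"] - prev["decided_at"] <= window
        ):
            kept[-1] = row  # a later attempt supersedes the earlier one
        else:
            kept.append(row)
    return kept
```

This is a heuristic: it cannot tell a retry from a legitimate repeat pick, and rows whose picked_meme_id was nulled by the FK would also be merged with each other.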

Lower-priority observations (Claude):

Drift risk: _compute_score_breakdown reproduces SQL ORDER BY math by hand. No test asserts the Python score matches SQL ranking for a known input. If anyone edits one without the other, telemetry diverges silently. Comment-mandated sync only.
Test name test_ranker_decision_log_does_not_propagate_db_errors reads inverted from what it asserts — the test verifies the function DOES raise.
score_version=2 hardcoded with no comment explaining the bump from 1.
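The drift risk in the first observation could be caught with a simple invariant: if the Python breakdown really mirrors the SQL ORDER BY, final_score should never increase as rank increases. The sketch below assumes the ranker orders by score descending with no secondary sort key that could legitimately break that monotonicity; `find_rank_drift` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def find_rank_drift(
    candidates: List[Dict[str, Any]],
    tol: float = 1e-9,
) -> List[Tuple[int, int]]:
    """Return (rank_a, rank_b) pairs where the Python score rises between
    consecutive SQL ranks, which suggests the two formulas have diverged."""
    ordered = sorted(candidates, key=lambda c: c["rank"])
    return [
        (a["rank"], b["rank"])
        for a, b in zip(ordered, ordered[1:])
        if float(b["final_score"]) > float(a["final_score"]) + tol
    ]
```

Run against logged rows or inside a test with a fixed fixture, an empty result is consistent with the two implementations agreeing on order; it does not prove the individual factors match.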

None of these block the merge. Decision-log is observability — degraded analytic signal, not broken product. Auto-merge queued; lint passed, test pending.

ohld merged commit fc68abf into production Apr 28, 2026
3 checks passed
