
feat(crossposting): persist ranker decision log for retro analysis #211

Merged
ohld merged 1 commit into production from feat/ranker-decision-log
Apr 28, 2026

ohld commented Apr 28, 2026

Summary

Adds a crossposting_decision_log table that records, per ranker call, the top-5 candidates with a full per-multiplier score breakdown. This closes the forensic gap left by Phase 2 (#210): once the source-quality CTE inputs roll out of the 30-day mature window, we still know why each meme was picked.

What's logged per call

{
  decided_at, channel, picked_meme_id, score_version, median_signal,
  candidate_pool_size,                     -- number of memes that passed all filters
  candidates: [                            -- top-5 by ORDER BY
    {
      rank, meme_id, source_id,
      nlikes, ndislikes, raw_impr_rank, age_days, nmemes_sent,
      invited_count, caption_present,
      src_signal, src_quality_mult,        -- per-source rolling metric + clamped multiplier
      lr_factor, impr_factor, age_factor,  -- existing multipliers (Python-mirrored)
      caption_factor, sent_factor, invited_boost,
      final_score
    },
    ...
  ]
}
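For reference, a minimal Python sketch of the payload above, assuming the JSONB column is filled from a plain dict. CandidateBreakdown, build_decision_log, and payload_size_bytes are hypothetical names, not the PR's code; the field list follows the schema, and the channel strings in the test are illustrative.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class CandidateBreakdown:
    # One entry per top-5 candidate; field names mirror the schema listing.
    rank: int
    meme_id: int
    source_id: int
    nlikes: int
    ndislikes: int
    raw_impr_rank: int
    age_days: float
    nmemes_sent: int
    invited_count: int
    caption_present: bool
    src_signal: float
    src_quality_mult: float
    lr_factor: float
    impr_factor: float
    age_factor: float
    caption_factor: float
    sent_factor: float
    invited_boost: float
    final_score: float


def build_decision_log(channel, candidates, pool_size, median_signal, score_version=2):
    """Hypothetical helper: assemble one decision-log row as a JSON-ready dict."""
    return {
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "channel": channel,
        # The pick is the rank-1 candidate; None when the pool was empty.
        "picked_meme_id": candidates[0].meme_id if candidates else None,
        "score_version": score_version,
        "median_signal": median_signal,
        "candidate_pool_size": pool_size,
        "candidates": [asdict(c) for c in candidates[:5]],
    }


def payload_size_bytes(log):
    """Approximate JSON size, for sanity-checking the ~2KB/row estimate."""
    return len(json.dumps(log).encode("utf-8"))
```

An empty pool yields picked_meme_id None and an empty candidates list, which is the same shape a pool=0 row would need.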

Why

The Phase 2 ranker has 7 multiplier components. Once the source-quality CTE inputs roll out of the 30-day mature window, we can no longer reconstruct what the ranker saw. Without decision logging, the only retro we can run is "did v2 beat v1?", not "which multiplier dominated?"

This unlocks queries like:

  • Distribution of clamp activations: how often does src_quality_mult hit 0.5/2.0?
  • invited_count impact: did the boost change the pick relative to the rank-2 candidate?
  • Pool size: is it correlated with channel reach? With cron timing?

Implementation

  • src/database.py: new crossposting_decision_log table (id, decided_at, channel, picked_meme_id FK, score_version, median_signal, candidate_pool_size, candidates JSONB) + composite index on (channel, decided_at).
  • alembic/versions/2026-04-28_add_crossposting_decision_log_table.py: clean migration (single head verified).
  • src/crossposting/service.py:
    • get_next_meme_for_tgchannelru/en refactored to return a (picked_meme, decision_log) tuple; both elements are None when no candidates pass the filters.
    • SQL extended to expose raw meme_stats fields, src_signal, median_signal, and COUNT(*) OVER () for pool size. ORDER BY is identical to before, so picking semantics are unchanged.
    • Top-5 returned via LIMIT :limit (default 5).
    • _compute_score_breakdown(row, channel) mirrors the SQL ORDER BY in Python — the table at the top of the function maps per-channel constants (impr_penalty, age_threshold).
    • log_ranker_decision() writes the JSONB row.
  • src/flows/crossposting/meme.py: handlers unpack the tuple and call log_ranker_decision inside try/except (a log miss is acceptable; a Prefect retry republishing the album is not). This is the same safety pattern as the recent #210 fix (feat(crossposting): source-quality ranker + diversity cap (Phase 2)).
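The pool-size trick from the SQL bullet can be sketched with Python's built-in sqlite3, which supports window functions (SQLite 3.25+). This is a stand-in for the production Postgres query: the memes table, its columns, the status filter, and top_candidates are hypothetical. The point it shows is that COUNT(*) OVER () is evaluated over every row that passes WHERE before LIMIT trims the result, so each returned row carries the full pool size.

```python
import sqlite3

QUERY = """
SELECT id, source_id, score, COUNT(*) OVER () AS pool_size
FROM memes
WHERE status = 'ok'          -- stand-in for the real eligibility filters
ORDER BY score DESC, id      -- deterministic tie-break
LIMIT :limit
"""


def top_candidates(conn, limit=5):
    """Return (picked_row, rows, pool_size), or (None, [], 0) for an empty pool."""
    rows = conn.execute(QUERY, {"limit": limit}).fetchall()
    if not rows:
        return None, [], 0
    pool_size = rows[0][3]  # identical on every row: counted before LIMIT applies
    return rows[0], rows, pool_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memes (id INTEGER PRIMARY KEY, source_id INT, score REAL, status TEXT)")
conn.executemany(
    "INSERT INTO memes (id, source_id, score, status) VALUES (?, ?, ?, ?)",
    # ids 4, 8, 12 are filtered out, leaving a pool of 9
    [(i, i % 3, float(i), "ok" if i % 4 else "banned") for i in range(1, 13)],
)
```

Reading the pool size off the same result set is what lets the PR avoid a separate COUNT query.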

Verification

  • 15/15 tests pass (docker compose exec app pytest tests/test_crossposting_meme.py):
    • 9 existing _clean_caption tests
    • 4 ranker tests (existing, updated for tuple return + assert decision is populated)
    • 2 new tests:
      • test_ranker_decision_log_records_top5 — full schema check, top-5 captured, pool size correct
      • test_ranker_decision_log_does_not_propagate_db_errors — invalid args raise (caller wraps in try/except)
  • ruff check + format clean
  • alembic single head verified locally

Cost

~10 INSERTs/day (5 RU + 5 EN crons) at ~2KB of JSON each, i.e. ~20KB/day or ~7MB/year. One additional SQL roundtrip per crosspost (negligible).
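The estimate works out as follows. This is a back-of-the-envelope check using the PR's ~2KB/row figure; it ignores per-row, index, and storage overhead.

```python
INSERTS_PER_DAY = 10        # 5 RU + 5 EN cron runs
BYTES_PER_ROW = 2 * 1024    # ~2KB of JSON per row (the PR's estimate)

daily = INSERTS_PER_DAY * BYTES_PER_ROW
yearly = daily * 365
print(f"{daily / 1024:.0f} KB/day, {yearly / 1024**2:.1f} MB/year")
# prints: 20 KB/day, 7.1 MB/year
```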

Read patterns (post-deploy retro queries)

-- How often does the picked meme's src_quality_mult sit at the clamp ceiling/floor?
SELECT channel,
  COUNT(*) FILTER (WHERE (candidates->0->>'src_quality_mult')::float >= 1.99) AS at_ceiling,
  COUNT(*) FILTER (WHERE (candidates->0->>'src_quality_mult')::float <= 0.51) AS at_floor,
  COUNT(*) AS total
FROM crossposting_decision_log
WHERE decided_at > NOW() - INTERVAL '14 days'
GROUP BY channel;

-- Did the invited_count boost change picks? (picked vs rank-2 average invites)
SELECT channel,
  AVG((candidates->0->>'invited_count')::int) AS picked_invites,
  AVG((candidates->1->>'invited_count')::int) AS rank2_invites
FROM crossposting_decision_log
WHERE decided_at > NOW() - INTERVAL '14 days'
GROUP BY channel;
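When iterating on these retro queries, a Python mirror can be handy for unit-testing against fixture rows. A minimal sketch of the clamp-activation count, assuming candidates is stored as a JSON array ordered by rank (so index 0 is the picked meme) and arrives as a JSON string; clamp_activations is a hypothetical helper, and the thresholds copy the SQL's 1.99/0.51 tolerances.

```python
import json

CEILING_HIT = 1.99  # same tolerance as the SQL's >= 1.99
FLOOR_HIT = 0.51    # same tolerance as the SQL's <= 0.51


def clamp_activations(rows):
    """rows: iterable of (channel, candidates_json).

    Counts, per channel, how often the picked (rank-1) candidate's
    src_quality_mult sits at the clamp ceiling or floor.
    """
    stats = {}
    for channel, candidates_json in rows:
        candidates = json.loads(candidates_json)
        s = stats.setdefault(channel, {"at_ceiling": 0, "at_floor": 0, "total": 0})
        s["total"] += 1  # like COUNT(*): every row counts toward the total
        if not candidates:
            continue     # like the SQL FILTERs: a missing element matches neither
        mult = float(candidates[0]["src_quality_mult"])
        if mult >= CEILING_HIT:
            s["at_ceiling"] += 1
        elif mult <= FLOOR_HIT:
            s["at_floor"] += 1
    return stats
```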

Test plan

  • All 15 unit + integration tests pass
  • alembic upgrade head succeeds; downgrade succeeds (cleanup test fixture)
  • Migration single head check
  • Ranker still picks the same meme as before (ORDER BY unchanged)
  • Post-merge: verify first decision log row lands cleanly on next RU/EN cron
  • Post-merge: Sentry + prod logs clean for 30min after first cron

Predecessor

#210 — Phase 2 source-quality ranker (merged 2026-04-28)

Adds a new table crossposting_decision_log that records, for every ranker
call, the top-5 candidates with full per-multiplier score breakdown
(lr/impr/age/caption/sent/src_quality/invited_boost/final), the median
source-signal at decision time, and the total candidate pool size.

This unlocks forensic algo retros after the source-quality CTE inputs
roll out of the 30-day mature window:
- distribution of clamp activations (was src_quality_mult binding?)
- invited_count contribution vs other multipliers
- pool size correlated with channel reach

Refactors get_next_meme_for_tgchannelru/en to return
(picked_meme, decision_log) tuples; flow handlers log the decision
inside a try/except that mirrors the post-send safety pattern (a
log-miss is acceptable, a Prefect retry republishing the album is not).
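The flow-side safety pattern can be sketched as below. This is an illustration under assumptions, not the PR's handler: post_meme and its injected callables are hypothetical stand-ins for get_next_meme_for_tgchannelru/en, send_new_message_with_meme, and log_ranker_decision, and the log-before-send order follows the review's description.

```python
import logging

logger = logging.getLogger(__name__)


def post_meme(get_next_meme, send_meme, log_decision):
    """Hypothetical flow handler: publishing is first-class, logging is best-effort."""
    picked, decision_log = get_next_meme()
    if picked is None:
        return None  # empty pool: nothing to publish (and, as merged, nothing logged)

    try:
        log_decision(decision_log)
    except Exception:
        # Swallow and report: a telemetry failure must never fail the task,
        # so Prefect never retries (and possibly republishes) because of it.
        logger.exception("ranker decision log write failed")

    send_meme(picked)
    return picked
```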

Cost: ~10 INSERTs/day, ~2KB JSON each, ~7MB/year.

ohld commented Apr 28, 2026

STAFF ENGINEER REVIEW: APPROVED — PR #211 (feat/ranker-decision-log) reviewed against production.

Structural pass (Claude /review): clean.

  • SQL: _RU_QUERY/_EN_QUERY use text() with parameterized :limit, channel names hardcoded — no injection surface.
  • Conditional side effects: log_ranker_decision wrapped in try/except inside post_meme_to_tgchannelru/en. Logging failure does NOT propagate — Prefect won't republish the meme on a logger failure. Correct trust boundary.
  • Decimal/float coercion: float(src_signal) / float(median_signal) handles asyncpg's Decimal values, and the "if median_signal:" guard (falsy for 0) mirrors SQL's NULLIF(..., 0). Correct.
  • Migration: down_revision a1b4c7d0e3f6 (editorial_posts) → revision 78054f923898. Single linear head, downgrade drops table+index cleanly. picked_meme_id FK is ondelete=SET NULL, nullable. Safe.
  • Import cleanup: the fetch_one import is removed and no orphan callers remain (the vkgroupru handler is just pass).
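The Decimal and zero-median point can be illustrated with a hypothetical mirror of the source-quality multiplier. The ratio-then-clamp formula (src_signal / median_signal clamped to [0.5, 2.0]) and the neutral 1.0 fallback are assumptions inferred from the clamp bounds and NULLIF mentioned in this PR, not the ranker's confirmed SQL.

```python
from decimal import Decimal


def src_quality_mult(src_signal, median_signal, lo=0.5, hi=2.0):
    """Hypothetical mirror: source signal over median, clamped to [lo, hi].

    Returns a neutral 1.0 when the source signal is missing or the median is
    None/0, playing the role of SQL's NULLIF(median, 0) (assumed fallback).
    """
    if src_signal is None or not median_signal:  # None and Decimal("0") are both falsy
        return 1.0
    # asyncpg returns NUMERIC as Decimal; coerce before mixing with float math.
    ratio = float(src_signal) / float(median_signal)
    return max(lo, min(hi, ratio))
```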

Adversarial pass (Codex): 2 P2 findings, no P1.

  • [P2] Empty ranker runs not logged — when no candidates pass filters, no decision_log row. Can't distinguish 'ranker ran with pool=0' from 'cron didn't run / logging broke'. Worth fixing in a follow-up.
  • [P2] Decision logged before publish succeeds — if send_new_message_with_meme fails, Prefect retries and crossposting_decision_log accumulates duplicate rows for the same slot. Retro queries that assume 1 row = 1 published meme will skew on transient failures.
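For the second P2, until the write path has an idempotency key, retro queries could collapse retry duplicates after the fact. A minimal sketch, assuming a retry re-picks the same meme (plausible, since a failed publish presumably does not mark it as sent) and lands within a short window; dedupe_retries and the 30-minute window are hypothetical choices, not part of the PR. Rows with picked_meme_id None, as a pool=0 follow-up for the first P2 would produce, are never merged.

```python
from datetime import datetime, timedelta


def dedupe_retries(rows, window=timedelta(minutes=30)):
    """Collapse retry duplicates, keeping the latest attempt.

    rows: dicts with channel, picked_meme_id, decided_at (datetime).
    Two rows with the same (channel, picked_meme_id) whose decided_at values
    are within `window` of each other are treated as one slot.
    """
    kept = []
    last_seen = {}  # (channel, picked_meme_id) -> index into kept
    for row in sorted(rows, key=lambda r: r["decided_at"]):
        key = (row["channel"], row["picked_meme_id"])
        if row["picked_meme_id"] is not None and key in last_seen:
            idx = last_seen[key]
            if row["decided_at"] - kept[idx]["decided_at"] <= window:
                kept[idx] = row  # retry of the same slot: keep the later attempt
                continue
        last_seen[key] = len(kept)
        kept.append(row)
    return kept
```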

Lower-priority observations (Claude):

  • Drift risk: _compute_score_breakdown reproduces the SQL ORDER BY math by hand. No test asserts that the Python score matches the SQL ranking for a known input, so if anyone edits one without the other, telemetry diverges silently. The two are kept in sync only by a code comment.
  • Test name test_ranker_decision_log_does_not_propagate_db_errors reads as the inverse of what it asserts: the test verifies that the function DOES raise (it is the caller's try/except that stops propagation).
  • score_version=2 hardcoded with no comment explaining the bump from 1.
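The missing parity test for the drift risk could look roughly like this. It is a sketch under stated assumptions: SQLite stands in for Postgres, and the three-factor score (lr, age, caption) is a hypothetical reduction of the real seven-multiplier ORDER BY, so SQL_RANK, py_score, and check_parity are illustrative names, not the ranker's code. The idea is to rank the same fixture rows in SQL and in Python and require both the order and the scores to match.

```python
import math
import sqlite3

# Hypothetical three-factor score standing in for the real ORDER BY expression.
SQL_RANK = """
SELECT id,
       (nlikes + 1.0) / (nlikes + ndislikes + 2.0)
       * CASE WHEN age_days > :age_threshold THEN 0.5 ELSE 1.0 END
       * CASE WHEN caption_present THEN 1.1 ELSE 1.0 END AS score
FROM candidates
ORDER BY score DESC, id
"""


def py_score(row, age_threshold):
    """Python mirror of SQL_RANK's score expression, factor by factor."""
    lr_factor = (row["nlikes"] + 1.0) / (row["nlikes"] + row["ndislikes"] + 2.0)
    age_factor = 0.5 if row["age_days"] > age_threshold else 1.0
    caption_factor = 1.1 if row["caption_present"] else 1.0
    return lr_factor * age_factor * caption_factor


def check_parity(rows, age_threshold, score_fn=py_score):
    """True iff score_fn reproduces both the SQL ranking and the SQL scores."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE candidates (id INTEGER PRIMARY KEY, nlikes INT,"
            " ndislikes INT, age_days REAL, caption_present INT)"
        )
        conn.executemany(
            "INSERT INTO candidates VALUES"
            " (:id, :nlikes, :ndislikes, :age_days, :caption_present)",
            rows,
        )
        sql_ranked = conn.execute(SQL_RANK, {"age_threshold": age_threshold}).fetchall()
    finally:
        conn.close()
    py_ranked = sorted(
        ((r["id"], score_fn(r, age_threshold)) for r in rows),
        key=lambda pair: (-pair[1], pair[0]),
    )
    same_order = [i for i, _ in sql_ranked] == [i for i, _ in py_ranked]
    return same_order and all(
        math.isclose(s, p) for (_, s), (_, p) in zip(sql_ranked, py_ranked)
    )
```

Fixtures that straddle each threshold (age cutoff, clamp bounds) are what would catch a one-sided edit.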

None of these block the merge. Decision-log is observability — degraded analytic signal, not broken product. Auto-merge queued; lint passed, test pending.

ohld merged commit fc68abf into production Apr 28, 2026
3 checks passed