Add story clustering to group duplicate stories across feeds #2057

Open
samuelclay wants to merge 2 commits into main from story-clusters

samuelclay commented Feb 12, 2026

Summary

  • New apps/clustering module that groups stories with matching or similar titles from different feeds, using exact matching on normalized titles plus fuzzy Jaccard similarity on significant words
  • Clusters stored in Redis (sCL: / zCL: keys, 14-day TTL) and displayed as an always-expanded inline list of sources below the representative story, in both river and single-feed views
  • Celery task ComputeStoryClusters, triggered after feed updates for feeds with premium subscribers and rate-limited to once per 6 hours per feed
  • Briefing integration: a shared normalize_title(), pre-computed cluster lookups in _find_duplicate_stories(), and cluster annotations in AI summary prompts
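As a rough illustration of the two-pass matching in the first bullet, here is a minimal Python sketch. Only the name normalize_title comes from the PR; the stopword list, the three-character minimum for a "significant" word, the 0.6 Jaccard threshold, the (story_hash, feed_id, title) input shape, and the union-find grouping are all assumptions, not the apps/clustering implementation. Jaccard similarity here is |A ∩ B| / |A ∪ B| over the two titles' significant-word sets.

```python
from __future__ import annotations

import re
from itertools import combinations

# Hypothetical stopword list: the PR does not define what counts as a "significant word".
STOPWORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "with",
    "is", "are", "at", "by", "from", "as", "its", "it", "be",
})


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def significant_words(title: str) -> set[str]:
    """Normalized words minus stopwords and tokens of 2 characters or fewer (assumed rule)."""
    return {w for w in normalize_title(title).split()
            if w not in STOPWORDS and len(w) > 2}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_stories(stories, threshold: float = 0.6):
    """Group (story_hash, feed_id, title) tuples; keep only clusters spanning 2+ feeds.

    threshold=0.6 is an assumed value. Grouping uses union-find so that
    A~B and B~C puts A, B and C in one cluster.
    """
    parent = list(range(len(stories)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    normalized = [normalize_title(t) for _, _, t in stories]
    words = [significant_words(t) for _, _, t in stories]

    # Pass 1: exact matches on the normalized title.
    first_seen: dict[str, int] = {}
    for i, norm in enumerate(normalized):
        if norm in first_seen:
            union(i, first_seen[norm])
        else:
            first_seen[norm] = i

    # Pass 2: fuzzy matches between stories from different feeds (O(n^2) pairs).
    for i, j in combinations(range(len(stories)), 2):
        if stories[i][1] != stories[j][1] and jaccard(words[i], words[j]) >= threshold:
            union(i, j)

    groups: dict[int, list[int]] = {}
    for i in range(len(stories)):
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        if len({stories[i][1] for i in members}) >= 2:  # cross-feed duplicates only
            clusters.append([stories[i][0] for i in members])
    return clusters


if __name__ == "__main__":
    stories = [
        ("h1", 1, "Apple unveils new iPhone at September event"),
        ("h2", 2, "Apple Unveils New iPhone at September Event!"),
        ("h3", 3, "Apple unveils the new iPhone during September event"),
        ("h4", 4, "Local bakery wins award"),
    ]
    print(cluster_stories(stories))  # prints [['h1', 'h2', 'h3']]
```

In the example, the first two headlines merge on the exact pass and the third joins through a Jaccard score of 6/7 (about 0.86); the bakery story stays alone and is dropped because a cluster must span at least two feeds. The fuzzy pass compares every pair, so over a large story window a real implementation would probably need to narrow the candidate pairs first.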
[Screenshot 2026-02-11 at 9:50:56 PM]

Test plan

  • Open All Site Stories in river view and verify that cluster sources appear inline below stories that have duplicates across feeds
  • Click into a single feed that has clustered stories and verify that clusters appear there too
  • Verify cluster quality: all grouped stories should be about the same event from different feeds
  • Check dark theme styling of cluster source rows
  • Verify non-premium users do not see clusters
  • Confirm Celery task runs after feed updates (docker logs newsblur_celery | grep Clustering)

Generated with Claude Code

samuelclay and others added 2 commits February 11, 2026 21:51
New apps/clustering module that groups stories with matching or similar
titles from different feeds. Uses exact normalized title matching plus
fuzzy Jaccard similarity on significant words. Clusters are stored in
Redis and displayed inline below the representative story in both river
and single-feed views. Triggered via Celery task after feed updates.
Premium-only feature, always on.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
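The summary says ComputeStoryClusters runs only for feeds with premium subscribers and at most once per 6 hours per feed. One common way to get that behaviour is a per-feed Redis key written with SET ... NX EX (in redis-py, r.set(key, 1, nx=True, ex=6 * 3600), which returns a falsy result when the key already exists). The sketch below models that gate in memory so it runs without a Redis server; the class and function names, the injectable clock, and the premium-subscriber count argument are hypothetical and not taken from the PR.

```python
import time
from typing import Callable, Dict, Hashable

# Window from the PR summary: at most one clustering run per feed per 6 hours.
SIX_HOURS = 6 * 60 * 60


class FeedClusterThrottle:
    """Hypothetical in-memory stand-in for a Redis 'SET <key> 1 NX EX <window>' lock."""

    def __init__(self, window: float = SIX_HOURS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.clock = clock
        self._expires: Dict[Hashable, float] = {}  # feed_id -> time the lock lapses

    def try_acquire(self, feed_id: Hashable) -> bool:
        """Return True and start a new window if none is active for this feed."""
        now = self.clock()
        expires = self._expires.get(feed_id)
        if expires is not None and now < expires:
            return False  # still inside the window: skip this run
        self._expires[feed_id] = now + self.window
        return True


def should_queue_clustering(feed_id: Hashable, premium_subscribers: int,
                            throttle: FeedClusterThrottle) -> bool:
    """Hypothetical gate: premium feeds only, checked before consuming a window."""
    if premium_subscribers <= 0:
        return False
    return throttle.try_acquire(feed_id)
```

Checking the premium condition first means a non-premium feed never starts a window, and the injectable clock makes the 6-hour boundary easy to test. Unlike this dict, a Redis key expires on its own and is shared, which is what lets several Celery workers respect the same limit.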