
Add forward spam detection via quote text and external reply photo OCR #115

Merged
Szer merged 2 commits into main from feature/forward-spam-detection on Feb 27, 2026

Szer (Contributor) commented on Feb 27, 2026

Summary

  • Detect spam in forwarded/quoted messages by extracting message.Quote.Text and running OCR on message.ExternalReply.Photo, then prepending that content to the original message text so the ML model sees the full picture
  • Feature-flagged via FORWARD_SPAM_DETECTION_ENABLED env var (default: true)
  • Extracted a reusable ocrPhotos helper to deduplicate the photo-to-text pipeline shared by the existing OCR enrichment and the new forward enrichment
  • Both tryEnrichMessageWithOcr and tryEnrichMessageWithForwardedContent now skip messages from non-monitored chats, avoiding unnecessary OCR/network calls
  • ocrPhotos now handles the case where all candidate photos have a null FileSize by falling back to the largest photo by dimensions, instead of throwing on an empty sequence
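The enrichment described in the summary can be sketched as follows. This is a minimal, hypothetical Python model and not the bot's F# code. The `Message` and `Photo` dataclasses stand in for the Telegram.Bot types, and `ocr` is a caller-supplied function that plays the role of ocrPhotos. Two details are assumptions: the quote text going before the OCR text, and the newline separator.

```python
from dataclasses import dataclass, replace
from typing import AbstractSet, Callable, Optional, Sequence


@dataclass(frozen=True)
class Photo:
    file_id: str
    width: int
    height: int
    file_size: Optional[int] = None


@dataclass(frozen=True)
class Message:
    chat_id: int
    text: Optional[str] = None
    quote_text: Optional[str] = None              # stands in for message.Quote.Text
    external_reply_photos: Sequence[Photo] = ()   # stands in for message.ExternalReply.Photo


def enrich_with_forwarded_content(
    msg: Message,
    ocr: Callable[[Sequence[Photo]], Optional[str]],
    enabled: bool,
    monitored_chats: AbstractSet[int],
) -> Message:
    # Feature flag off or chat not monitored: skip without any OCR/network call.
    if not enabled or msg.chat_id not in monitored_chats:
        return msg
    parts = []
    if msg.quote_text:
        parts.append(msg.quote_text)
    if msg.external_reply_photos:
        ocr_text = ocr(msg.external_reply_photos)
        if ocr_text:
            parts.append(ocr_text)
    if not parts:
        return msg
    # Forwarded content goes first, the user's own text last.
    if msg.text:
        parts.append(msg.text)
    return replace(msg, text="\n".join(parts))
```

Returning the message object unchanged when there is nothing to add means the downstream ML path stays the same for ordinary messages.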

Changes

  • Types.fs -- added ForwardSpamDetectionEnabled to BotConfiguration
  • Program.fs -- wired FORWARD_SPAM_DETECTION_ENABLED env var (default true)
  • Bot.fs -- added tryEnrichMessageWithForwardedContent plus ocrPhotos/selectLargestPhoto helpers, called in onUpdate before the existing OCR enrichment; added monitored-chat guards to both enrichment functions
  • ContainerTestBase.fs -- set env var in ML-enabled (true) and ML-disabled (false) test containers
  • TgMessageUtils.fs -- added Tg.textQuote, Tg.externalReply helpers; added quote/externalReply params to Tg.quickMsg
  • MLBanTests.fs -- 5 new tests: spam/ham in quote text, prepend ordering, spam/ham in external reply photos via OCR
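The FORWARD_SPAM_DETECTION_ENABLED flag (default true) could be read roughly like the sketch below. The PR does not show how Program.fs parses the value. This sketch therefore assumes three behaviours: an unset or empty variable yields the default, "true"/"false" are matched case-insensitively, and any other value is rejected. The helper name `read_bool_flag` is hypothetical.

```python
import os
from typing import Mapping, Optional


def read_bool_flag(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    # Hypothetical helper: falls back to `default` when the variable is unset or blank.
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")
```

This mirrors how the test containers pin the flag: "true" for the ML-enabled container and "false" for the ML-disabled one. Production deployments that never set the variable get the default of true.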

Test plan

  • All 5 new forward spam tests pass
  • Full test suite (59 tests) passes with zero regressions

Spam forwarded as "reply to external message" was bypassing ML because
the bot only analyzed message.Text/Caption and message.Photo. Now we
extract text from message.Quote and run OCR on message.ExternalReply.Photo,
prepending the result before the original message text so the ML pipeline
sees the full content.

Feature-flagged via FORWARD_SPAM_DETECTION_ENABLED (default: true).

Made-with: Cursor
Szer force-pushed the feature/forward-spam-detection branch from fe89d6b to 43c9d81 on February 27, 2026 08:01
Szer requested a review from Copilot on February 27, 2026 08:01

Copilot AI left a comment

Pull request overview

This PR extends the bot’s spam-detection enrichment pipeline to include forwarded/quoted content by prepending message.Quote.Text and OCR text from message.ExternalReply.Photo to the message text, controlled by a new FORWARD_SPAM_DETECTION_ENABLED configuration flag.

Changes:

  • Add ForwardSpamDetectionEnabled to BotConfiguration and wire it to FORWARD_SPAM_DETECTION_ENABLED (default true).
  • Add forwarded-content enrichment (quote text + external reply photo OCR) and refactor shared OCR logic into ocrPhotos.
  • Extend test utilities and add ML tests covering quote and external reply OCR scenarios.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Summary per file

File | Description
--- | ---
src/VahterBanBot/Types.fs | Adds ForwardSpamDetectionEnabled to configuration.
src/VahterBanBot/Program.fs | Reads FORWARD_SPAM_DETECTION_ENABLED env var (default true).
src/VahterBanBot/Bot.fs | Adds forwarded-content enrichment + shared ocrPhotos; wires enrichment into onUpdate.
src/VahterBanBot.Tests/ContainerTestBase.fs | Sets forward-spam env var appropriately for ML-enabled/disabled containers.
src/VahterBanBot.Tests/TgMessageUtils.fs | Adds helpers for TextQuote / ExternalReplyInfo; extends quickMsg.
src/VahterBanBot.Tests/MLBanTests.fs | Adds tests for spam/ham detection via quote text and external reply photo OCR.


…itored chats

- ocrPhotos: handle the case where all candidate photos have a null FileSize
  by falling back to the largest photo by dimensions instead of throwing on an
  empty sequence. Extracted selectLargestPhoto to avoid FS3511 in the task CE.
- tryEnrichMessageWithForwardedContent and tryEnrichMessageWithOcr now
  skip processing for messages from chats not in ChatsToMonitor, avoiding
  unnecessary OCR network calls.

Made-with: Cursor
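The selectLargestPhoto fallback from this commit can be sketched in Python as follows. This is a hypothetical model, not the F# helper. It assumes three things: photos with a known FileSize are preferred, the photo with the largest FileSize wins among them, and pixel area decides only when every FileSize is null. The original crash came from taking a maximum over an empty filtered sequence. Guarding that case avoids it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Photo:
    file_id: str
    width: int
    height: int
    file_size: Optional[int] = None  # Telegram may omit the file size


def select_largest_photo(photos: Sequence[Photo]) -> Optional[Photo]:
    if not photos:
        return None
    sized = [p for p in photos if p.file_size is not None]
    if sized:
        # Normal path: pick the biggest file.
        return max(sized, key=lambda p: p.file_size)
    # Every FileSize is null: fall back to pixel area instead of
    # calling max() on an empty list (which would raise).
    return max(photos, key=lambda p: p.width * p.height)
```

Keeping the selection in a plain function, outside any async code, matches the commit's stated motivation for extracting selectLargestPhoto out of the task CE.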
Szer merged commit 005cc91 into main on Feb 27, 2026
2 checks passed
Szer deleted the feature/forward-spam-detection branch on February 27, 2026 08:20

Labels

None yet

Projects

None yet


2 participants