CLAUDE.md - indiekit-endpoint-microsub

Package Overview

@rmdes/indiekit-endpoint-microsub is a comprehensive Microsub social reader plugin for Indiekit. It implements the Microsub protocol for subscribing to feeds, organizing them into channels, and reading posts in a unified timeline interface. The plugin provides both a Microsub API endpoint (for compatible clients) and a built-in web-based reader UI.

Package Name: @rmdes/indiekit-endpoint-microsub
Version: 1.0.30
Type: ESM module
Entry Point: index.js

Core Features

  • Microsub Protocol Implementation: Full Microsub API (channels, timeline, follow/unfollow, mute/block, search, preview)
  • Web Reader UI: Built-in Nunjucks-based reader interface with channel navigation, timeline view, and composition
  • Multi-Format Feed Support: RSS, Atom, JSON Feed, h-feed (microformats), with fallback feed discovery
  • Real-Time Updates: WebSub (PubSubHubbub) support for instant notifications
  • Adaptive Polling: Tiered polling system (tier intervals from 1 minute up to ~17 hours) based on feed update frequency
  • Read State Management: Per-user read tracking with automatic cleanup (keeps last 30 read items per channel)
  • Feed Discovery: Automatic discovery of feeds from websites (RSS/Atom link tags, JSON Feed, h-feed)
  • Webmention Receiving: Accepts webmentions for posts in the timeline
  • Media Proxy: Proxies external images through local endpoint for privacy and caching
  • Blogroll Integration: Optionally syncs feed subscriptions with @rmdes/indiekit-endpoint-blogroll
  • Compose UI: Post replies, likes, reposts, and bookmarks via Micropub

Architecture

Data Flow

┌──────────────────────────────────────────────────────────────┐
│                    FEED INGESTION                            │
├──────────────────────────────────────────────────────────────┤
│ Scheduler (60s interval)                                     │
│   ↓                                                           │
│ getFeedsToFetch() → processFeedBatch()                       │
│   ↓                                                           │
│ fetchFeed() → parseFeed() → normalizeItems()                 │
│   ↓                                                           │
│ addItem() → MongoDB (dedup by uid)                           │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                    READER UI                                 │
├──────────────────────────────────────────────────────────────┤
│ /microsub/reader/channels → List channels                    │
│ /microsub/reader/channels/:uid → Channel timeline            │
│ /microsub/reader/channels/:uid/feeds → Manage subscriptions  │
│ /microsub/reader/compose → Post via Micropub                 │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                    MICROSUB API                              │
├──────────────────────────────────────────────────────────────┤
│ GET/POST /microsub?action=channels → Channel list            │
│ GET/POST /microsub?action=timeline → Timeline items          │
│ POST /microsub?action=follow → Subscribe to feed             │
│ POST /microsub?action=unfollow → Unsubscribe                 │
│ POST /microsub?action=mute/block → Filter content            │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                    REAL-TIME UPDATES                         │
├──────────────────────────────────────────────────────────────┤
│ WebSub Hub → POST /microsub/websub/:id → processWebsubUpdate│
│ Webmention → POST /microsub/webmention → addNotification    │
└──────────────────────────────────────────────────────────────┘

MongoDB Collections

microsub_channels

Stores user channels for organizing feeds.

{
  _id: ObjectId,
  uid: "unique-short-id",        // Generated 8-char alphanumeric
  name: "Technology",
  userId: "user-id",              // For multi-user support
  order: 0,                       // Display order
  settings: {
    excludeTypes: ["repost"],     // Filter by post type
    excludeRegex: "/spam|ads/i"   // Filter by regex
  },
  createdAt: "2026-02-13T...",
  updatedAt: "2026-02-13T..."
}
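
As an illustration of how the settings fields above might be applied to an incoming item, here is a minimal sketch. The helper names (parseRegexString, inferPostType, passesChannelFilters), the mapping from item properties to post types, and the choice of fields matched against excludeRegex are all assumptions made for this example, not the plugin's actual code.

```javascript
// Hypothetical helper: turn a stored string such as "/spam|ads/i" into a RegExp.
function parseRegexString(source) {
  const match = /^\/(.*)\/([a-z]*)$/s.exec(source);
  return match ? new RegExp(match[1], match[2]) : new RegExp(source);
}

// Hypothetical helper: guess a post type from the jf2-style properties on an item.
function inferPostType(item) {
  if (item.likeOf?.length) return "like";
  if (item.repostOf?.length) return "repost";
  if (item.bookmarkOf?.length) return "bookmark";
  if (item.inReplyTo?.length) return "reply";
  return "note";
}

// Returns true when the item survives both the type filter and the regex filter.
function passesChannelFilters(item, settings = {}) {
  const { excludeTypes = [], excludeRegex } = settings;
  if (excludeTypes.includes(inferPostType(item))) return false;
  if (excludeRegex) {
    const pattern = parseRegexString(excludeRegex);
    // Assumption: the regex is matched against the title, summary, and plain text.
    const haystack = [item.name, item.summary, item.content?.text]
      .filter(Boolean)
      .join(" ");
    if (pattern.test(haystack)) return false;
  }
  return true;
}
```

With the example settings shown in the schema, an item whose repostOf array is non-empty would be dropped by the type filter, and an item titled "Buy ADS now" would be dropped by the regex filter.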

Special Channel: uid: "notifications" (order: -1, always first) receives webmentions and mentions.

Indexes:

  • { uid: 1 } - Unique channel lookup
  • { userId: 1, order: 1 } - Sorted channel list per user

microsub_feeds

Stores feed subscriptions and polling metadata.

{
  _id: ObjectId,
  channelId: ObjectId,            // References microsub_channels
  url: "https://example.com/feed",
  title: "Example Blog",
  photo: "https://example.com/icon.png",
  tier: 1,                        // Polling tier (0-10)
  unmodified: 0,                  // Consecutive unchanged fetches
  nextFetchAt: Date,              // When to poll next (kept as Date for query)
  lastFetchedAt: "2026-02-13T...", // ISO string
  status: "active" | "error",
  lastError: "HTTP 404",
  lastErrorAt: "2026-02-13T...",
  consecutiveErrors: 0,
  itemCount: 42,
  websub: {
    hub: "https://hub.example/",
    topic: "https://example.com/feed",
    secret: "random-secret",
    leaseSeconds: 432000,
    expiresAt: Date
  },
  createdAt: "2026-02-13T...",
  updatedAt: "2026-02-13T..."
}

Polling Tiers:

  • Tier 0: 1 minute
  • Tier 1: 2 minutes
  • Tier 2: 4 minutes
  • Tier 3: 8 minutes
  • ...
  • Tier 10: 1024 minutes (~17 hours)

Tier Adjustment:

  • Content changed: tier - 1 (faster polling)
  • Unchanged 2x: tier + 1 (slower polling)
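
The tier list and adjustment rules above can be sketched as follows. This is an illustration only: the function names and the object shape are hypothetical, the interval is assumed to be 2^tier minutes (which matches the listed values), and resetting the unmodified counter after a tier change is an assumption rather than confirmed behavior of lib/polling/tier.js.

```javascript
const MIN_TIER = 0;
const MAX_TIER = 10;

// Polling interval in minutes for a tier: 1, 2, 4, ... 1024.
function tierIntervalMinutes(tier) {
  const clamped = Math.min(MAX_TIER, Math.max(MIN_TIER, tier));
  return 2 ** clamped;
}

// Compute the next { tier, unmodified } pair after a successful fetch.
function nextTierState({ tier, unmodified }, changed) {
  if (changed) {
    // Content changed: poll faster, never below tier 0.
    return { tier: Math.max(MIN_TIER, tier - 1), unmodified: 0 };
  }
  const count = unmodified + 1;
  if (count >= 2) {
    // Two consecutive unchanged fetches: poll slower, never above tier 10.
    // Assumption: the counter restarts after each tier bump.
    return { tier: Math.min(MAX_TIER, tier + 1), unmodified: 0 };
  }
  return { tier, unmodified: count };
}
```

Under these assumptions, a feed at tier 3 that comes back unchanged twice in a row moves to tier 4 and is next polled after 16 minutes.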

Indexes:

  • { channelId: 1, url: 1 } - Prevent duplicate subscriptions
  • { nextFetchAt: 1 } - Scheduler query

microsub_items

Stores timeline items (posts/entries).

{
  _id: ObjectId,
  channelId: ObjectId,
  feedId: ObjectId,
  uid: "https://example.com/post/123", // Canonical URL or GUID
  type: "entry" | "event" | "review",
  url: "https://example.com/post/123",
  name: "Post Title",
  content: {
    text: "Plain text...",
    html: "<p>HTML content...</p>"
  },
  summary: "Short description",
  published: Date,                // Kept as Date for sorting
  updated: Date,
  author: {
    name: "Author Name",
    url: "https://author.example/",
    photo: "https://author.example/photo.jpg"
  },
  category: ["tag1", "tag2"],
  photo: ["https://example.com/img.jpg"],
  video: ["https://example.com/vid.mp4"],
  audio: ["https://example.com/aud.mp3"],
  likeOf: ["https://liked-post.example/"],
  repostOf: ["https://repost.example/"],
  bookmarkOf: ["https://bookmark.example/"],
  inReplyTo: ["https://reply-to.example/"],
  source: {                       // Metadata about feed source
    title: "Example Blog",
    url: "https://example.com"
  },
  readBy: ["user-id"],            // Array of user IDs who read this
  createdAt: "2026-02-13T..."
}

Read State: Items are marked read by adding userId to readBy array. Old read items are auto-deleted (keeps last 30 per channel).

Indexes:

  • { channelId: 1, uid: 1 } - Unique (prevents duplicates)
  • { channelId: 1, published: -1 } - Timeline queries
  • { feedId: 1 } - Feed-specific queries
  • { channelId: 1, url: 1 } - URL-based mark_read operations
  • Text index on name, content.text, content.html, summary, author.name

microsub_notifications

Special items collection for notifications channel (webmentions, mentions).

Same schema as microsub_items, stored in the notifications channel.

microsub_muted

Muted URLs (hide posts from specific URLs).

{
  _id: ObjectId,
  userId: "user-id",
  url: "https://muted-site.example/",
  createdAt: "2026-02-13T..."
}

microsub_blocked

Blocked authors (delete all posts from author URL).

{
  _id: ObjectId,
  userId: "user-id",
  authorUrl: "https://blocked-author.example/",
  createdAt: "2026-02-13T..."
}

Key Files and Modules

Core Entry Point

index.js

  • Exports MicrosubEndpoint class
  • Defines routes, navigation items, mount path
  • Initializes MongoDB collections, scheduler, indexes, cleanup
  • Registers public routes (WebSub, webmention, media proxy)

Controllers

lib/controllers/microsub.js

  • Main Microsub API dispatcher
  • Routes GET/POST requests by action parameter
  • Calls specialized controllers (channels, timeline, follow, mute, block, search, preview, events)

lib/controllers/reader.js

  • Web UI controller for reader interface
  • Channel management (list, create, delete, settings)
  • Feed management (add, remove, edit, rediscover, refresh)
  • Timeline rendering (pagination, read/unread filtering)
  • Compose form (reply, like, repost, bookmark via Micropub)
  • Search and discovery UI

lib/controllers/channels.js

  • Microsub API: action=channels
  • List, create, update, delete, reorder channels

lib/controllers/timeline.js

  • Microsub API: action=timeline
  • Get timeline items (paginated)
  • Mark read/unread, remove items

lib/controllers/follow.js

  • Microsub API: action=follow, action=unfollow
  • Subscribe to feeds, unsubscribe
  • Notifies blogroll plugin via blogroll-notify.js

lib/controllers/mute.js / block.js

  • Microsub API: action=mute, action=unmute, action=block, action=unblock
  • Mute URLs, block authors

lib/controllers/search.js

  • Microsub API: action=search
  • Feed discovery from URL

lib/controllers/preview.js

  • Microsub API: action=preview
  • Preview feed before subscribing

lib/controllers/events.js

  • Microsub API: action=events
  • Server-Sent Events (SSE) stream for real-time updates

lib/controllers/opml.js

  • Export subscriptions as OPML

Storage Layer

lib/storage/channels.js

  • createChannel(), getChannels(), getChannel(), updateChannel(), deleteChannel()
  • reorderChannels(), updateChannelSettings()
  • ensureNotificationsChannel() - Auto-creates notifications channel

lib/storage/feeds.js

  • createFeed(), getFeedsForChannel(), getFeedById(), updateFeed(), deleteFeed()
  • getFeedsToFetch() - Returns feeds where nextFetchAt <= now
  • updateFeedAfterFetch() - Adjusts tier based on content changes
  • updateFeedWebsub() - Stores WebSub subscription data
  • updateFeedStatus() - Tracks errors and health
  • getFeedsWithErrors() - Admin diagnostics

lib/storage/items.js

  • addItem() - Inserts item (dedup by channelId + uid)
  • getTimelineItems() - Paginated timeline with before/after cursors
  • getItemById(), getItemsByUids()
  • markItemsRead(), markItemsUnread() - Per-user read state
  • removeItems() - Delete items by ID/UID/URL
  • cleanupAllReadItems() - Startup cleanup, keeps last 30 read per channel
  • createIndexes() - Creates MongoDB indexes

lib/storage/filters.js

  • getMutedUrls(), addMutedUrl(), removeMutedUrl()
  • getBlockedAuthors(), addBlockedAuthor(), removeBlockedAuthor()

lib/storage/read-state.js

  • getReadState(), markRead(), markUnread()
  • Wraps items.js read operations

Feed Processing

lib/feeds/parser.js

  • detectFeedType() - Sniffs RSS/Atom/JSON Feed/h-feed from content
  • parseFeed() - Dispatcher to format-specific parsers

lib/feeds/rss.js

  • parseRss() - Parses RSS 2.0 and RSS 1.0 (RDF) using feedparser

lib/feeds/atom.js

  • parseAtom() - Parses Atom feeds using feedparser

lib/feeds/jsonfeed.js

  • parseJsonFeed() - Parses JSON Feed 1.x

lib/feeds/hfeed.js

  • parseHfeed() - Parses h-feed microformats using microformats-parser

lib/feeds/normalizer.js

  • normalizeItem() - Converts parsed items to jf2 format

lib/feeds/fetcher.js

  • fetchFeed() - HTTP fetch with User-Agent, timeout, redirect handling

lib/feeds/discovery.js

  • discoverFeeds() - Parses HTML <link> tags for RSS/Atom/JSON Feed
  • discoverAndValidateFeeds() - Discovery + validation
  • getBestFeed() - Prefers Atom > RSS > JSON Feed > h-feed
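
The preference order used by getBestFeed() could be expressed roughly like this. The function name pickBestFeed, the type labels ("atom", "rss", "jsonfeed", "hfeed"), and the candidate object shape are assumptions for illustration; the real discovery results may use different field names.

```javascript
// Preference order from the description above: Atom > RSS > JSON Feed > h-feed.
const FEED_TYPE_PRIORITY = ["atom", "rss", "jsonfeed", "hfeed"];

// Return the most preferred candidate, or undefined when there are none.
function pickBestFeed(candidates) {
  const rank = (feed) => {
    const index = FEED_TYPE_PRIORITY.indexOf(feed.type);
    // Unknown types sort after every known type.
    return index === -1 ? FEED_TYPE_PRIORITY.length : index;
  };
  // Copy before sorting so the caller's array is not mutated.
  return [...candidates].sort((a, b) => rank(a) - rank(b))[0];
}
```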

lib/feeds/validator.js

  • validateFeedUrl() - Fetches and parses feed to ensure it's valid
  • Detects comments feeds (WordPress/Mastodon post replies)

Polling System

lib/polling/scheduler.js

  • startScheduler() - Runs every 60 seconds, calls runSchedulerCycle()
  • stopScheduler() - Cleanup on shutdown
  • refreshFeedNow() - Manual feed refresh

lib/polling/processor.js

  • processFeed() - Fetch, parse, add items for one feed
  • processFeedBatch() - Concurrent processing (default 5 feeds at once)

lib/polling/tier.js

  • getTierInterval() - Maps tier (0-10) to polling interval
  • adjustTier() - Increases/decreases tier based on update frequency

Real-Time Updates

lib/websub/discovery.js

  • discoverWebsubHub() - Parses feed for <link rel="hub"> or <atom:link rel="hub">

lib/websub/subscriber.js

  • subscribeToHub() - Sends WebSub subscribe request to hub

lib/websub/handler.js

  • verify() - Handles hub verification (GET /microsub/websub/:id)
  • receive() - Handles content distribution (POST /microsub/websub/:id)

lib/webmention/receiver.js

  • receive() - Accepts webmentions (POST /microsub/webmention)
  • Adds to notifications channel

lib/webmention/verifier.js

  • verifyWebmention() - Fetches source URL and confirms link to target
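
The "confirms link to target" step can be pictured with the simplified sketch below. It is not the plugin's implementation: the function name sourceLinksToTarget is hypothetical, and it scans href attributes with a regular expression purely to stay dependency-free, whereas a real verifier would more likely use an HTML parser.

```javascript
// Returns true if the already-fetched source HTML contains a link to targetUrl.
// Relative hrefs are resolved against sourceUrl before comparing.
function sourceLinksToTarget(sourceHtml, sourceUrl, targetUrl) {
  const target = new URL(targetUrl).href;
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  for (const match of sourceHtml.matchAll(hrefPattern)) {
    try {
      if (new URL(match[1], sourceUrl).href === target) return true;
    } catch {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return false;
}
```

Comparing normalized href values (rather than raw strings) means that a relative link such as /post/1 on the source page still counts as a link to the absolute target URL.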

lib/webmention/processor.js

  • processWebmention() - Parses source as h-entry, adds to notifications

Media and Utilities

lib/media/proxy.js

  • handleMediaProxy() - GET /microsub/media/:hash
  • Fetches and caches external images, serves with correct Content-Type
  • Hash is base64url(url)

lib/utils/auth.js

  • getUserId() - Extracts user ID from session (defaults to "default" for single-user)

lib/utils/jf2.js

  • generateChannelUid() - Random 8-char alphanumeric
  • convertToJf2() - Transforms various formats to jf2

lib/utils/pagination.js

  • buildPaginationQuery() - Cursor-based pagination (before/after)
  • generatePagingCursors() - Returns before and after cursor strings

lib/utils/validation.js

  • validateChannelName(), validateAction(), validateExcludeTypes(), validateExcludeRegex()

lib/utils/blogroll-notify.js

  • notifyBlogroll() - Fire-and-forget notification to @rmdes/indiekit-endpoint-blogroll
  • On follow: upserts blog entry with source: "microsub"
  • On unfollow: soft-deletes blog entry

lib/cache/redis.js

  • Optional Redis caching (not currently used in core)

lib/search/indexer.js / query.js

  • Full-text search on items (uses MongoDB text index)

lib/realtime/broker.js

  • SSE (Server-Sent Events) broker for real-time notifications

Configuration

import MicrosubEndpoint from "@rmdes/indiekit-endpoint-microsub";

export default {
  plugins: [
    new MicrosubEndpoint({
      mountPath: "/microsub", // Default
    }),
  ],
};

Routes

Protected (require auth)

| Method | Path | Description |
| --- | --- | --- |
| GET/POST | /microsub | Microsub API endpoint (action parameter) |
| GET | /microsub/reader | Reader UI (redirects to channels) |
| GET | /microsub/reader/channels | List channels |
| GET | /microsub/reader/channels/new | New channel form |
| POST | /microsub/reader/channels/new | Create channel |
| GET | /microsub/reader/channels/:uid | Channel timeline |
| GET | /microsub/reader/channels/:uid/settings | Channel settings form |
| POST | /microsub/reader/channels/:uid/settings | Update settings |
| POST | /microsub/reader/channels/:uid/delete | Delete channel |
| GET | /microsub/reader/channels/:uid/feeds | List feeds in channel |
| POST | /microsub/reader/channels/:uid/feeds | Add feed to channel |
| POST | /microsub/reader/channels/:uid/feeds/remove | Remove feed |
| GET | /microsub/reader/channels/:uid/feeds/:feedId/edit | Edit feed form |
| POST | /microsub/reader/channels/:uid/feeds/:feedId/edit | Update feed URL |
| POST | /microsub/reader/channels/:uid/feeds/:feedId/rediscover | Run feed discovery |
| POST | /microsub/reader/channels/:uid/feeds/:feedId/refresh | Force refresh |
| GET | /microsub/reader/item/:id | Single item view |
| GET | /microsub/reader/compose | Compose form |
| POST | /microsub/reader/compose | Submit post via Micropub |
| GET | /microsub/reader/search | Search/discover feeds page |
| POST | /microsub/reader/search | Search feeds |
| POST | /microsub/reader/subscribe | Subscribe from search results |
| POST | /microsub/reader/api/mark-read | Mark all items read |
| GET | /microsub/reader/opml | Export OPML |

Public (no auth)

| Method | Path | Description |
| --- | --- | --- |
| GET | /microsub/websub/:id | WebSub verification |
| POST | /microsub/websub/:id | WebSub content distribution |
| POST | /microsub/webmention | Webmention receiver |
| GET | /microsub/media/:hash | Media proxy |

Integration with Other Plugins

Blogroll Plugin

When subscribing/unsubscribing to feeds, Microsub optionally notifies @rmdes/indiekit-endpoint-blogroll:

// On follow
notifyBlogroll(application, "follow", {
  url: feedUrl,
  title: feedTitle,
  channelName: channel.name,
  feedId: feed._id,
  channelId: channel._id,
});

// On unfollow
notifyBlogroll(application, "unfollow", { url: feedUrl });

Blogroll stores these feeds with source: "microsub" and soft-deletes them on unfollow. If the user explicitly deletes an entry from the blogroll, Microsub won't re-add it.

Micropub Plugin

Compose form posts via Micropub:

// Fetch syndication targets from Micropub config
const micropubUrl = `${application.micropubEndpoint}?q=config`;
const response = await fetch(micropubUrl, {
  headers: { Authorization: `Bearer ${token}` }
});
// fetch() resolves to a Response; the config object is in its JSON body
const config = await response.json();
const syndicationTargets = config["syndicate-to"];

Posts replies, likes, reposts, bookmarks:

// Assumes a form-encoded Micropub request body
const micropubData = new URLSearchParams();
micropubData.append("h", "entry");
micropubData.append("in-reply-to", replyToUrl);
micropubData.append("content", content);

Security Hardening (v1.0.30)

The following security fixes were applied in version 1.0.30 (commit 3c8a4b2):

SSRF Protection in Media Proxy

File: lib/media/proxy.js

The media proxy (/microsub/media/:hash) previously accepted any URL, including internal network addresses. An attacker could craft a proxy URL targeting http://localhost, http://127.0.0.1, Docker internal IPs, or cloud metadata endpoints.

Fix: Added isPrivateUrl() blocklist that rejects URLs targeting:

  • localhost, 127.x.x.x, ::1 (loopback)
  • 10.x.x.x, 172.16-31.x.x, 192.168.x.x (RFC 1918 private ranges)
  • 169.254.x.x (link-local/cloud metadata)

Also changed the error fallback from response.redirect(url) (open redirect) to response.status(404).send("Image not available").
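
To make the blocklist concrete, here is a minimal sketch of a hostname check covering the ranges listed above. It is an illustrative stand-in, not the actual isPrivateUrl() from lib/media/proxy.js; the name isPrivateUrlSketch and the decision to reject unparseable URLs are assumptions.

```javascript
// Returns true when the URL's host is loopback, RFC 1918 private, or link-local.
function isPrivateUrlSketch(rawUrl) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return true; // Assumption: treat unparseable URLs as unsafe.
  }
  // URL.hostname keeps the square brackets around IPv6 literals.
  if (hostname === "localhost" || hostname === "[::1]") return true;

  const match = hostname.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!match) return false; // Not an IPv4 literal.

  const [a, b] = match.slice(1).map(Number);
  return (
    a === 127 || // 127.x.x.x loopback
    a === 10 || // 10.x.x.x
    (a === 172 && b >= 16 && b <= 31) || // 172.16-31.x.x
    (a === 192 && b === 168) || // 192.168.x.x
    (a === 169 && b === 254) // 169.254.x.x link-local / cloud metadata
  );
}
```

A check like this only inspects the literal hostname. A public DNS name that resolves to a private address would pass it, so resolving the host and checking the resulting IP is a stronger (and separate) safeguard.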

ReDoS Prevention in Search

File: lib/storage/items.js

The searchItems() function built a regex from user input without escaping special characters. A crafted search query could cause catastrophic backtracking.

Fix: User input is escaped with replaceAll(/[$()*+.?[\\\]^{|}]/g, "\\$&") before building the regex.
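
The escaping step can be tried in isolation with the sketch below. The replaceAll call is the one quoted above; the wrapper names escapeRegex and buildSearchRegex, and the case-insensitive flag, are assumptions for illustration.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(input) {
  return input.replaceAll(/[$()*+.?[\\\]^{|}]/g, "\\$&");
}

// Hypothetical wrapper: build a case-insensitive literal-match regex from a query.
function buildSearchRegex(query) {
  return new RegExp(escapeRegex(query), "i");
}
```

After escaping, a query such as (a+)+$ is treated as the literal text "(a+)+$" rather than as a nested quantifier, so it can no longer trigger backtracking.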

XSS Prevention in Webmention Content

File: lib/webmention/verifier.js

Webmention content.html was stored as-is from external sources. Malicious HTML could be stored and rendered to users.

Fix: Added sanitize-html with an allowlist of safe tags (a, p, br, em, strong, blockquote, ul, ol, li, code, pre) and safe attributes (href on a tags only). All other HTML is stripped before storage.

Open Redirect Removal

File: lib/media/proxy.js

When the media proxy failed to fetch an image, it redirected the user to the original external URL. An attacker could use this as an open redirect.

Fix: Returns 404 "Image not available" instead of redirecting.

Known Gotchas

Date Handling

Rule: Always store dates as ISO strings (new Date().toISOString()), EXCEPT published and updated in microsub_items, and nextFetchAt in microsub_feeds, which are kept as Date objects for MongoDB query compatibility.

// CORRECT - stored as Date for query
{ published: new Date(timestamp) }

// CORRECT - converted to ISO string when sending to client
published: item.published?.toISOString()

// CORRECT - other timestamps as ISO strings
{ createdAt: new Date().toISOString() }

Templates use | date("PPp") filter which requires ISO strings, so transformToJf2() converts published Date to ISO before sending to templates.

Read State Cleanup

Only the last 30 read items per channel are kept. Cleanup runs:

  • On startup: cleanupAllReadItems()
  • After marking items read: cleanupOldReadItems()

This prevents database bloat. Unread items are never deleted by cleanup.
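
The retention rule can be sketched as a pure function over one channel's items. This is an in-memory illustration under stated assumptions: the name selectReadItemsToDelete is hypothetical, "last 30" is assumed to mean the 30 most recently published read items, and the real cleanup runs against MongoDB rather than an array.

```javascript
const MAX_READ_ITEMS = 30;

// Given one channel's items, return the read items that fall outside the
// retention window. Unread items are never selected.
function selectReadItemsToDelete(items, userId, keep = MAX_READ_ITEMS) {
  return items
    .filter((item) => (item.readBy ?? []).includes(userId))
    // Newest first, so everything after the first `keep` entries is surplus.
    .sort((a, b) => b.published - a.published)
    .slice(keep);
}
```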

Feed Discovery Gotchas

  • ActivityPub JSON: If a URL returns ActivityPub JSON (e.g., Mastodon profile), discovery throws an error suggesting the direct feed URL (e.g., /feed/)
  • Comments Feeds: WordPress post comment feeds are detected and allowed but warned about (usually not what users want)
  • HTML Feeds: h-feed discovery requires microformats2 markup

Polling and WebSub

  • Feeds with WebSub subscriptions are still polled (but less frequently)
  • WebSub subscriptions expire after leaseSeconds - automatic re-subscription is not yet implemented (see Future Improvements)
  • Tier adjustment only happens on successful fetch - errors don't change tier

Media Proxy

Images are proxied through /microsub/media/:hash where hash is base64url(imageUrl). This:

  • Hides user IP from origin servers
  • Caches images locally
  • Works around CORS and mixed-content issues
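
The hash scheme can be reproduced with Node's built-in Buffer, which supports the "base64url" encoding. The helper names encodeMediaHash and decodeMediaHash are hypothetical; only the base64url(imageUrl) scheme itself comes from the description above.

```javascript
// Encode an image URL into the :hash path segment.
function encodeMediaHash(url) {
  return Buffer.from(url, "utf8").toString("base64url");
}

// Decode a :hash path segment back into the original image URL.
function decodeMediaHash(hash) {
  return Buffer.from(hash, "base64url").toString("utf8");
}
```

Because base64url avoids "+", "/", and "=" padding, the resulting hash can be placed directly in a path such as /microsub/media/:hash without extra percent-encoding.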

Blogroll Integration

If a feed was explicitly deleted from blogroll (status: "deleted"), Microsub won't re-add it on follow. Delete and re-subscribe to override.

Concurrent Processing

Scheduler processes 5 feeds concurrently by default. Increase BATCH_CONCURRENCY in scheduler.js for faster syncing (but watch memory/network usage).
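
One simple way to cap concurrency is to process feeds in fixed-size chunks, as sketched below. This is an assumption about the general approach, not the actual processFeedBatch() code: the name processInBatches and its signature are hypothetical, and the real implementation may use a different strategy (for example a worker pool).

```javascript
// Run `worker` over all feeds, at most `concurrency` at a time.
// Each chunk is awaited in full before the next one starts.
async function processInBatches(feeds, worker, concurrency = 5) {
  const results = [];
  for (let start = 0; start < feeds.length; start += concurrency) {
    const batch = feeds.slice(start, start + concurrency);
    // allSettled keeps one failing feed from aborting the rest of the batch.
    const settled = await Promise.allSettled(batch.map((feed) => worker(feed)));
    results.push(...settled);
  }
  return results;
}
```

Raising the concurrency value shortens a full sync at the cost of more simultaneous HTTP requests and parsed feeds held in memory, which is the trade-off noted above.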

Dependencies

Core:

  • express - Routing
  • feedparser - RSS/Atom parsing
  • microformats-parser - h-feed parsing
  • htmlparser2 - HTML parsing
  • sanitize-html - XSS prevention
  • luxon - Date handling

Indiekit:

  • @indiekit/error - Error handling
  • @indiekit/frontend - UI components
  • @indiekit/util - Utilities (formatDate, etc.)

Optional:

  • ioredis - Redis caching (not currently used)
  • debug - Debug logging

Testing and Debugging

Enable debug logging:

DEBUG=microsub:* npm start

Check scheduler status: Scheduler runs every 60 seconds. Check logs for [Microsub] Processing N feeds due for refresh.

Inspect feed errors:

const feeds = await getFeedsWithErrors(application, 3);
console.log(feeds.map(f => ({ url: f.url, error: f.lastError })));

Manual feed refresh:

POST /microsub/reader/channels/:uid/feeds/:feedId/refresh

Clear read items:

await cleanupAllReadItems(application);

Check WebSub subscriptions:

// Assumes `collection` is the microsub_feeds collection
const feeds = await collection.find({ "websub.hub": { $exists: true } }).toArray();

Common Issues

Q: Feeds not updating?

  • Check nextFetchAt in microsub_feeds - may be in far future due to high tier
  • Force refresh or rediscover feed from UI

Q: Items disappearing after marking read?

  • Normal behavior - only last 30 read items kept per channel
  • Adjust MAX_READ_ITEMS in storage/items.js if needed

Q: "Unable to detect feed type" error?

  • Feed may be behind login wall
  • Check if URL returns HTML instead of XML/JSON
  • Try feed discovery from homepage URL

Q: Duplicate items showing up?

  • Dedup is by channelId + uid - ensure feed provides stable GUIDs
  • Check if feed URL changed (different feedId → new items)

Q: WebSub not working?

  • Check hub discovery in feed XML: <link rel="hub" href="..."/>
  • Verify callback URL is publicly accessible
  • Check logs for hub verification failures

Future Improvements

  • WebSub lease renewal (currently expires after leaseSeconds)
  • Redis caching for items (reduce MongoDB load)
  • Full-text search UI (backend already implemented)
  • SSE events stream UI (backend already implemented)
  • OPML import (export already works)
  • Microsub client compatibility testing (Indigenous, Monocle, etc.)
  • Feed health dashboard (show error counts, last fetch times)
  • Batch mark-read from timeline UI (currently channel-wide only)