
Track and clean up orphaned media files #206

@dahlia

Currently, media files uploaded to disk/S3 are not reliably cleaned up when the associated content is deleted. This leads to orphaned files accumulating in storage over time.

Current state

disk.delete() is called in only two places across the entire codebase:

  • Avatar replacement (graphql/account.ts) — deletes the old file when a new avatar is uploaded
  • OG image replacement (web/og.ts) — same pattern

For every other media type, the file stays in storage when the owning content is deleted:

| Media type | DB tracking | File cleanup on delete |
| --- | --- | --- |
| Note media (noteMediumTable.key) | Yes | No: the DB row is cascade-deleted, but the file on disk/S3 remains |
| Video thumbnails (postMediumTable.thumbnailKey) | Yes | No: same as above |
| OG images (articleContentTable.ogImageKey, accountTable.ogImageKey) | Yes | Only on regeneration, not on article/account deletion |
| Article inline images (uploaded via /api/media) | No | No: only referenced by URL embedded in Markdown text |
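
Because the cascade removes the rows that hold the keys, one possible ordering for deleting a post is: read the keys first, delete the rows, commit, and only then delete the files. The sketch assumes a hypothetical MediaStorage interface with a delete(key) method standing in for the real disk object, plus hypothetical row types named after the columns in the table; none of these names are the codebase's actual API.

```typescript
// Hypothetical storage abstraction; only a delete(key) method is assumed.
// The codebase's real disk/S3 object may expose a different API.
interface MediaStorage {
  delete(key: string): Promise<void>;
}

// Hypothetical row shapes named after the columns in the table.
interface NoteMediumRow {
  key: string;
}
interface PostMediumRow {
  thumbnailKey: string | null;
}

// Read every key owned by the post BEFORE deleting it: once the cascade
// runs, the rows that hold these keys are gone. Duplicates are collapsed.
function collectMediaKeys(
  noteMedia: NoteMediumRow[],
  postMedia: PostMediumRow[],
  ogImageKey: string | null,
): string[] {
  const keys = new Set<string>();
  for (const m of noteMedia) keys.add(m.key);
  for (const m of postMedia) {
    if (m.thumbnailKey !== null) keys.add(m.thumbnailKey);
  }
  if (ogImageKey !== null) keys.add(ogImageKey);
  return Array.from(keys);
}

// Run AFTER the DB transaction commits. A failed delete only leaves an
// orphan behind, so failed keys are returned (for logging or a later
// retry) instead of being thrown.
async function deleteFiles(storage: MediaStorage, keys: string[]): Promise<string[]> {
  const failed: string[] = [];
  for (const key of keys) {
    try {
      await storage.delete(key);
    } catch {
      failed.push(key);
    }
  }
  return failed;
}
```

Deleting files after the commit means a crash between the two steps leaves an orphaned file rather than a row pointing at a missing file, which is the easier failure to recover from with a periodic cleanup.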

The article inline image case is the most severe: there is no database record linking uploaded files to any article or draft, so there is no way to even identify which files are orphaned.
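
Since inline images are linked only by URL, any cleanup has to recover keys by scanning the Markdown. A minimal sketch, assuming every upload is served under a single URL prefix (MEDIA_BASE_URL here is a placeholder, not the real host): collect every key referenced under that prefix, and diff the sets before and after an edit to find keys that were dropped from the content.

```typescript
// Placeholder for the public URL prefix under which /api/media uploads are
// served. The real host and path are assumptions, not taken from the codebase.
const MEDIA_BASE_URL = "https://media.example.com/";

// Escape regex metacharacters so the prefix is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collect every storage key referenced under MEDIA_BASE_URL. Matching the
// prefix (rather than only ![alt](url) syntax) also catches plain links and
// raw <img src="..."> tags. A key ends at whitespace, a closing paren,
// a quote, an angle bracket, or a query/fragment delimiter.
function extractMediaKeys(markdown: string): Set<string> {
  const pattern = new RegExp(
    escapeRegExp(MEDIA_BASE_URL) + "([^\\s)\"'<>?#]+)",
    "g",
  );
  const keys = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    keys.add(match[1]);
  }
  return keys;
}

// Keys referenced before an edit but not after it: deletion candidates.
function removedMediaKeys(before: string, after: string): string[] {
  const kept = extractMediaKeys(after);
  return Array.from(extractMediaKeys(before)).filter((k) => !kept.has(k));
}
```

A removed key is only a candidate: the same URL could still appear in another article or draft, so other references would need to be checked before calling disk.delete() on it.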

What needs to be done

  • Add a media tracking table (or extend existing tables) to record the accountId, key, creation timestamp, and association to the owning resource (article, draft, note, etc.) for all uploaded media
  • Delete associated media files when a post, article, or draft is deleted
  • Delete associated media files when an inline image URL is removed from content (or handle via periodic cleanup)
  • Consider a periodic garbage collection job to clean up any files that slipped through (e.g. uploads where the article was never saved)
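
The tracking table and the periodic garbage collection could fit together roughly as follows. This is a sketch under assumptions: the MediaRecord fields mirror the ones listed in the first item (accountId, key, creation timestamp, owner), but the type names, the nullable owner, the ownerExists callback, and the grace period are hypothetical choices, not a decided schema.

```typescript
// Hypothetical owner kinds; the issue mentions articles, drafts, and notes.
type OwnerKind = "article" | "draft" | "note";

// Hypothetical row in the proposed media tracking table.
interface MediaRecord {
  accountId: string;
  key: string; // storage key on disk/S3
  createdAt: Date;
  // null while the upload is not yet attached to any saved resource
  owner: { kind: OwnerKind; id: string } | null;
}

// Return records a periodic GC job could delete: older than the grace
// period AND either never attached to an owner or attached to an owner
// that no longer exists.
function findOrphans(
  records: MediaRecord[],
  ownerExists: (kind: OwnerKind, id: string) => boolean,
  now: Date,
  graceMs: number,
): MediaRecord[] {
  return records.filter((r) => {
    const ageMs = now.getTime() - r.createdAt.getTime();
    if (ageMs < graceMs) return false; // may belong to an unsaved draft
    return r.owner === null || !ownerExists(r.owner.kind, r.owner.id);
  });
}
```

The grace period is what keeps the job from deleting an upload while the user is still composing an article that has not been saved yet, so it would likely need to exceed the longest expected editing session.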

Labels: bug (Something isn't working)
