Skip to content

Conversation

@tomerqodo
Copy link

@tomerqodo tomerqodo commented Dec 4, 2025

User description

Benchmark PR discourse#36020

Type: Clean (correct implementation)

Original PR Title: FEATURE: Add automation to remove uploads from deleted posts
Original PR Description: This PR introduces a new automation script that removes uploads attached to deleted posts.

Using this automation will add a revision to deleted posts by the System User, removing the upload or attachment references. This Automation goes a 1000 by every run and does not re-run on already ran posts

When the automation cleans the posts, uploads will no longer have reference to the post and on the next time the clean up uploads job runs it will permanently removes the uploads.

Example of automation run:

 Hey it is a regular post with a link to [Discourse](https://www.discourse.org) 
 and an image: ![logo.png|100x200](upload://gj6GJHlc1Sa5YXuGz549oXBbcFv.png)
 and a file: [small.pdf|attachment](upload://3bWzVVoRhUXxi7tiPenInoebHyX.pdf) (130 Bytes)

When the automation has run in this post, this will be the revisioned version:

 Hey it is a regular post with a link to [Discourse](https://www.discourse.org) 
 and an image:
 and a file: 

Original PR URL: discourse#36020


PR Type

Enhancement


Description

  • Adds new automation script to remove upload markup from deleted posts

  • Implements batch processing with 1000 post limit per run

  • Integrates with clean_up_uploads job for permanent upload deletion

  • Includes comprehensive test coverage for recurring and point-in-time triggers


Diagram Walkthrough

flowchart LR
  A["Deleted Posts with Uploads"] -->|Automation Script| B["Remove Upload Markup"]
  B -->|Revision by System User| C["Posts without Upload References"]
  C -->|Next Run| D["Clean Up Uploads Job"]
  D -->|Permanent Deletion| E["Uploads Removed from System"]
Loading

File Walkthrough

Relevant files
Configuration changes
scripts.rb
Register new upload cleanup automation script                       

plugins/automation/lib/discourse_automation/scripts.rb

  • Adds new constant REMOVE_UPLOAD_MARKUP_FROM_DELETED_POSTS to scripts
    registry
  • Registers the new automation script identifier in alphabetical order
+1/-0     
plugin.rb
Configure batch size and register script file                       

plugins/automation/plugin.rb

  • Adds REMOVE_UPLOAD_MARKUP_FROM_DELETED_POSTS_BATCH_SIZE constant set
    to 1000
  • Registers the new script file in the plugin's autoload list
+2/-0     
Enhancement
remove_upload_markup_from_deleted_posts.rb
Implement upload markup removal automation script               

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb

  • Implements core automation script that removes upload and attachment
    markup from deleted posts
  • Uses regex pattern to match both image and file attachment markdown
    syntax
  • Processes posts in batches of 1000, skipping already-processed posts
    via custom field
  • Creates revisions with system user and edit reason for audit trail
  • Marks processed posts with uploads_removed_at timestamp to prevent
    re-processing
+49/-0   
Tests
remove_upload_markup_from_deleted_posts_spec.rb
Add comprehensive test coverage for automation                     

plugins/automation/spec/scripts/remove_upload_markup_from_deleted_posts_spec.rb

  • Comprehensive test suite with 174 lines covering both recurring and
    point-in-time triggers
  • Tests verify upload markup removal from deleted posts while preserving
    non-deleted posts
  • Validates custom field timestamp tracking to prevent duplicate
    processing
  • Integration tests confirm compatibility with clean_up_uploads job for
    permanent deletion
  • Tests ensure automation respects batch limits and idempotency
+174/-0 
Documentation
server.en.yml
Add localization strings for automation                                   

plugins/automation/config/locales/server.en.yml

  • Adds localization strings for the new automation script
  • Includes title, description, and edit reason messages
  • Provides user-facing documentation about the automation's purpose and
    caution
+4/-0     

@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Limited Auditing: The automation performs critical revisions on posts but does not explicitly write audit
logs beyond the post revision and a custom field timestamp, making it unclear if
sufficient audit context (who, when, what, outcome) is captured centrally.

Referred Code
script do |trigger, fields|
  uploads_removed_at = Time.now
  edit_reason =
    I18n.t("discourse_automation.scriptables.remove_upload_markup_from_deleted_posts.edit_reason")

  # it matches both ![alt|size](upload://key) and [small.pdf|attachment](upload://key.pdf) (Number Bytes)
  upload_and_attachment_regex =
    %r{!?\[([^\]|]+)(?:\|[^\]]*)?\]\(upload://([A-Za-z0-9_-]+)[^)]*\)(?:\s*\([^)]*\))?}

  Post
    .with_deleted
    .where.not(deleted_at: nil)
    .joins(:upload_references)
    .where("upload_references.target_type = 'Post'")
    .joins(
      "LEFT JOIN post_custom_fields ON posts.id = post_custom_fields.post_id AND post_custom_fields.name = 'uploads_removed_at'",
    )
    .where("post_custom_fields.post_id IS NULL")
    .distinct
    .limit(DiscourseAutomation::REMOVE_UPLOAD_MARKUP_FROM_DELETED_POSTS_BATCH_SIZE)
    .each do |post|


 ... (clipped 13 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing Handling: The script lacks explicit error handling for database operations and post revision
failures (e.g., nil raw, revise/save failures), which may lead to silent failures without
actionable context.

Referred Code
.each do |post|
  if updated_raw = post.raw.gsub!(upload_and_attachment_regex, "")
    if ok =
         post.revise(
           Discourse.system_user,
           { raw: updated_raw, edit_reason: edit_reason },
           force_new_version: true,
           skip_validations: true,
         )
      post.custom_fields["uploads_removed_at"] = uploads_removed_at
      post.save_custom_fields
    end

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Ensure atomic update with transaction

Wrap the post revision and custom field update in a database transaction to
ensure atomicity.

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb [35-46]

 if updated_raw = post.raw.gsub!(upload_and_attachment_regex, "")
-  if ok =
-       post.revise(
+  Post.transaction do
+    if post.revise(
          Discourse.system_user,
          { raw: updated_raw, edit_reason: edit_reason },
          force_new_version: true,
          skip_validations: true,
        )
-    post.custom_fields["uploads_removed_at"] = uploads_removed_at
-    post.save_custom_fields
+      post.custom_fields["uploads_removed_at"] = uploads_removed_at
+      post.save_custom_fields(true)
+    end
   end
 end
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential atomicity issue and proposes wrapping the post revision and custom field update in a transaction, which is a valid improvement for robustness.

Medium
High-level
The script's query is inefficient

To improve query performance, replace the LEFT JOIN with a WHERE ... IS NULL
check with a NOT EXISTS subquery. This change will more efficiently filter out
posts that have already been processed.

Examples:

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb [23-33]
    Post
      .with_deleted
      .where.not(deleted_at: nil)
      .joins(:upload_references)
      .where("upload_references.target_type = 'Post'")
      .joins(
        "LEFT JOIN post_custom_fields ON posts.id = post_custom_fields.post_id AND post_custom_fields.name = 'uploads_removed_at'",
      )
      .where("post_custom_fields.post_id IS NULL")
      .distinct

 ... (clipped 1 lines)

Solution Walkthrough:

Before:

Post
  .with_deleted
  .where.not(deleted_at: nil)
  .joins(:upload_references)
  .joins(
    "LEFT JOIN post_custom_fields ON posts.id = post_custom_fields.post_id AND post_custom_fields.name = 'uploads_removed_at'"
  )
  .where("post_custom_fields.post_id IS NULL")
  .distinct
  .limit(BATCH_SIZE)
  .each do |post|
    # process post
  end

After:

Post
  .with_deleted
  .where.not(deleted_at: nil)
  .joins(:upload_references)
  .where(
    "NOT EXISTS (
      SELECT 1
      FROM post_custom_fields pcf
      WHERE pcf.post_id = posts.id AND pcf.name = 'uploads_removed_at'
    )"
  )
  .distinct
  .limit(BATCH_SIZE)
  .each do |post|
    # process post
  end
Suggestion importance[1-10]: 6

__

Why: The suggestion provides a valid performance optimization for a core database query, which is important for scalability on large sites, by replacing a LEFT JOIN with a more efficient NOT EXISTS subquery.

Low
General
Avoid in-place string modification

Replace gsub! with gsub and an explicit comparison to avoid in-place string
modification and improve code clarity.

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb [35-46]

-if updated_raw = post.raw.gsub!(upload_and_attachment_regex, "")
+updated_raw = post.raw.gsub(upload_and_attachment_regex, "")
+if updated_raw != post.raw
   if ok =
        post.revise(
          Discourse.system_user,
          { raw: updated_raw, edit_reason: edit_reason },
          force_new_version: true,
          skip_validations: true,
        )
     post.custom_fields["uploads_removed_at"] = uploads_removed_at
     post.save_custom_fields
   end
 end
  • Apply / Chat
Suggestion importance[1-10]: 4

__

Why: The suggestion proposes a stylistic change from gsub! to gsub for clarity, which is a valid but minor improvement with no functional impact in this context.

Low
  • More

@github-actions github-actions bot added the i18n label Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants