Skip to content

Conversation

@tomerqodo
Copy link

@tomerqodo tomerqodo commented Dec 4, 2025

User description

Benchmark PR discourse#36020

Type: Corrupted (contains bugs)

Original PR Title: FEATURE: Add automation to remove uploads from deleted posts
Original PR Description: This PR introduces a new automation script that removes uploads attached to deleted posts.

Using this automation will add a revision to deleted posts by the System User, removing the upload or attachment references. This Automation goes a 1000 by every run and does not re-run on already ran posts

When the automation cleans the posts, uploads will no longer have reference to the post and on the next time the clean up uploads job runs it will permanently removes the uploads.

Example of automation run:

 Hey it is a regular post with a link to [Discourse](https://www.discourse.org) 
 and an image: ![logo.png|100x200](upload://gj6GJHlc1Sa5YXuGz549oXBbcFv.png)
 and a file: [small.pdf|attachment](upload://3bWzVVoRhUXxi7tiPenInoebHyX.pdf) (130 Bytes)

When the automation has run in this post, this will be the revisioned version:

 Hey it is a regular post with a link to [Discourse](https://www.discourse.org) 
 and an image:
 and a file: 

Original PR URL: discourse#36020


PR Type

Enhancement


Description

  • Adds new automation script to remove upload markup from deleted posts

  • Implements batch processing with 1000 post limit per execution

  • Integrates with clean_up_uploads job for permanent upload deletion

  • Includes comprehensive test coverage for recurring and point-in-time triggers


Diagram Walkthrough

flowchart LR
  A["Automation Script"] -->|"Processes deleted posts"| B["Remove Upload Markup"]
  B -->|"Adds revision"| C["System User Edit"]
  C -->|"Marks with timestamp"| D["Custom Field"]
  D -->|"Next run"| E["Clean Up Uploads Job"]
  E -->|"Deletes orphaned"| F["Uploads Removed"]
Loading

File Walkthrough

Relevant files
Enhancement
scripts.rb
Register new upload removal automation script                       

plugins/automation/lib/discourse_automation/scripts.rb

  • Adds new constant REMOVE_UPLOAD_MARKUP_FROM_DELETED_POSTS to scripts
    registry
  • Registers the new automation script identifier in alphabetical order
+1/-0     
remove_upload_markup_from_deleted_posts.rb
Implement upload markup removal automation script               

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb

  • Implements core automation logic to remove upload and attachment
    markup from deleted posts
  • Uses regex pattern to match both image and file attachment markdown
    syntax
  • Processes posts in batches of 1000, skipping already-processed posts
    via custom field
  • Creates revisions through System User with edit reason for audit trail
+49/-0   
Configuration changes
plugin.rb
Configure batch size and register script file                       

plugins/automation/plugin.rb

  • Adds REMOVE_UPLOAD_MARKUP_FROM_DELETED_POSTS_BATCH_SIZE constant set
    to 1000
  • Registers the new script file in the plugin's autoload list
+2/-0     
Tests
remove_upload_markup_from_deleted_posts_spec.rb
Add comprehensive test coverage for upload removal automation

plugins/automation/spec/scripts/remove_upload_markup_from_deleted_posts_spec.rb

  • Comprehensive test suite with 174 lines covering both recurring and
    point-in-time triggers
  • Tests verify upload markup removal from deleted posts while preserving
    non-deleted posts
  • Validates custom field timestamp tracking and idempotency (no
    re-processing)
  • Integration test confirms compatibility with clean_up_uploads job for
    permanent deletion
+174/-0 
Documentation
server.en.yml
Add localization strings for upload removal automation     

plugins/automation/config/locales/server.en.yml

  • Adds localization strings for the new automation script
  • Includes title, description, and edit reason messages for user
    interface
  • Includes warning about permanent deletion and caution advisory
+4/-0     

@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: The automation performs critical edits to posts (removing upload references) without
emitting explicit audit log entries containing actor, timestamp, action, and outcome
beyond relying on revisions and a custom field.

Referred Code
updated_raw = post.raw.gsub(upload_and_attachment_regex, "")
if updated_raw
  if ok =
       post.revise(
         Discourse.system_user,
         { raw: updated_raw, edit_reason: edit_reason },
         force_new_version: true,
         skip_validations: true,
       )
    post.custom_fields["uploads_removed_at"] = uploads_removed_at
    post.save_custom_fields
  end

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
No error handling: The script iterates and revises posts without rescuing or logging failures (e.g., nil raw,
failed revise/save), which could lead to silent partial processing.

Referred Code
updated_raw = post.raw.gsub(upload_and_attachment_regex, "")
if updated_raw
  if ok =
       post.revise(
         Discourse.system_user,
         { raw: updated_raw, edit_reason: edit_reason },
         force_new_version: true,
         skip_validations: true,
       )
    post.custom_fields["uploads_removed_at"] = uploads_removed_at
    post.save_custom_fields
  end

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
The script may undelete posts

The script's use of post.revise on deleted posts can cause them to be restored
if the deleting user no longer exists, as PostRevisor may reset the deleted_at
field. This is a critical side-effect that defeats the purpose of the script.

Examples:

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb [37-42]
               post.revise(
                 Discourse.system_user,
                 { raw: updated_raw, edit_reason: edit_reason },
                 force_new_version: true,
                 skip_validations: true,
               )

Solution Walkthrough:

Before:

# plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb

Post
  .with_deleted
  .where.not(deleted_at: nil)
  # ... more query conditions
  .each do |post|
    updated_raw = post.raw.gsub(upload_regex, "")
    # post.revise can undelete the post if the user who deleted it is gone.
    post.revise(
      Discourse.system_user,
      { raw: updated_raw },
      # ... options
    )
    post.custom_fields["uploads_removed_at"] = Time.now
    post.save_custom_fields
  end

After:

# plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb

Post
  .with_deleted
  .where.not(deleted_at: nil)
  # ... more query conditions
  .each do |post|
    updated_raw = post.raw.gsub(upload_regex, "")
    
    # A safer update method is needed that doesn't trigger PostRevisor's
    # undeletion logic. This might involve a direct DB update or a
    # modified revision process that preserves the deletion status.
    Post.where(id: post.id).update_all(raw: updated_raw)

    post.custom_fields["uploads_removed_at"] = Time.now
    post.save_custom_fields(true) # Force save after direct DB update
  end
Suggestion importance[1-10]: 10

__

Why: This suggestion correctly identifies a critical bug where the script could unintentionally restore deleted posts, a severe side effect that completely undermines the feature's purpose.

High
Possible issue
Ensure atomic updates within a transaction

Wrap the post revision and custom field update in a Post.transaction block to
ensure atomicity. Also, correct the conditional check to if post.raw !=
updated_raw since gsub never returns nil.

plugins/automation/lib/discourse_automation/scripts/remove_upload_markup_from_deleted_posts.rb [34-46]

 updated_raw = post.raw.gsub(upload_and_attachment_regex, "")
-if updated_raw
-  if ok =
-       post.revise(
+if post.raw != updated_raw
+  Post.transaction do
+    if post.revise(
          Discourse.system_user,
          { raw: updated_raw, edit_reason: edit_reason },
          force_new_version: true,
          skip_validations: true,
        )
-    post.custom_fields["uploads_removed_at"] = uploads_removed_at
-    post.save_custom_fields
+      post.custom_fields["uploads_removed_at"] = uploads_removed_at
+      post.save_custom_fields(true)
+    end
   end
 end
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential race condition and lack of atomicity between revising a post and saving its custom field, which could lead to reprocessing. It also fixes a minor logical flaw in the conditional check.

Medium
  • More

@github-actions github-actions bot added the i18n label Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants