You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Fixes race condition when multiple preprocessed variants are defined for a Previewable file is attached
This fixes race condition in Active Storage when multiple preprocessed variants are defined for a `Previewable` file is attached.
When a variant is specified for a "previewable" file type (e.g. video or PDF) attachment, a `preview_image` attachment is first created and attached on the original blob and then any user-specified variants are derived from _that_ preview image. When those variants are named and have `preprocessed: true`, the jobs to create those variants are queued simultaneously.
Example from my case:
```ruby
has_one_attached :file, dependent: :purge_later do |attachable|
attachable.variant :preview, resize_to_fill: [400, 400], preprocessed: true
attachable.variant :still, format: "jpg", saver: {quality: 85}, preprocessed: true
end
```
When a `Previewable` attachment is created (a video, in my case), `TransformJob.perform_later` is called for each named variant with `preprocessed: true`. Unless your queue adapter is synchronous (e.g. :inline or :test), this results in a race condition in which every such variant's worker will check `processed?`, see that no `preview_image` attachment exists yet on the `ActiveStorage::Blob`, and:
1. Redundantly download the file from storage
2. Create duplicative ActiveStorage::Attachment and `ActiveStorage::Blob` records for the `preview_image` attachment (all but one of which will be orphaned from the original blob's `has_one_attached :preview_image`)
3. Create variant blobs (and associated `ActiveStorage::VariantRecord`) that are similarly orphaned (by virtue of being a variant of an orphaned `preview_image` blob)
As a result, if the video is ever purged, `PurgeJob` will only find the current `has_one_attached :preview_image` and whatever variant demanded it into existence, then leave the rest as orphaned records in the database and in storage.
Pretty simple: wrap the first step of the job in `blob.with_lock {}`. By pessimistically locking on the blob, we can prevent processing the preview image multiple times by multiple `TransformJob` jobs running concurrently.
Alternate approaches would all be more work:
* Queuing a `PreviewImage` job instead of N `TransformJob` and have it, only after `preview_image` is attached, enqueue those `TransformJob` jobs
* Batching up all the named variant transformations into a single meta-job
Writing a test for this inside Rails would be difficult because it would require running the resulting TransformJob jobs concurrently. I [started a test](https://github.com/searls/rails/blob/fix-video-duplicate-preview-variants/activestorage/test/models/variant_with_record_test.rb#L348-L367) but failed to reproduce, in part because the test queue adapter will perform enqueued jobs inline instead of concurrently. In order to write a test that replicated the issue appropriately, we might first need a new option for `perform_enqueued_jobs(async: true) { … }`
If you're interested, [this gist](https://gist.github.com/searls/5b8298abe88b3206f670ea3c6d574aab) includes a driver script and output before and after the patch showing it working.
I only found this because I'm a total cheapskate and was literally counting records in my S3 bucket to ensure `PurgeJob` worked. Then I wasted the next two days trying to figure out why before landing on this. I strongly suspect that ActiveStorage users who host video and take advantage of `preprocessed: true` named variants will have a lot of orphaned stuff floating around their buckets.
To see if you have any such "zombie" preview_images (and presumably, associated variants) floating around your application that would survive calls to `purge` on the owning attachment, you could write a query like this:
```
ActiveStorage::Attachment
.joins("INNER JOIN active_storage_attachments as other_attachments ON
active_storage_attachments.record_id = other_attachments.record_id AND
active_storage_attachments.id != other_attachments.id")
.where(
:name => "preview_image",
:record_type => "ActiveStorage::Blob",
"other_attachments.name" => "preview_image",
"other_attachments.record_type" => "ActiveStorage::Blob"
)
.distinct
```
Clearing out one's production database and backend storage to get this all right-sized should be a fun exercise for the reader.
Co-authored-by: Aaron Patterson <[email protected]>
0 commit comments