
Conversation


@sylvinus sylvinus commented Jan 16, 2026

This allows using S3-compatible object storage to offload blobs, making Postgres much lighter. The design targets storing ~1B emails on a single instance.

Fixes #185.

Summary by CodeRabbit

  • New Features

    • Tiered blob storage: offload message blobs to object storage with transparent retrieval and optional encryption
    • Configurable blob encryption with key rotation support
    • Hourly scheduled offload of eligible blobs
  • Chores

    • Management command to verify and repair tiered storage integrity
    • Automatic cleanup of object storage when blobs are deleted
    • Test and environment additions to ensure storage buckets exist and validate workflows



coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Walkthrough

Implements tiered blob storage with optional encryption and offload to object storage. Adds Blob storage metadata, a TieredStorageService, Celery offload tasks, verification/rotation management command, settings and env defaults, tests, and Docker Compose bucket initialization.

Changes

  • Core model & enums (src/backend/core/enums.py, src/backend/core/models.py, src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py): Added BlobStorageLocationChoices, Blob.encryption_key_id, and Blob.storage_location; made raw_content nullable; added storage-key utilities, storage-aware get_content/save flows, and offload-aware save behavior.
  • Tiered storage service & tasks (src/backend/core/services/tiered_storage.py, src/backend/core/services/tiered_storage_tasks.py): New TieredStorageService with SHA256-derived keys, Fernet encryption, deduplication, upload/download, and existence/orphan checks. Added a Celery task to offload eligible blobs and a per-blob offload task with transactional locking and retry logic.
  • Management command (src/backend/core/management/commands/verify_tiered_storage.py): New management command to verify db↔storage consistency, detect/delete orphans, verify hashes, and perform re-encryption (with dry-run and limits).
  • Settings & configuration (src/backend/messages/settings.py, env.d/development/backend.defaults): New message-blobs storage entry, encryption settings (MESSAGES_BLOB_ENCRYPTION_KEYS, MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID), tiered offload toggles and thresholds, and object-storage credential vars in env defaults.
  • Signals & indexing toggles (src/backend/core/signals.py, src/backend/core/services/search/search.py, src/backend/core/api/viewsets/config.py): Added a post-delete signal to clean up object storage for object-stored blobs; switched some getattr-based OPENSEARCH flag checks to direct attribute access; config viewset now populates missing keys with None.
  • Compose & Celery scheduling (compose.yaml, src/backend/messages/celery_app.py): Compose now ensures the st-messages/msg-blobs bucket exists. Added the hourly Celery beat job offload-blobs-to-object-storage.
  • Tests & test infra (src/backend/core/tests/..., src/backend/core/tests/conftest.py): Extensive new tests for tiered storage, tasks, and the verify command; autouse fixture to ensure the S3 buckets (message-imports, message-blobs) exist; minor test fixes and package init additions.
  • Utilities (src/backend/core/utils.py): JSONValue.to_python treats empty/whitespace strings as None to allow defaults.
  • Misc small changes (src/backend/core/tests/commands/__init__.py, src/backend/core/tests/services/__init__.py, src/backend/core/tests/tasks/__init__.py): New test package inits and minor test variable naming adjustments.

Sequence Diagram

sequenceDiagram
    actor App as Application
    participant Blob
    participant Tiered as TieredStorageService
    participant Object as Object Storage
    participant DB as PostgreSQL

    rect rgba(76, 175, 80, 0.5)
    Note over App,DB: Blob creation (encrypt & store)
    App->>Blob: create_blob(compressed_content)
    Blob->>Tiered: encrypt(data)
    Tiered-->>Blob: encrypted_bytes, key_id
    Blob->>DB: save(encrypted_bytes, key_id, storage=POSTGRES)
    DB-->>Blob: blob_id
    end

    rect rgba(33, 150, 243, 0.5)
    Note over App,Object: Offload to object storage
    App->>Tiered: upload_blob(blob)
    Tiered->>Object: PUT /blobs/... (encrypted)
    Object-->>Tiered: success
    Tiered->>DB: update(blob.storage_location=OBJECT_STORAGE, raw_content=NULL)
    DB-->>Tiered: ✓
    end

    rect rgba(244, 67, 54, 0.5)
    Note over App,DB: Retrieval (transparent)
    App->>Blob: get_content()
    alt storage == POSTGRES
        Blob->>Tiered: decrypt(raw_content, key_id)
        Tiered-->>Blob: decrypted_bytes
    else storage == OBJECT_STORAGE
        Blob->>Object: GET /blobs/...
        Object-->>Blob: encrypted_bytes
        Blob->>Tiered: decrypt(encrypted_bytes, key_id)
        Tiered-->>Blob: decrypted_bytes
    end
    Blob->>Blob: decompress(...)
    Blob-->>App: original content
    end
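To make the key-derivation and encryption steps in the diagram concrete, here is a minimal sketch; the function names, the prefix-sharding scheme, and the signatures are assumptions for illustration (the key_id=0 "disabled" convention follows the settings notes below), not the PR's actual code.

from cryptography.fernet import Fernet


def compute_storage_key(sha256_hex: str) -> str:
    # Shard objects by hash prefix so no single key prefix grows unbounded
    # (the exact sharding scheme used by the PR is an assumption here).
    return f"blobs/{sha256_hex[:2]}/{sha256_hex[2:4]}/{sha256_hex}"


def encrypt(data: bytes, keys: dict, active_key_id: int) -> tuple:
    # key_id == 0 conventionally means "encryption disabled": store bytes as-is.
    if active_key_id == 0:
        return data, 0
    return Fernet(keys[str(active_key_id)]).encrypt(data), active_key_id


def decrypt(data: bytes, keys: dict, key_id: int) -> bytes:
    if key_id == 0:
        return data
    return Fernet(keys[str(key_id)]).decrypt(data)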

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • sdemagny

Poem

🐰 I dug a burrow for each byte,
Encrypted, tidied, tucked up tight,
Off they hop to blob-filled skies,
Keys and queues and tests—hooray!
A little rabbit clap of bytes.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly indicates the main feature, implementing tiered storage, which aligns with the primary objective of the changeset.
  • Linked Issues Check ✅ Passed: The PR implements all core coding requirements from issue #185: storing Blob instances in a separate model, supporting S3-compatible object storage, and providing an offloading mechanism.
  • Out of Scope Changes Check ✅ Passed: All changes are scoped to the tiered storage implementation. Minor ancillary changes (search setting refactoring, JSONValue whitespace handling) are necessary supporting adjustments directly tied to the core functionality.
  • Docstring Coverage ✅ Passed: Docstring coverage is 96.23%, which is sufficient. The required threshold is 80.00%.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 466-495: The current flow writes the newly encrypted object via
self.service.storage.save(storage_key, ...) before updating
blob.encryption_key_id inside transaction.atomic(), risking storage/DB
inconsistency if the DB update fails; instead, write the new encrypted bytes to
a temporary object (e.g. derive a temp key from storage_key and new_key_id)
using self.service.storage.save(temp_key, ContentFile(encrypted)), then perform
the DB update inside transaction.atomic() (update blob.encryption_key_id and
save), and only after the transaction succeeds atomically remove/rename the temp
object to the final storage_key (or copy temp→final and delete temp) so storage
and DB remain consistent; reference symbols: self.service.storage.save,
storage_key, temp_key (create), self.service.encrypt, blob.encryption_key_id,
transaction.atomic.

In `@src/backend/core/services/tiered_storage.py`:
- Around line 31-43: In __init__, the enabled gate currently checks for an
OPTIONS.endpoint_url which wrongly disables valid S3 setups; instead set
self.enabled based on presence of the "message-blobs" storage config itself
(e.g. check that settings.STORAGES contains a non-empty "message-blobs" entry).
Update the assignment to self.enabled to use
settings.STORAGES.get("message-blobs") (or "message-blobs" in settings.STORAGES
and truthy) rather than digging for OPTIONS.endpoint_url so AWS S3 configs
without endpoint_url remain enabled.
🧹 Nitpick comments (3)
src/backend/core/services/tiered_storage_tasks.py (1)

68-133: Consider adding retry for transient failures.

The task handles lock contention gracefully by returning "locked" status, but transient failures (network issues, temporary S3 unavailability) at line 131 are logged and returned as errors without retry. The periodic offload_blobs_task will eventually re-queue these blobs, but adding explicit retry behavior for transient exceptions (e.g., ConnectionError, Timeout) could improve reliability.

💡 Optional: Add retry for transient failures
-@celery_app.task(bind=True)
+@celery_app.task(bind=True, autoretry_for=(ConnectionError, TimeoutError), retry_backoff=True, max_retries=3)
 def offload_single_blob_task(self, blob_id: str) -> Dict[str, Any]:
src/backend/core/models.py (1)

1536-1557: Enforce storage_location/raw_content invariants at the DB layer.
With raw_content now nullable, inconsistent states (e.g., OBJECT_STORAGE + non-null content)
become possible and will surface as runtime errors in get_content. A check constraint makes
the invariant explicit and avoids silent drift. This will require a migration.

♻️ Proposed constraint
         constraints = [
             models.CheckConstraint(
                 check=(
                     models.Q(mailbox__isnull=False) | models.Q(maildomain__isnull=False)
                 ),
                 name="blob_has_owner",
             ),
+            models.CheckConstraint(
+                check=(
+                    models.Q(
+                        storage_location=BlobStorageLocationChoices.POSTGRES,
+                        raw_content__isnull=False,
+                    )
+                    | models.Q(
+                        storage_location=BlobStorageLocationChoices.OBJECT_STORAGE,
+                        raw_content__isnull=True,
+                    )
+                ),
+                name="blob_storage_location_matches_content",
+            ),
         ]

As per coding guidelines, enforce data integrity with model constraints.

Also applies to: 1583-1589

src/backend/core/services/tiered_storage.py (1)

244-281: Guard against orphan-delete races and capture delete errors.
There’s a TOCTOU window between the reference count (Line 259-263) and deletion (Line 274-275);
a concurrent offload could add a reference after the count and still have its object deleted.
Consider an advisory lock keyed by SHA256 or a transactional guard around the check+delete.
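A minimal sketch of that transactional guard, using a Postgres advisory lock keyed by the hash (the Blob model and BlobStorageLocationChoices come from this PR; the function itself and the lock-key derivation are illustrative assumptions, not the service's code):

from django.db import connection, transaction

from core.enums import BlobStorageLocationChoices
from core.models import Blob


def delete_if_orphaned_guarded(sha256: bytes, storage, storage_key: str) -> bool:
    # Derive a positive signed 64-bit advisory-lock key from the blob hash.
    lock_key = int.from_bytes(sha256[:8], "big") % (2**63)
    with transaction.atomic():
        with connection.cursor() as cursor:
            # Held until this transaction ends; a concurrent offload that takes
            # the same lock for this hash cannot commit a new reference meanwhile.
            cursor.execute("SELECT pg_advisory_xact_lock(%s)", [lock_key])
        if Blob.objects.filter(
            sha256=sha256,
            storage_location=BlobStorageLocationChoices.OBJECT_STORAGE,
        ).exists():
            return False
        storage.delete(storage_key)
        return True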

Also, capture the storage deletion exception to Sentry so cleanup failures are observable.

♻️ Suggested Sentry capture
 from cryptography.fernet import Fernet
+from sentry_sdk import capture_exception
@@
-        except Exception as e:  # pylint: disable=broad-except
-            logger.warning("Failed to delete blob from storage %s: %s", key, e)
+        except Exception as exc:  # pylint: disable=broad-except
+            capture_exception(exc)
+            logger.warning("Failed to delete blob from storage %s: %s", key, exc)
             return False

As per coding guidelines, capture and report exceptions to Sentry.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec4c7ef and 8858822.

📒 Files selected for processing (22)
  • compose.yaml
  • env.d/development/backend.defaults
  • src/backend/core/api/viewsets/config.py
  • src/backend/core/enums.py
  • src/backend/core/management/commands/verify_tiered_storage.py
  • src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
  • src/backend/core/models.py
  • src/backend/core/services/search/search.py
  • src/backend/core/services/tiered_storage.py
  • src/backend/core/services/tiered_storage_tasks.py
  • src/backend/core/signals.py
  • src/backend/core/tests/commands/__init__.py
  • src/backend/core/tests/commands/test_verify_tiered_storage.py
  • src/backend/core/tests/conftest.py
  • src/backend/core/tests/services/__init__.py
  • src/backend/core/tests/services/test_tiered_storage.py
  • src/backend/core/tests/tasks/__init__.py
  • src/backend/core/tests/tasks/test_task_send_message.py
  • src/backend/core/tests/tasks/test_tiered_storage_tasks.py
  • src/backend/core/utils.py
  • src/backend/messages/celery_app.py
  • src/backend/messages/settings.py
🧰 Additional context used
📓 Path-based instructions (6)
src/backend/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

src/backend/**/*.py: Follow Django/PEP 8 style with a 100-character line limit
Use descriptive, snake_case names for variables and functions
Use Django ORM for database access; avoid raw SQL unless necessary for performance
Use Django’s built-in user model and authentication framework
Prefer try-except blocks to handle exceptions in business logic and views
Log expected and unexpected actions with appropriate log levels
Capture and report exceptions to Sentry; use capture_exception() for custom errors
Do not log sensitive information (tokens, passwords, financial/health data, PII)

Files:

  • src/backend/messages/celery_app.py
  • src/backend/core/tests/tasks/__init__.py
  • src/backend/core/api/viewsets/config.py
  • src/backend/core/tests/commands/test_verify_tiered_storage.py
  • src/backend/core/services/search/search.py
  • src/backend/core/management/commands/verify_tiered_storage.py
  • src/backend/core/utils.py
  • src/backend/core/signals.py
  • src/backend/core/services/tiered_storage_tasks.py
  • src/backend/core/tests/services/test_tiered_storage.py
  • src/backend/core/tests/tasks/test_task_send_message.py
  • src/backend/messages/settings.py
  • src/backend/core/tests/conftest.py
  • src/backend/core/models.py
  • src/backend/core/tests/services/__init__.py
  • src/backend/core/enums.py
  • src/backend/core/services/tiered_storage.py
  • src/backend/core/tests/commands/__init__.py
  • src/backend/core/tests/tasks/test_tiered_storage_tasks.py
  • src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
src/backend/**/{tests.py,tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

src/backend/**/{tests.py,tests/**/*.py}: Use Django’s testing tools (pytest-django) to ensure code quality and reliability
Unit tests should focus on a single use case, keep assertions minimal, and cover all possible cases

Files:

  • src/backend/core/tests/tasks/__init__.py
  • src/backend/core/tests/commands/test_verify_tiered_storage.py
  • src/backend/core/tests/services/test_tiered_storage.py
  • src/backend/core/tests/tasks/test_task_send_message.py
  • src/backend/core/tests/conftest.py
  • src/backend/core/tests/services/__init__.py
  • src/backend/core/tests/commands/__init__.py
  • src/backend/core/tests/tasks/test_tiered_storage_tasks.py
src/backend/**/{settings.py,middleware.py}

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

Use middleware judiciously for cross-cutting concerns (authentication, logging, caching)

Files:

  • src/backend/messages/settings.py
src/backend/**/settings.py

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

src/backend/**/settings.py: Leverage Django’s caching framework (e.g., Redis/Memcached) where appropriate
Use Django’s cache framework with a backend like Redis or Memcached to reduce DB load
Optimize static file handling using Django’s staticfiles pipeline (e.g., WhiteNoise)

Files:

  • src/backend/messages/settings.py
src/backend/**/{models.py,forms.py,views.py}

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

Keep business logic in models and forms; keep views thin and focused on request handling

Files:

  • src/backend/core/models.py
src/backend/**/{models.py,migrations/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

Implement database indexing and query optimization (Model Meta indexes, constraints)

Files:

  • src/backend/core/models.py
  • src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
🧠 Learnings (1)
📚 Learning: 2025-09-02T10:12:12.835Z
Learnt from: CR
Repo: suitenumerique/messages PR: 0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/{tests.py,tests/**/*.py} : Use Django’s testing tools (pytest-django) to ensure code quality and reliability

Applied to files:

  • src/backend/core/tests/conftest.py
  • src/backend/core/tests/commands/__init__.py
🧬 Code graph analysis (3)
src/backend/messages/settings.py (1)
src/backend/core/utils.py (1)
  • JSONValue (8-22)
src/backend/core/tests/conftest.py (1)
src/backend/core/services/tiered_storage.py (1)
  • storage (45-49)
src/backend/core/models.py (2)
src/backend/core/enums.py (2)
  • BlobStorageLocationChoices (61-65)
  • CompressionTypeChoices (54-58)
src/backend/core/services/tiered_storage.py (6)
  • TieredStorageService (28-296)
  • encrypt (68-91)
  • compute_storage_key (52-66)
  • decrypt (93-117)
  • download_blob (208-242)
  • delete_if_orphaned (244-280)
🪛 Ruff (0.14.11)
src/backend/core/tests/commands/test_verify_tiered_storage.py

31-31: import should be at the top-level of a file

(PLC0415)


180-180: import should be at the top-level of a file

(PLC0415)


208-208: import should be at the top-level of a file

(PLC0415)


275-275: import should be at the top-level of a file

(PLC0415)


357-357: import should be at the top-level of a file

(PLC0415)


397-397: import should be at the top-level of a file

(PLC0415)


415-415: import should be at the top-level of a file

(PLC0415)


442-442: import should be at the top-level of a file

(PLC0415)


470-470: import should be at the top-level of a file

(PLC0415)


506-506: import should be at the top-level of a file

(PLC0415)


552-552: import should be at the top-level of a file

(PLC0415)


596-596: import should be at the top-level of a file

(PLC0415)


630-630: import should be at the top-level of a file

(PLC0415)

src/backend/core/management/commands/verify_tiered_storage.py

280-280: import should be at the top-level of a file

(PLC0415)


282-282: import should be at the top-level of a file

(PLC0415)

src/backend/core/tests/services/test_tiered_storage.py

114-114: import should be at the top-level of a file

(PLC0415)


129-129: import should be at the top-level of a file

(PLC0415)


271-271: import should be at the top-level of a file

(PLC0415)


296-296: import should be at the top-level of a file

(PLC0415)


362-362: import should be at the top-level of a file

(PLC0415)


396-396: import should be at the top-level of a file

(PLC0415)


571-571: import should be at the top-level of a file

(PLC0415)


614-614: import should be at the top-level of a file

(PLC0415)


654-654: import should be at the top-level of a file

(PLC0415)

src/backend/core/tests/conftest.py

22-22: import should be at the top-level of a file

(PLC0415)

src/backend/core/models.py

1474-1474: import should be at the top-level of a file

(PLC0415)


1625-1625: import should be at the top-level of a file

(PLC0415)


1656-1656: import should be at the top-level of a file

(PLC0415)

src/backend/core/services/tiered_storage.py

132-132: import should be at the top-level of a file

(PLC0415)


149-149: import should be at the top-level of a file

(PLC0415)


257-257: import should be at the top-level of a file

(PLC0415)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: test-e2e
  • GitHub Check: lint-back
  • GitHub Check: build-front
  • GitHub Check: test-front
  • GitHub Check: check-api-state
  • GitHub Check: lint-front
  • GitHub Check: test-back
🔇 Additional comments (52)
src/backend/core/tests/tasks/__init__.py (1)

1-1: LGTM!

Clean package initializer for the tasks test module.

src/backend/core/tests/commands/__init__.py (1)

1-1: LGTM!

Appropriate package initializer for management command tests.

src/backend/core/tests/services/__init__.py (1)

1-1: LGTM!

Appropriate package initializer for service tests.

src/backend/core/tests/tasks/test_task_send_message.py (3)

3-3: LGTM!

Appropriate pylint disables for test file patterns: no-value-for-parameter for Celery task invocations without bound arguments, and unused-argument for pytest fixtures that establish state but aren't directly referenced.


45-45: Good hygiene.

Using _mailbox prefix correctly signals the variable is intentionally unused while still unpacking the fixture tuple.


99-99: Consistent with the pattern above.

src/backend/core/api/viewsets/config.py (1)

139-140: LGTM!

Using getattr(settings, setting, None) ensures consistent API response schema where all keys are always present, aligning with the OpenAPI specification that marks these fields as required. This is cleaner than conditional inclusion and provides predictable behavior for frontend consumers.

src/backend/core/utils.py (1)

14-22: LGTM!

Returning None for empty/whitespace strings allows django-configurations to fall back to default values, which is appropriate for optional JSON configuration like encryption keys.
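For illustration, such a to_python override could look roughly like this (a sketch against django-configurations' Value API, not the file's exact code):

import json

from configurations.values import Value


class JSONValue(Value):
    """Parse a JSON environment value, treating blank input as unset."""

    def to_python(self, value):
        if value is None or not str(value).strip():
            # Returning None lets django-configurations fall back to the default.
            return None
        return json.loads(value)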

env.d/development/backend.defaults (1)

54-66: LGTM!

The development defaults are well-documented with clear comments. Tiered storage is configured with a 3-day offload threshold and credentials follow the existing pattern for msg-imports.

compose.yaml (1)

84-84: LGTM!

The msg-blobs bucket creation follows the existing pattern. It correctly omits the ILM expiration rules since blobs, unlike temporary imports, are intended for long-term storage.

src/backend/messages/celery_app.py (1)

48-52: LGTM! The new beat schedule entry follows the existing pattern and an hourly interval is appropriate for blob offloading. The core.services.tiered_storage_tasks.offload_blobs_task is properly defined with the @celery_app.task(bind=True) decorator and has comprehensive test coverage.
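For reference, a beat entry of this shape typically looks like the snippet below; the entry name and task path come from the walkthrough, while the crontab expression is an assumption (celery_app being the Celery instance defined in this module):

from celery.schedules import crontab

celery_app.conf.beat_schedule = {
    # ... existing entries ...
    "offload-blobs-to-object-storage": {
        "task": "core.services.tiered_storage_tasks.offload_blobs_task",
        "schedule": crontab(minute=0),  # hourly, on the hour (assumed)
    },
}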

src/backend/core/services/search/search.py (1)

40-42: No action required; direct access to settings.OPENSEARCH_INDEX_THREADS is safe.

The setting is properly defined in src/backend/messages/settings.py as a BooleanValue with a default of True. It will always exist at runtime, and direct access does not risk AttributeError. This pattern is already used consistently throughout the codebase in multiple other files (src/backend/core/signals.py, src/backend/core/services/search/tasks.py).

Likely an incorrect or invalid review comment.

src/backend/core/enums.py (1)

61-66: LGTM!

The new BlobStorageLocationChoices enum follows the existing patterns in this file with appropriate integer values and descriptive labels. Good placement in the logical order of the file.

src/backend/core/tests/conftest.py (1)

14-45: LGTM!

The session-scoped fixture properly defers Django imports and handles missing storage configuration gracefully. The broad exception handling is appropriate here to prevent test setup failures when object storage isn't configured. The pylint: disable comment at line 3 correctly covers the in-function import pattern.

src/backend/core/services/tiered_storage_tasks.py (1)

26-65: LGTM!

The task efficiently streams eligible blob IDs using values_list with iterator() to avoid memory pressure. The filtering criteria (age threshold + minimum size) appropriately limit the scope of each run.
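In outline, that streaming pattern amounts to something like the following (setting, enum, and task names come from this PR; the created_at field name and exact filter are assumptions):

from datetime import timedelta

from django.conf import settings
from django.utils import timezone

from core.enums import BlobStorageLocationChoices
from core.models import Blob
from core.services.tiered_storage_tasks import offload_single_blob_task

cutoff = timezone.now() - timedelta(days=settings.TIERED_STORAGE_OFFLOAD_AFTER_DAYS)
eligible_ids = (
    Blob.objects.filter(
        storage_location=BlobStorageLocationChoices.POSTGRES,
        created_at__lt=cutoff,  # assumed timestamp field name
        size_compressed__gte=settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE,
    )
    .values_list("id", flat=True)
    .iterator(chunk_size=1000)
)
for blob_id in eligible_ids:
    offload_single_blob_task.delay(str(blob_id))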

src/backend/core/tests/commands/test_verify_tiered_storage.py (3)

1-7: LGTM!

Comprehensive test coverage for the verify_tiered_storage management command. The tests appropriately cover disabled states, E2E verification modes, hash verification with corruption detection, and re-encryption workflows. The in-function imports are intentionally used for test isolation and the pylint: disable comment at line 7 correctly covers this pattern.


78-80: Good cleanup pattern.

Consistent use of try/finally with existence checks ensures test isolation and prevents storage pollution across test runs.


468-534: Thorough E2E test for object storage re-encryption.

This test covers the complete workflow: encrypting with old key, uploading to storage, rotating keys, re-encrypting, and verifying content integrity via download and decompression.

src/backend/core/signals.py (1)

51-52: The concern is unfounded. OPENSEARCH_INDEX_THREADS is always defined in settings with a default value (True in src/backend/messages/settings.py:83-84 and False as a fallback at line 1090). Direct attribute access to settings.OPENSEARCH_INDEX_THREADS is safe and will not raise AttributeError. The change from getattr() to direct access is a valid simplification that removes unnecessary defensive programming.

Likely an incorrect or invalid review comment.

src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py (1)

12-27: Migration structure looks correct.

The migration properly:

  • Adds storage_location with db_index=True for efficient filtering
  • Makes raw_content nullable to support object storage blobs
  • Uses hardcoded choices in migration (correct Django practice)

One consideration: if key rotation queries will frequently filter by encryption_key_id (e.g., finding all blobs encrypted with a specific key), adding an index on that field could be beneficial. However, this can be deferred based on actual query patterns.
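Should that index become necessary, a follow-up migration could look roughly like this (hypothetical, not part of this PR):

from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("core", "0014_blob_encryption_key_id_blob_storage_location_and_more"),
    ]

    operations = [
        migrations.AddIndex(
            model_name="blob",
            index=models.Index(
                fields=["encryption_key_id"], name="blob_encryption_key_idx"
            ),
        ),
    ]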

src/backend/core/tests/services/test_tiered_storage.py (5)

24-181: Comprehensive unit test coverage for encryption/decryption.

The unit tests thoroughly cover:

  • Storage key computation with different SHA256 prefixes
  • Encryption passthrough when disabled (key_id=0)
  • Proper error handling for invalid/missing keys and corrupted data
  • Key rotation scenarios maintaining backward compatibility

Good separation of concerns with no DB or storage dependencies.


183-263: Good database-level test coverage.

Tests properly validate:

  • Default storage location behavior
  • Content retrieval from PostgreSQL
  • Error handling when content is missing
  • SHA256-based storage key derivation
  • Deduplication detection via check_already_uploaded

389-420: Critical regression test for double-encryption bug.

This test (test_offload_with_encryption_roundtrip) is valuable as it explicitly guards against the double-encryption bug mentioned in the docstring. The test verifies that encrypted blobs can be offloaded and read back correctly, which is a common failure point.


494-562: Important deduplication behavior documented in test.

The test correctly validates that when two blobs with identical content are encrypted with different keys, deduplication uses the first blob's encryption key_id. The inline comment at line 554 clarifies this is expected behavior until key rotation is complete.


565-667: Key rotation tests cover both storage locations.

The tests properly validate the re-encryption workflow for:

  • PostgreSQL-stored blobs (decrypt with old key, encrypt with new key, update in place)
  • Object storage blobs (download, decrypt, re-encrypt, upload, update metadata)

Both tests verify content integrity after rotation, which is critical.

src/backend/messages/settings.py (2)

234-259: New storage configuration follows existing patterns.

The message-blobs storage configuration mirrors the existing message-imports pattern. The default bucket name msg-blobs is provided.

One difference: endpoint_url has no default value, whereas some configurations might expect a default for local development. Verify this is intentional and that the development environment properly sets STORAGE_MESSAGE_BLOBS_ENDPOINT_URL.


376-398: Well-documented encryption and offload configuration.

The configuration properly:

  • Documents the key format and key_id=0 convention
  • Uses appropriate types (JSONValue for dict, PositiveIntegerValue for IDs)
  • Defaults to encryption disabled (key_id=0), requiring explicit opt-in
  • Allows fine-grained control over offload timing and size thresholds
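Since each configured key must be a valid Fernet key, a value for MESSAGES_BLOB_ENCRYPTION_KEYS can be generated with standard cryptography usage (generic example, not code from this PR):

from cryptography.fernet import Fernet

# Prints a 32-byte url-safe base64 key, e.g. the value stored under key id "1".
print(Fernet.generate_key().decode())
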
src/backend/core/tests/tasks/test_tiered_storage_tasks.py (5)

32-48: Disabled state test properly mocks settings.

The test correctly mocks the storage configuration to verify the task gracefully handles the disabled state.


55-120: Good coverage of blob eligibility criteria.

The tests properly validate:

  • Age-based filtering using TIERED_STORAGE_OFFLOAD_AFTER_DAYS
  • Size-based filtering using TIERED_STORAGE_OFFLOAD_MIN_SIZE
  • The use of Blob.objects.filter().update() correctly bypasses any auto-update timestamps

The conditional assertion at line 119 (if settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE > 0) adapts to the test environment settings, which is acceptable.


156-171: Consistent disabled state handling test.

Follows the same mocking pattern as the batch task test.


242-264: Good error handling test with rollback verification.

The test test_handles_upload_error properly verifies that:

  1. Upload errors are caught and reported
  2. The blob state is preserved (transaction rolled back)
  3. Storage location remains POSTGRES and raw_content is intact

This ensures data integrity during upload failures.


335-352: Idempotency test validates concurrent safety.

The test_concurrent_offload_idempotent test verifies that repeated offload attempts for the same blob are handled gracefully, returning already_offloaded on subsequent calls. This is important for Celery task retry scenarios.

src/backend/core/management/commands/verify_tiered_storage.py (6)

24-88: Well-designed CLI interface with clear modes.

The command provides:

  • Multiple verification modes (db-to-storage, storage-to-db, full)
  • Safety features (--dry-run, --limit)
  • Key rotation capability (--re-encrypt)
  • Proper early exit when storage is not configured

89-134: DB-to-storage verification handles scale well.

The method:

  • Uses iterator(chunk_size=1000) to avoid memory issues with large datasets
  • Reports missing blobs to stderr (appropriate severity for potential data loss)
  • Respects the --limit option for sampling

136-218: Storage-to-DB verification with orphan cleanup.

The method properly:

  • Validates storage path format before processing
  • Detects orphans (objects without DB references)
  • Optionally deletes orphans with --fix
  • Optionally verifies hashes (wisely marked as "slow")

Progress is reported every 100 objects (line 201), which provides good feedback during long operations.


220-254: Storage listing handles multiple backends.

The method:

  • Prefers direct boto3 access with pagination for efficiency
  • Falls back to Django's listdir for compatibility
  • Raises clear error when listing is not supported

256-305: Hash verification covers the complete pipeline.

The method correctly:

  • Downloads encrypted content from storage
  • Decrypts using the blob's encryption_key_id
  • Decompresses based on compression type
  • Computes and compares SHA256 hash

The broad exception handling is acceptable here since the goal is to report issues, not crash.
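In outline, that verification pipeline amounts to the following (service and field names are taken from this PR, but the exact signatures and the compression field/member names are assumptions):

import hashlib

import pyzstd

from core.enums import CompressionTypeChoices
from core.services.tiered_storage import TieredStorageService


def blob_hash_matches(blob) -> bool:
    service = TieredStorageService()
    encrypted = service.download_blob(blob)  # assumed to return the stored bytes
    content = service.decrypt(encrypted, blob.encryption_key_id)
    if blob.compression == CompressionTypeChoices.ZSTD:  # assumed names
        content = pyzstd.decompress(content)
    return hashlib.sha256(content).digest() == bytes(blob.sha256)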


307-402: Re-encryption workflow with proper validation.

The method:

  • Validates encryption configuration before starting
  • Uses smaller chunk size (100) appropriate for the heavier re-encryption operation
  • Supports --dry-run for safe preview
  • Provides clear progress reporting and final summary
src/backend/core/models.py (6)

31-33: No review needed for this import change.


1472-1487: Encryption is integrated cleanly into blob creation.
Storing encrypted bytes and the key id alongside compression keeps retrieval consistent.


1595-1599: Conditional size_compressed update makes sense.
This avoids clobbering the stored size after offload clears raw_content.


1601-1609: No issues spotted.


1624-1643: Content retrieval flow reads cleanly.
The PostgreSQL vs object-storage branches and decryption/decompression steps are clear.


1645-1660: Deletion flow aligns with object-storage cleanup.

src/backend/core/services/tiered_storage.py (8)

1-24: No review needed for the module header/imports.


44-49: Lazy storage initialization looks good.


51-66: Storage key sharding is straightforward.


68-117: Encryption/decryption flow is consistent with key_id semantics.
Please double-check that configured key formats align with Fernet requirements.


119-155: DB-backed dedup lookups are clear.


208-243: Download path is clear and well-scoped.


282-296: Existence check is straightforward.


157-207: Return value is correctly handled in production code. The call site in src/backend/core/services/tiered_storage_tasks.py (line 113) properly captures the returned key_id and updates blob.encryption_key_id accordingly. Dedup-specific tests also confirm this pattern works correctly when the same content is uploaded with different encryption keys.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 31 to 43
    def __init__(self):
        """Initialize the service, checking if object storage is configured."""
        # Check if message-blobs storage has endpoint_url configured
        self._storage = None
        self.enabled = bool(
            settings.STORAGES.get("message-blobs", {})
            .get("OPTIONS", {})
            .get("endpoint_url")
        )
        # encryption_keys is a dict: {"1": "key1", "2": "key2"}
        self.encryption_keys = settings.MESSAGES_BLOB_ENCRYPTION_KEYS or {}
        self.active_key_id = settings.MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "message-blobs" -C 5 src/backend/

Repository: suitenumerique/messages

Length of output: 9576


🏁 Script executed:

find src/backend -name "settings.py" -o -name "settings*.py" | head -20

Repository: suitenumerique/messages

Length of output: 99


🏁 Script executed:

rg -n "STORAGES\s*=" -C 3 src/backend/

Repository: suitenumerique/messages

Length of output: 4058


🏁 Script executed:

rg -n "endpoint_url" -C 3 src/backend/

Repository: suitenumerique/messages

Length of output: 5260


🏁 Script executed:

cat -n src/backend/core/services/tiered_storage.py | head -80

Repository: suitenumerique/messages

Length of output: 3066


🏁 Script executed:

cat -n src/backend/core/management/commands/verify_tiered_storage.py | head -85

Repository: suitenumerique/messages

Length of output: 3644


Fix enabled gate to check storage config presence, not endpoint_url.

Lines 35-39 gate on endpoint_url presence, which silently disables valid AWS S3 configurations (AWS S3 doesn't use endpoint_url; that field is optional for custom endpoints like MinIO). The condition should check for the presence of the message-blobs storage config itself.

Suggested fix
-        self.enabled = bool(
-            settings.STORAGES.get("message-blobs", {})
-            .get("OPTIONS", {})
-            .get("endpoint_url")
-        )
+        storage_cfg = settings.STORAGES.get("message-blobs")
+        self.enabled = bool(storage_cfg)
🤖 Prompt for AI Agents
In `@src/backend/core/services/tiered_storage.py` around lines 31 - 43, In
__init__, the enabled gate currently checks for an OPTIONS.endpoint_url which
wrongly disables valid S3 setups; instead set self.enabled based on presence of
the "message-blobs" storage config itself (e.g. check that settings.STORAGES
contains a non-empty "message-blobs" entry). Update the assignment to
self.enabled to use settings.STORAGES.get("message-blobs") (or "message-blobs"
in settings.STORAGES and truthy) rather than digging for OPTIONS.endpoint_url so
AWS S3 configs without endpoint_url remain enabled.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 449-455: When re-encrypting Postgres blobs in
verify_tiered_storage.py (using self.service.decrypt and self.service.encrypt)
ensure you persist the recomputed size_compressed: after setting
blob.raw_content = encrypted and blob.encryption_key_id = new_key_id, call
blob.save(update_fields=["raw_content", "encryption_key_id", "size_compressed"])
so the recalculated size_compressed is written to the DB (match the approach
used in 0007_blob_size_compressed.py).
- Around line 280-282: Move the local imports of pyzstd and
CompressionTypeChoices out of the function and place them at the top of the
module alongside the existing BlobStorageLocationChoices import so they are
module-level imports; update the import section to include "import pyzstd" and
"from core.enums import CompressionTypeChoices" and remove the in-function
imports where pyzstd and CompressionTypeChoices are currently referenced in
verify_tiered_storage logic.

In `@src/backend/core/signals.py`:
- Around line 109-119: The post_delete signal handler cleanup_blob_storage
currently calls TieredStorageService().delete_if_orphaned() immediately, which
can run inside a transaction that may roll back; wrap the deletion call in
transaction.on_commit to defer it until after successful commit: import
django.db.transaction if needed, capture the instance.sha256 (or
bytes(instance.sha256)) and any required storage_location check inside
cleanup_blob_storage, create a small closure or lambda that calls
TieredStorageService().delete_if_orphaned(...) and register it with
transaction.on_commit, and ensure the existing guard checks
(BlobStorageLocationChoices.OBJECT_STORAGE and service.enabled) are preserved
before scheduling the on_commit callback.

In `@src/backend/core/tests/services/test_tiered_storage.py`:
- Line 114: Several intentional local test imports (e.g. "from
cryptography.fernet import InvalidToken" and the other test-only imports flagged
at the comment's listed locations) are meant to remain inside test functions;
add a trailing "# noqa: PLC0415" to each of those local import lines instead of
moving them to module level so Ruff/Pylint stops reporting
import-outside-toplevel while keeping the imports local to their tests. Ensure
you update each flagged import line (the ones referenced in the review) by
appending the exact comment "# noqa: PLC0415".
🧹 Nitpick comments (2)
src/backend/core/tests/services/test_tiered_storage.py (1)

265-563: Consider skipping E2E tests when object storage isn’t configured.

The docstring says “when available,” but these tests hard‑fail if storage isn’t configured. A skipif makes local/dev runs more resilient.

🔧 Suggested refactor
+# At module level
+_STORAGE_ENABLED = TieredStorageService().enabled
@@
-@pytest.mark.django_db
-class TestTieredStorageE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestTieredStorageE2E:
src/backend/core/tests/tasks/test_tiered_storage_tasks.py (1)

60-194: Consider skipping E2E task tests when object storage isn’t configured.

These E2E tests will fail in environments without MinIO. A skipif keeps dev/test runs resilient while still exercising coverage in CI.

🔧 Suggested refactor
+# At module level
+_STORAGE_ENABLED = TieredStorageService().enabled
@@
-@pytest.mark.django_db
-class TestOffloadBlobsTaskE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestOffloadBlobsTaskE2E:
@@
-@pytest.mark.django_db
-class TestOffloadSingleBlobTaskE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestOffloadSingleBlobTaskE2E:

Also applies to: 221-399

Comment on lines +280 to +282
        import pyzstd

        from core.enums import CompressionTypeChoices

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/backend/core/management/commands/verify_tiered_storage.py | sed -n '270,295p'

Repository: suitenumerique/messages

Length of output: 1214


🏁 Script executed:

head -n 50 src/backend/core/management/commands/verify_tiered_storage.py | cat -n

Repository: suitenumerique/messages

Length of output: 2125


🏁 Script executed:

rg "pyzstd|CompressionTypeChoices" src/backend/core/management/commands/verify_tiered_storage.py -n

Repository: suitenumerique/messages

Length of output: 286


🏁 Script executed:

cat -n src/backend/core/management/commands/verify_tiered_storage.py | sed -n '250,310p'

Repository: suitenumerique/messages

Length of output: 2598


🏁 Script executed:

rg "pyzstd" src/backend/core/management/commands/verify_tiered_storage.py -B 3 -A 3

Repository: suitenumerique/messages

Length of output: 449


🏁 Script executed:

rg "from core.enums import" src/backend/core/management/commands/verify_tiered_storage.py

Repository: suitenumerique/messages

Length of output: 174


🏁 Script executed:

fd "requirements" src/backend/ | head -20

Repository: suitenumerique/messages

Length of output: 49


🏁 Script executed:

find . -maxdepth 3 -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" -o -name "Pipfile" \) 2>/dev/null | head -20

Repository: suitenumerique/messages

Length of output: 229


🏁 Script executed:

cat src/backend/pyproject.toml | grep -A 100 "dependencies\|pyzstd"

Repository: suitenumerique/messages

Length of output: 3330


🏁 Script executed:

rg "CompressionTypeChoices" src/backend/core/management/commands/verify_tiered_storage.py -n

Repository: suitenumerique/messages

Length of output: 196


🏁 Script executed:

cat -n src/backend/core/management/commands/verify_tiered_storage.py | sed -n '1,30p'

Repository: suitenumerique/messages

Length of output: 1306


Move pyzstd and CompressionTypeChoices imports to module level.

Both imports are required dependencies and should follow PEP 8 conventions at the module level. CompressionTypeChoices already shares the same source as BlobStorageLocationChoices (which is imported at module level), and pyzstd is a direct project dependency. Keeping them local to the method is unnecessary and inconsistent with the codebase's import pattern.

🧰 Tools
🪛 Ruff (0.14.14)

280-280: import should be at the top-level of a file

(PLC0415)


282-282: import should be at the top-level of a file

(PLC0415)

🤖 Prompt for AI Agents
In `@src/backend/core/management/commands/verify_tiered_storage.py` around lines
280 - 282, Move the local imports of pyzstd and CompressionTypeChoices out of
the function and place them at the top of the module alongside the existing
BlobStorageLocationChoices import so they are module-level imports; update the
import section to include "import pyzstd" and "from core.enums import
CompressionTypeChoices" and remove the in-function imports where pyzstd and
CompressionTypeChoices are currently referenced in verify_tiered_storage logic.

Comment on lines +449 to +455
                decrypted = self.service.decrypt(bytes(blob.raw_content), old_key_id)
                encrypted, new_key_id = self.service.encrypt(decrypted)

                blob.raw_content = encrypted
                blob.encryption_key_id = new_key_id
                blob.save(update_fields=["raw_content", "encryption_key_id"])


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "*.py" -type f | xargs grep -l "class Blob" | head -20

Repository: suitenumerique/messages

Length of output: 261


🏁 Script executed:

cat -n src/backend/core/models.py | grep -A 50 "class Blob"

Repository: suitenumerique/messages

Length of output: 4181


🏁 Script executed:

grep -n "def save" src/backend/core/models.py | grep -A 20 "class Blob" | head -50

Repository: suitenumerique/messages

Length of output: 49


🏁 Script executed:

sed -n '1501,1700p' src/backend/core/models.py | grep -n "def save"

Repository: suitenumerique/messages

Length of output: 106


🏁 Script executed:

sed -n '1590,1650p' src/backend/core/models.py

Repository: suitenumerique/messages

Length of output: 2271


🏁 Script executed:

sed -n '440,465p' src/backend/core/management/commands/verify_tiered_storage.py

Repository: suitenumerique/messages

Length of output: 1022


🏁 Script executed:

rg "post_save.*Blob|receiver.*Blob" src/backend/core/ -A 5

Repository: suitenumerique/messages

Length of output: 528


🏁 Script executed:

rg "blob\.save\(update_fields" src/backend/ -B 3 -A 1

Repository: suitenumerique/messages

Length of output: 2036


🏁 Script executed:

rg "def encrypt|def decrypt" src/backend/core/services/ -A 10 | head -80

Repository: suitenumerique/messages

Length of output: 1695


Include size_compressed in the update set when re-encrypting Postgres blobs.

When blob.raw_content is changed to newly encrypted content, Blob.save() recalculates size_compressed from the new content length. However, using update_fields without size_compressed prevents this recalculated value from being persisted to the database, leaving the field stale. This is demonstrated in migration 0007_blob_size_compressed.py, which explicitly includes size_compressed in update_fields when modifying raw content.

🔧 Suggested fix
-                blob.save(update_fields=["raw_content", "encryption_key_id"])
+                blob.save(
+                    update_fields=["raw_content", "encryption_key_id", "size_compressed"]
+                )
🤖 Prompt for AI Agents
In `@src/backend/core/management/commands/verify_tiered_storage.py` around lines
449 - 455, When re-encrypting Postgres blobs in verify_tiered_storage.py (using
self.service.decrypt and self.service.encrypt) ensure you persist the recomputed
size_compressed: after setting blob.raw_content = encrypted and
blob.encryption_key_id = new_key_id, call
blob.save(update_fields=["raw_content", "encryption_key_id", "size_compressed"])
so the recalculated size_compressed is written to the DB (match the approach
used in 0007_blob_size_compressed.py).

Comment on lines +109 to +119
@receiver(post_delete, sender=models.Blob)
def cleanup_blob_storage(sender, instance, **kwargs):
    """Clean up object storage when a blob is deleted.

    Uses a post_delete signal instead of a model delete() override to ensure
    cleanup also runs during CASCADE and QuerySet bulk deletes.
    """
    if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
        service = TieredStorageService()
        if service.enabled:
            service.delete_if_orphaned(bytes(instance.sha256))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and find the signals.py file
find src/backend -name "signals.py" -type f

Repository: suitenumerique/messages

Length of output: 94


🏁 Script executed:

# Read the signals.py file to see the full context and current imports
cat -n src/backend/core/signals.py | head -130

Repository: suitenumerique/messages

Length of output: 5232


🏁 Script executed:

# Check if transaction is already imported in the file
rg "from django.db import" src/backend/core/signals.py

Repository: suitenumerique/messages

Length of output: 49


🏁 Script executed:

# Check if transaction.on_commit is used elsewhere in the codebase
rg "transaction\.on_commit" src/backend --type py

Repository: suitenumerique/messages

Length of output: 49


🏁 Script executed:

# Check if there are other post_delete signal handlers using transaction patterns
rg "post_delete" src/backend --type py -A 10

Repository: suitenumerique/messages

Length of output: 6793


🌐 Web query:

Django post_delete signal transaction context behavior

💡 Result:

  • When post_delete fires: Django sends post_delete “at the end” of a model instance’s delete() and a queryset’s delete()—i.e., after the SQL DELETE for that object has been executed (within Django’s deletion flow). The instance is no longer in the DB at that point. [1]
  • Transaction context: post_delete is sent inside the same DB transaction used for the deletion (including the implicit atomic() Django uses around collected deletes). This is intentional so any DB writes done by signal receivers participate in the same transaction and will be committed/rolled back together. If the surrounding transaction later rolls back, the signal has already run, but all DB changes (the delete and receiver writes) roll back too. [2]
  • If you need “only after commit” behavior: For non-DB side effects (send email, enqueue Celery task, invalidate cache, call external API), don’t do them directly in post_delete. Register them with transaction.on_commit(...) so they run only if the transaction actually commits; otherwise they’re discarded. [2][3]
  • Bulk / cascades: QuerySet.delete() does not call each model’s delete() method, but it does emit pre_delete/post_delete for each deleted object, including cascaded deletions. [1][4]

Sources: [1] Django signals docs (post_delete) https://docs.djangoproject.com/en/2.0/ref/signals/ ; [2] Django ticket discussion explaining post_delete is inside the delete transaction and recommending on_commit https://code.djangoproject.com/ticket/30835 ; [3] Django transactions docs (transaction.on_commit) https://docs.djangoproject.com/en/3.2/topics/db/transactions/ ; [4] StackOverflow quoting Django docs on bulk delete and signals https://stackoverflow.com/questions/28896237/override-djangos-model-delete-method-for-bulk-deletion


Defer object-storage deletion until the DB commit.

post_delete runs inside the surrounding transaction. If it rolls back, the S3 object deletion has already completed, leaving an orphaned object while the DB row is restored. Use transaction.on_commit() to ensure cleanup only happens after a successful commit.

🔧 Suggested fix
-from django.db.models.signals import post_delete, post_save, pre_delete
+from django.db import transaction
+from django.db.models.signals import post_delete, post_save, pre_delete
@@
 def cleanup_blob_storage(sender, instance, **kwargs):
@@
-    if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
-        service = TieredStorageService()
-        if service.enabled:
-            service.delete_if_orphaned(bytes(instance.sha256))
+    if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
+        service = TieredStorageService()
+        if service.enabled:
+            sha256 = bytes(instance.sha256)
+            transaction.on_commit(
+                lambda: service.delete_if_orphaned(sha256)
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (resulting code):

from django.db import transaction
from django.db.models.signals import post_delete, post_save, pre_delete


@receiver(post_delete, sender=models.Blob)
def cleanup_blob_storage(sender, instance, **kwargs):
    """Clean up object storage when a blob is deleted.

    Uses a post_delete signal instead of a model delete() override to ensure
    cleanup also runs during CASCADE and QuerySet bulk deletes.
    """
    if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
        service = TieredStorageService()
        if service.enabled:
            sha256 = bytes(instance.sha256)
            transaction.on_commit(
                lambda: service.delete_if_orphaned(sha256)
            )
🤖 Prompt for AI Agents
In `@src/backend/core/signals.py` around lines 109 - 119, The post_delete signal
handler cleanup_blob_storage currently calls
TieredStorageService().delete_if_orphaned() immediately, which can run inside a
transaction that may roll back; wrap the deletion call in transaction.on_commit
to defer it until after successful commit: import django.db.transaction if
needed, capture the instance.sha256 (or bytes(instance.sha256)) and any required
storage_location check inside cleanup_blob_storage, create a small closure or
lambda that calls TieredStorageService().delete_if_orphaned(...) and register it
with transaction.on_commit, and ensure the existing guard checks
(BlobStorageLocationChoices.OBJECT_STORAGE and service.enabled) are preserved
before scheduling the on_commit callback.


    def test_decrypt_with_corrupted_data(self):
        """Test that decrypt fails gracefully with corrupted Fernet data."""
        from cryptography.fernet import InvalidToken

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and examine the flagged lines
cd /tmp && find . -name "test_tiered_storage.py" 2>/dev/null | head -5

Repository: suitenumerique/messages

Length of output: 49


🏁 Script executed:

# Get the file stats and examine specific lines around the flagged imports
wc -l src/backend/core/tests/services/test_tiered_storage.py
head -130 src/backend/core/tests/services/test_tiered_storage.py | tail -30

Repository: suitenumerique/messages

Length of output: 1290


🏁 Script executed:

# Examine all flagged lines with context
sed -n '110,120p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '125,135p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '267,277p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '292,302p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '358,368p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '392,402p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '592,602p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '626,636p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '669,679p' src/backend/core/tests/services/test_tiered_storage.py
sed -n '709,719p' src/backend/core/tests/services/test_tiered_storage.py

Repository: suitenumerique/messages

Length of output: 4381


🏁 Script executed:

# Check the full top-level imports at the beginning of the file
head -50 src/backend/core/tests/services/test_tiered_storage.py

Repository: suitenumerique/messages

Length of output: 1831


Add # noqa: PLC0415 markers to intentional local imports instead of moving them to module level.

The file already disables import-outside-toplevel via pylint, indicating these are intentional. In test files, local imports are a standard practice to keep test methods focused on their specific dependencies and avoid polluting the module namespace. Rather than moving these to the top-level, add # noqa: PLC0415 to each flagged import (lines 114, 129, 271, 296, 362, 396, 596, 630, 673, 713) to suppress the Ruff warning while preserving the intended structure.
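For example, the first flagged line would become:

        from cryptography.fernet import InvalidToken  # noqa: PLC0415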

🧰 Tools
🪛 Ruff (0.14.14)

114-114: import should be at the top-level of a file

(PLC0415)

🤖 Prompt for AI Agents
In `@src/backend/core/tests/services/test_tiered_storage.py` at line 114, Several
intentional local test imports (e.g. "from cryptography.fernet import
InvalidToken" and the other test-only imports flagged at the comment's listed
locations) are meant to remain inside test functions; add a trailing "# noqa:
PLC0415" to each of those local import lines instead of moving them to module
level so Ruff/Pylint stops reporting import-outside-toplevel while keeping the
imports local to their tests. Ensure you update each flagged import line (the
ones referenced in the review) by appending the exact comment "# noqa: PLC0415".


Development

Successfully merging this pull request may close these issues: Scalable email storage (#185).