✨(storage) implement tiered storage #486
base: main
Conversation
This allows using S3-compatible object storage to offload blobs, making Postgres much lighter. We design for storing ~1B emails on a single instance.
📝 Walkthrough
Implements tiered blob storage with optional encryption and offload to object storage. Adds Blob storage metadata, a TieredStorageService, Celery offload tasks, a verification/rotation management command, settings and env defaults, tests, and Docker Compose bucket initialization.
Sequence Diagram
sequenceDiagram
actor App as Application
participant Blob
participant Tiered as TieredStorageService
participant Object as Object Storage
participant DB as PostgreSQL
rect rgba(76, 175, 80, 0.5)
Note over App,DB: Blob creation (encrypt & store)
App->>Blob: create_blob(compressed_content)
Blob->>Tiered: encrypt(data)
Tiered-->>Blob: encrypted_bytes, key_id
Blob->>DB: save(encrypted_bytes, key_id, storage=POSTGRES)
DB-->>Blob: blob_id
end
rect rgba(33, 150, 243, 0.5)
Note over App,Object: Offload to object storage
App->>Tiered: upload_blob(blob)
Tiered->>Object: PUT /blobs/... (encrypted)
Object-->>Tiered: success
Tiered->>DB: update(blob.storage_location=OBJECT_STORAGE, raw_content=NULL)
DB-->>Tiered: ✓
end
rect rgba(244, 67, 54, 0.5)
Note over App,DB: Retrieval (transparent)
App->>Blob: get_content()
alt storage == POSTGRES
Blob->>Tiered: decrypt(raw_content, key_id)
Tiered-->>Blob: decrypted_bytes
else storage == OBJECT_STORAGE
Blob->>Object: GET /blobs/...
Object-->>Blob: encrypted_bytes
Blob->>Tiered: decrypt(encrypted_bytes, key_id)
Tiered-->>Blob: decrypted_bytes
end
Blob->>Blob: decompress(...)
Blob-->>App: original content
end
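For orientation, the retrieval branch in the diagram can be sketched as below. This is a hedged illustration, not the PR's actual Blob.get_content(): the service and enum names come from elsewhere in this review, while the ZSTD member name and the download_blob signature are assumptions.

```python
import pyzstd

from core.enums import BlobStorageLocationChoices, CompressionTypeChoices
from core.services.tiered_storage import TieredStorageService


def get_content(blob):
    """Return the original (decrypted, decompressed) content of a blob."""
    service = TieredStorageService()
    if blob.storage_location == BlobStorageLocationChoices.POSTGRES:
        payload = bytes(blob.raw_content)      # encrypted bytes kept in Postgres
    else:  # OBJECT_STORAGE
        payload = service.download_blob(blob)  # fetched from the msg-blobs bucket
    data = service.decrypt(payload, blob.encryption_key_id)
    if blob.compression == CompressionTypeChoices.ZSTD:  # member name assumed
        data = pyzstd.decompress(data)
    return data
```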
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: 5 passed ✅
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 466-495: The current flow writes the newly encrypted object via
self.service.storage.save(storage_key, ...) before updating
blob.encryption_key_id inside transaction.atomic(), risking storage/DB
inconsistency if the DB update fails; instead, write the new encrypted bytes to
a temporary object (e.g. derive a temp key from storage_key and new_key_id)
using self.service.storage.save(temp_key, ContentFile(encrypted)), then perform
the DB update inside transaction.atomic() (update blob.encryption_key_id and
save), and only after the transaction succeeds atomically remove/rename the temp
object to the final storage_key (or copy temp→final and delete temp) so storage
and DB remain consistent; reference symbols: self.service.storage.save,
storage_key, temp_key (create), self.service.encrypt, blob.encryption_key_id,
transaction.atomic.
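A hedged sketch of the write-temp-then-commit ordering described in that prompt; the temp_key naming and the promote step are illustrative, not the PR's actual implementation, and assume the Django storage API used elsewhere in the command.

```python
from django.core.files.base import ContentFile
from django.db import transaction


def rotate_object_storage_blob(service, blob, storage_key, encrypted, new_key_id):
    """Re-encrypt an offloaded blob without risking storage/DB inconsistency."""
    temp_key = f"{storage_key}.rotating-{new_key_id}"        # hypothetical temp name
    service.storage.save(temp_key, ContentFile(encrypted))   # 1. write temp object first
    try:
        with transaction.atomic():                           # 2. update DB metadata
            blob.encryption_key_id = new_key_id
            blob.save(update_fields=["encryption_key_id"])
    except Exception:
        service.storage.delete(temp_key)                     # DB failed: discard temp
        raise
    # 3. Promote temp -> final only after the DB update succeeded.
    with service.storage.open(temp_key, "rb") as handle:
        service.storage.delete(storage_key)
        service.storage.save(storage_key, handle)
    service.storage.delete(temp_key)
```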
In `@src/backend/core/services/tiered_storage.py`:
- Around line 31-43: In __init__, the enabled gate currently checks for an
OPTIONS.endpoint_url which wrongly disables valid S3 setups; instead set
self.enabled based on presence of the "message-blobs" storage config itself
(e.g. check that settings.STORAGES contains a non-empty "message-blobs" entry).
Update the assignment to self.enabled to use
settings.STORAGES.get("message-blobs") (or "message-blobs" in settings.STORAGES
and truthy) rather than digging for OPTIONS.endpoint_url so AWS S3 configs
without endpoint_url remain enabled.
🧹 Nitpick comments (3)
src/backend/core/services/tiered_storage_tasks.py (1)
68-133: Consider adding retry for transient failures.
The task handles lock contention gracefully by returning a "locked" status, but transient failures (network issues, temporary S3 unavailability) at line 131 are logged and returned as errors without retry. The periodic offload_blobs_task will eventually re-queue these blobs, but adding explicit retry behavior for transient exceptions (e.g., ConnectionError, Timeout) could improve reliability.
💡 Optional: Add retry for transient failures
-@celery_app.task(bind=True)
+@celery_app.task(bind=True, autoretry_for=(ConnectionError, TimeoutError), retry_backoff=True, max_retries=3)
 def offload_single_blob_task(self, blob_id: str) -> Dict[str, Any]:
src/backend/core/models.py (1)
1536-1557: Enforce storage_location/raw_content invariants at the DB layer.
With raw_content now nullable, inconsistent states (e.g., OBJECT_STORAGE + non-null content) become possible and will surface as runtime errors in get_content. A check constraint makes the invariant explicit and avoids silent drift. This will require a migration.
♻️ Proposed constraint
 constraints = [
     models.CheckConstraint(
         check=(
             models.Q(mailbox__isnull=False)
             | models.Q(maildomain__isnull=False)
         ),
         name="blob_has_owner",
     ),
+    models.CheckConstraint(
+        check=(
+            models.Q(
+                storage_location=BlobStorageLocationChoices.POSTGRES,
+                raw_content__isnull=False,
+            )
+            | models.Q(
+                storage_location=BlobStorageLocationChoices.OBJECT_STORAGE,
+                raw_content__isnull=True,
+            )
+        ),
+        name="blob_storage_location_matches_content",
+    ),
 ]
As per coding guidelines, enforce data integrity with model constraints.
Also applies to: 1583-1589
src/backend/core/services/tiered_storage.py (1)
244-281: Guard against orphan-delete races and capture delete errors.
There's a TOCTOU window between the reference count (lines 259-263) and deletion (lines 274-275); a concurrent offload could add a reference after the count and still have its object deleted. Consider an advisory lock keyed by SHA256 or a transactional guard around the check+delete. Also, capture the storage deletion exception to Sentry so cleanup failures are observable.
♻️ Suggested Sentry capture
 from cryptography.fernet import Fernet
+from sentry_sdk import capture_exception
@@
-        except Exception as e:  # pylint: disable=broad-except
-            logger.warning("Failed to delete blob from storage %s: %s", key, e)
+        except Exception as exc:  # pylint: disable=broad-except
+            capture_exception(exc)
+            logger.warning("Failed to delete blob from storage %s: %s", key, exc)
             return False
As per coding guidelines, capture and report exceptions to Sentry.
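Complementing the Sentry note, one possible shape for the advisory-lock guard mentioned above is sketched here. It is an assumption-laden illustration: the pg_advisory_xact_lock keying, the compute_storage_key call, and the filter fields are inferred from this review rather than taken from the service.

```python
from django.db import connection, transaction

from core.enums import BlobStorageLocationChoices
from core.models import Blob


def delete_if_orphaned_guarded(service, sha256: bytes) -> bool:
    """Delete the stored object only if no blob still references this content."""
    # Postgres advisory locks take a signed 64-bit key; derive one from the hash.
    lock_key = int.from_bytes(sha256[:8], "big", signed=True)
    with transaction.atomic():
        with connection.cursor() as cursor:
            # Held until the transaction ends, serializing check + delete
            # against a concurrent offload of the same content.
            cursor.execute("SELECT pg_advisory_xact_lock(%s)", [lock_key])
        if Blob.objects.filter(
            sha256=sha256,
            storage_location=BlobStorageLocationChoices.OBJECT_STORAGE,
        ).exists():
            return False  # still referenced, keep the object
        service.storage.delete(service.compute_storage_key(sha256))
        return True
```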
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- compose.yaml
- env.d/development/backend.defaults
- src/backend/core/api/viewsets/config.py
- src/backend/core/enums.py
- src/backend/core/management/commands/verify_tiered_storage.py
- src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
- src/backend/core/models.py
- src/backend/core/services/search/search.py
- src/backend/core/services/tiered_storage.py
- src/backend/core/services/tiered_storage_tasks.py
- src/backend/core/signals.py
- src/backend/core/tests/commands/__init__.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/tests/conftest.py
- src/backend/core/tests/services/__init__.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/__init__.py
- src/backend/core/tests/tasks/test_task_send_message.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
- src/backend/core/utils.py
- src/backend/messages/celery_app.py
- src/backend/messages/settings.py
🧰 Additional context used
📓 Path-based instructions (6)
src/backend/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
src/backend/**/*.py: Follow Django/PEP 8 style with a 100-character line limit
Use descriptive, snake_case names for variables and functions
Use Django ORM for database access; avoid raw SQL unless necessary for performance
Use Django’s built-in user model and authentication framework
Prefer try-except blocks to handle exceptions in business logic and views
Log expected and unexpected actions with appropriate log levels
Capture and report exceptions to Sentry; use capture_exception() for custom errors
Do not log sensitive information (tokens, passwords, financial/health data, PII)
Files:
- src/backend/messages/celery_app.py
- src/backend/core/tests/tasks/__init__.py
- src/backend/core/api/viewsets/config.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/services/search/search.py
- src/backend/core/management/commands/verify_tiered_storage.py
- src/backend/core/utils.py
- src/backend/core/signals.py
- src/backend/core/services/tiered_storage_tasks.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/test_task_send_message.py
- src/backend/messages/settings.py
- src/backend/core/tests/conftest.py
- src/backend/core/models.py
- src/backend/core/tests/services/__init__.py
- src/backend/core/enums.py
- src/backend/core/services/tiered_storage.py
- src/backend/core/tests/commands/__init__.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
- src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
src/backend/**/{tests.py,tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
src/backend/**/{tests.py,tests/**/*.py}: Use Django’s testing tools (pytest-django) to ensure code quality and reliability
Unit tests should focus on a single use case, keep assertions minimal, and cover all possible cases
Files:
- src/backend/core/tests/tasks/__init__.py
- src/backend/core/tests/commands/test_verify_tiered_storage.py
- src/backend/core/tests/services/test_tiered_storage.py
- src/backend/core/tests/tasks/test_task_send_message.py
- src/backend/core/tests/conftest.py
- src/backend/core/tests/services/__init__.py
- src/backend/core/tests/commands/__init__.py
- src/backend/core/tests/tasks/test_tiered_storage_tasks.py
src/backend/**/{settings.py,middleware.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
Use middleware judiciously for cross-cutting concerns (authentication, logging, caching)
Files:
src/backend/messages/settings.py
src/backend/**/settings.py
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
src/backend/**/settings.py: Leverage Django’s caching framework (e.g., Redis/Memcached) where appropriate
Use Django’s cache framework with a backend like Redis or Memcached to reduce DB load
Optimize static file handling using Django’s staticfiles pipeline (e.g., WhiteNoise)
Files:
src/backend/messages/settings.py
src/backend/**/{models.py,forms.py,views.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
Keep business logic in models and forms; keep views thin and focused on request handling
Files:
src/backend/core/models.py
src/backend/**/{models.py,migrations/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)
Implement database indexing and query optimization (Model Meta indexes, constraints)
Files:
- src/backend/core/models.py
- src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py
🧠 Learnings (1)
📚 Learning: 2025-09-02T10:12:12.835Z
Learnt from: CR
Repo: suitenumerique/messages PR: 0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/{tests.py,tests/**/*.py} : Use Django’s testing tools (pytest-django) to ensure code quality and reliability
Applied to files:
- src/backend/core/tests/conftest.py
- src/backend/core/tests/commands/__init__.py
🧬 Code graph analysis (3)
src/backend/messages/settings.py (1)
src/backend/core/utils.py (1)
JSONValue(8-22)
src/backend/core/tests/conftest.py (1)
src/backend/core/services/tiered_storage.py (1)
storage(45-49)
src/backend/core/models.py (2)
src/backend/core/enums.py (2)
- BlobStorageLocationChoices (61-65)
- CompressionTypeChoices (54-58)
src/backend/core/services/tiered_storage.py (6)
- TieredStorageService (28-296)
- encrypt (68-91)
- compute_storage_key (52-66)
- decrypt (93-117)
- download_blob (208-242)
- delete_if_orphaned (244-280)
🪛 Ruff (0.14.11)
src/backend/core/tests/commands/test_verify_tiered_storage.py
31-31: import should be at the top-level of a file
(PLC0415)
180-180: import should be at the top-level of a file
(PLC0415)
208-208: import should be at the top-level of a file
(PLC0415)
275-275: import should be at the top-level of a file
(PLC0415)
357-357: import should be at the top-level of a file
(PLC0415)
397-397: import should be at the top-level of a file
(PLC0415)
415-415: import should be at the top-level of a file
(PLC0415)
442-442: import should be at the top-level of a file
(PLC0415)
470-470: import should be at the top-level of a file
(PLC0415)
506-506: import should be at the top-level of a file
(PLC0415)
552-552: import should be at the top-level of a file
(PLC0415)
596-596: import should be at the top-level of a file
(PLC0415)
630-630: import should be at the top-level of a file
(PLC0415)
src/backend/core/management/commands/verify_tiered_storage.py
280-280: import should be at the top-level of a file
(PLC0415)
282-282: import should be at the top-level of a file
(PLC0415)
src/backend/core/tests/services/test_tiered_storage.py
114-114: import should be at the top-level of a file
(PLC0415)
129-129: import should be at the top-level of a file
(PLC0415)
271-271: import should be at the top-level of a file
(PLC0415)
296-296: import should be at the top-level of a file
(PLC0415)
362-362: import should be at the top-level of a file
(PLC0415)
396-396: import should be at the top-level of a file
(PLC0415)
571-571: import should be at the top-level of a file
(PLC0415)
614-614: import should be at the top-level of a file
(PLC0415)
654-654: import should be at the top-level of a file
(PLC0415)
src/backend/core/tests/conftest.py
22-22: import should be at the top-level of a file
(PLC0415)
src/backend/core/models.py
1474-1474: import should be at the top-level of a file
(PLC0415)
1625-1625: import should be at the top-level of a file
(PLC0415)
1656-1656: import should be at the top-level of a file
(PLC0415)
src/backend/core/services/tiered_storage.py
132-132: import should be at the top-level of a file
(PLC0415)
149-149: import should be at the top-level of a file
(PLC0415)
257-257: import should be at the top-level of a file
(PLC0415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: test-e2e
- GitHub Check: lint-back
- GitHub Check: build-front
- GitHub Check: test-front
- GitHub Check: check-api-state
- GitHub Check: lint-front
- GitHub Check: test-back
🔇 Additional comments (52)
src/backend/core/tests/tasks/__init__.py (1)
1-1: LGTM! Clean package initializer for the tasks test module.
src/backend/core/tests/commands/__init__.py (1)
1-1: LGTM! Appropriate package initializer for management command tests.
src/backend/core/tests/services/__init__.py (1)
1-1: LGTM! Appropriate package initializer for service tests.
src/backend/core/tests/tasks/test_task_send_message.py (3)
3-3: LGTM! Appropriate pylint disables for test file patterns: no-value-for-parameter for Celery task invocations without bound arguments, and unused-argument for pytest fixtures that establish state but aren't directly referenced.
45-45: Good hygiene. Using the _mailbox prefix correctly signals the variable is intentionally unused while still unpacking the fixture tuple.
99-99: Consistent with the pattern above.
src/backend/core/api/viewsets/config.py (1)
139-140: LGTM! Using getattr(settings, setting, None) ensures a consistent API response schema where all keys are always present, aligning with the OpenAPI specification that marks these fields as required. This is cleaner than conditional inclusion and provides predictable behavior for frontend consumers.
src/backend/core/utils.py (1)
14-22: LGTM! Returning None for empty/whitespace strings allows django-configurations to fall back to default values, which is appropriate for optional JSON configuration like encryption keys.
env.d/development/backend.defaults (1)
54-66: LGTM! The development defaults are well-documented with clear comments. Tiered storage is configured with a 3-day offload threshold, and credentials follow the existing pattern for msg-imports.
compose.yaml (1)
84-84: LGTM! The msg-blobs bucket creation follows the existing pattern. It correctly omits the ILM expiration rules since blobs are intended for long-term storage, unlike temporary imports.
src/backend/messages/celery_app.py (1)
48-52: LGTM! The new beat schedule entry follows the existing pattern, and an hourly interval is appropriate for blob offloading. The core.services.tiered_storage_tasks.offload_blobs_task is properly defined with the @celery_app.task(bind=True) decorator and has comprehensive test coverage.
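For reference, the hourly beat entry being praised likely has roughly the following shape; the schedule key name and crontab below are illustrative, not copied from celery_app.py.

```python
from celery import Celery
from celery.schedules import crontab

celery_app = Celery("messages")
celery_app.conf.beat_schedule = {
    "offload-blobs-to-object-storage": {  # illustrative key name
        "task": "core.services.tiered_storage_tasks.offload_blobs_task",
        "schedule": crontab(minute=0),    # run at the top of every hour
    },
}
```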
src/backend/core/services/search/search.py (1)
40-42: No action required; direct access to settings.OPENSEARCH_INDEX_THREADS is safe. The setting is properly defined in src/backend/messages/settings.py as a BooleanValue with a default of True. It will always exist at runtime, and direct access does not risk AttributeError. This pattern is already used consistently throughout the codebase in multiple other files (src/backend/core/signals.py, src/backend/core/services/search/tasks.py).
Likely an incorrect or invalid review comment.
src/backend/core/enums.py (1)
61-66: LGTM! The new BlobStorageLocationChoices enum follows the existing patterns in this file with appropriate integer values and descriptive labels. Good placement in the logical order of the file.
src/backend/core/tests/conftest.py (1)
14-45: LGTM! The session-scoped fixture properly defers Django imports and handles missing storage configuration gracefully. The broad exception handling is appropriate here to prevent test setup failures when object storage isn't configured. The pylint: disable comment at line 3 correctly covers the in-function import pattern.
src/backend/core/services/tiered_storage_tasks.py (1)
26-65: LGTM! The task efficiently streams eligible blob IDs using values_list with iterator() to avoid memory pressure. The filtering criteria (age threshold + minimum size) appropriately limit the scope of each run.
src/backend/core/tests/commands/test_verify_tiered_storage.py (3)
1-7: LGTM! Comprehensive test coverage for the verify_tiered_storage management command. The tests appropriately cover disabled states, E2E verification modes, hash verification with corruption detection, and re-encryption workflows. The in-function imports are intentionally used for test isolation, and the pylint: disable comment at line 7 correctly covers this pattern.
78-80: Good cleanup pattern. Consistent use of try/finally with existence checks ensures test isolation and prevents storage pollution across test runs.
468-534: Thorough E2E test for object storage re-encryption. This test covers the complete workflow: encrypting with the old key, uploading to storage, rotating keys, re-encrypting, and verifying content integrity via download and decompression.
src/backend/core/signals.py (1)
51-52: The concern is unfounded. OPENSEARCH_INDEX_THREADS is always defined in settings with a default value (True in src/backend/messages/settings.py:83-84 and False as a fallback at line 1090). Direct attribute access to settings.OPENSEARCH_INDEX_THREADS is safe and will not raise AttributeError. The change from getattr() to direct access is a valid simplification that removes unnecessary defensive programming.
Likely an incorrect or invalid review comment.
src/backend/core/migrations/0014_blob_encryption_key_id_blob_storage_location_and_more.py (1)
12-27: Migration structure looks correct. The migration properly:
- Adds storage_location with db_index=True for efficient filtering
- Makes raw_content nullable to support object storage blobs
- Uses hardcoded choices in the migration (correct Django practice)

One consideration: if key rotation queries will frequently filter by encryption_key_id (e.g., finding all blobs encrypted with a specific key), adding an index on that field could be beneficial. However, this can be deferred based on actual query patterns; a possible follow-up is sketched below.
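If that index ever becomes necessary, a follow-up migration could look like this sketch. The field type is an assumption (a small positive integer defaulting to 0, per the key_id=0 convention); check the actual model definition before reusing it.

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("core", "0014_blob_encryption_key_id_blob_storage_location_and_more"),
    ]

    operations = [
        migrations.AlterField(
            model_name="blob",
            name="encryption_key_id",
            # Same default as today, just adding the index for rotation queries.
            field=models.PositiveIntegerField(default=0, db_index=True),
        ),
    ]
```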
src/backend/core/tests/services/test_tiered_storage.py (5)
24-181: Comprehensive unit test coverage for encryption/decryption. The unit tests thoroughly cover:
- Storage key computation with different SHA256 prefixes
- Encryption passthrough when disabled (key_id=0)
- Proper error handling for invalid/missing keys and corrupted data
- Key rotation scenarios maintaining backward compatibility

Good separation of concerns with no DB or storage dependencies.
183-263: Good database-level test coverage. The tests properly validate:
- Default storage location behavior
- Content retrieval from PostgreSQL
- Error handling when content is missing
- SHA256-based storage key derivation
- Deduplication detection via check_already_uploaded
389-420: Critical regression test for the double-encryption bug. This test (test_offload_with_encryption_roundtrip) is valuable as it explicitly guards against the double-encryption bug mentioned in the docstring. The test verifies that encrypted blobs can be offloaded and read back correctly, which is a common failure point.
494-562: Important deduplication behavior documented in a test. The test correctly validates that when two blobs with identical content are encrypted with different keys, deduplication uses the first blob's encryption key_id. The inline comment at line 554 clarifies this is expected behavior until key rotation is complete.
565-667: Key rotation tests cover both storage locations. The tests properly validate the re-encryption workflow for:
- PostgreSQL-stored blobs (decrypt with old key, encrypt with new key, update in place)
- Object storage blobs (download, decrypt, re-encrypt, upload, update metadata)
Both tests verify content integrity after rotation, which is critical.
src/backend/messages/settings.py (2)
234-259: New storage configuration follows existing patterns.The
message-blobsstorage configuration mirrors the existingmessage-importspattern. The default bucket namemsg-blobsis provided.One difference:
endpoint_urlhas no default value, whereas some configurations might expect a default for local development. Verify this is intentional and that the development environment properly setsSTORAGE_MESSAGE_BLOBS_ENDPOINT_URL.
376-398: Well-documented encryption and offload configuration.The configuration properly:
- Documents the key format and key_id=0 convention
- Uses appropriate types (
JSONValuefor dict,PositiveIntegerValuefor IDs)- Defaults to encryption disabled (key_id=0), requiring explicit opt-in
- Allows fine-grained control over offload timing and size thresholds
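To make those conventions concrete, a development-style configuration might look like the sketch below. The placeholder keys and the exact setting names are assumptions based on this review, not values copied from settings.py.

```python
# Fernet keys are 32 url-safe base64-encoded bytes, e.g. Fernet.generate_key().decode()
MESSAGES_BLOB_ENCRYPTION_KEYS = {
    "1": "<old Fernet key>",   # kept so existing blobs can still be decrypted
    "2": "<new Fernet key>",   # added during rotation
}
MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID = 2   # 0 would mean "store plaintext"
TIERED_STORAGE_OFFLOAD_AFTER_DAYS = 3        # matches the development default above
TIERED_STORAGE_OFFLOAD_MIN_SIZE = 16 * 1024  # example threshold; skip tiny blobs
```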
src/backend/core/tests/tasks/test_tiered_storage_tasks.py (5)
32-48: Disabled state test properly mocks settings. The test correctly mocks the storage configuration to verify the task gracefully handles the disabled state.
55-120: Good coverage of blob eligibility criteria. The tests properly validate:
- Age-based filtering using TIERED_STORAGE_OFFLOAD_AFTER_DAYS
- Size-based filtering using TIERED_STORAGE_OFFLOAD_MIN_SIZE
- The use of Blob.objects.filter().update() correctly bypasses any auto-update timestamps

The conditional assertion at line 119 (if settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE > 0) adapts to the test environment settings, which is acceptable.
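A sketch of the eligibility filter these tests exercise, assuming conventional field names (created_at, size_compressed); the real task may differ in ordering or chunking.

```python
from datetime import timedelta

from django.conf import settings
from django.utils import timezone

from core.enums import BlobStorageLocationChoices
from core.models import Blob


def eligible_blob_ids():
    """Yield IDs of blobs old and large enough to offload to object storage."""
    cutoff = timezone.now() - timedelta(days=settings.TIERED_STORAGE_OFFLOAD_AFTER_DAYS)
    return (
        Blob.objects.filter(
            storage_location=BlobStorageLocationChoices.POSTGRES,
            created_at__lt=cutoff,
            size_compressed__gte=settings.TIERED_STORAGE_OFFLOAD_MIN_SIZE,
        )
        .values_list("id", flat=True)
        .iterator()  # stream IDs instead of materializing the whole list
    )
```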
156-171: Consistent disabled state handling test. Follows the same mocking pattern as the batch task test.
242-264: Good error handling test with rollback verification. The test test_handles_upload_error properly verifies that:
- Upload errors are caught and reported
- The blob state is preserved (transaction rolled back)
- Storage location remains POSTGRES and raw_content is intact

This ensures data integrity during upload failures.
335-352: Idempotency test validates concurrent safety. The test_concurrent_offload_idempotent test verifies that repeated offload attempts for the same blob are handled gracefully, returning already_offloaded on subsequent calls. This is important for Celery task retry scenarios.
src/backend/core/management/commands/verify_tiered_storage.py (6)
24-88: Well-designed CLI interface with clear modes. The command provides:
- Multiple verification modes (db-to-storage, storage-to-db, full)
- Safety features (--dry-run, --limit)
- Key rotation capability (--re-encrypt)
- Proper early exit when storage is not configured
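Example invocations of the command described above, written with call_command; the option spellings are inferred from the flags listed in this review and may not match the parser exactly.

```python
from django.core.management import call_command

# Sample 1,000 blobs and confirm each offloaded blob has an object in storage.
call_command("verify_tiered_storage", mode="db-to-storage", limit=1000)

# Scan the bucket for orphaned objects and delete them.
call_command("verify_tiered_storage", mode="storage-to-db", fix=True)

# Preview a key rotation without writing anything.
call_command("verify_tiered_storage", re_encrypt=True, dry_run=True)
```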
89-134: DB-to-storage verification handles scale well. The method:
- Uses iterator(chunk_size=1000) to avoid memory issues with large datasets
- Reports missing blobs to stderr (appropriate severity for potential data loss)
- Respects the --limit option for sampling
136-218: Storage-to-DB verification with orphan cleanup. The method properly:
- Validates storage path format before processing
- Detects orphans (objects without DB references)
- Optionally deletes orphans with --fix
- Optionally verifies hashes (wisely marked as "slow")

Progress is reported every 100 objects (line 201), which provides good feedback during long operations.
220-254: Storage listing handles multiple backends. The method:
- Prefers direct boto3 access with pagination for efficiency
- Falls back to Django's listdir for compatibility
- Raises a clear error when listing is not supported
256-305: Hash verification covers the complete pipeline. The method correctly:
- Downloads encrypted content from storage
- Decrypts using the blob's encryption_key_id
- Decompresses based on the compression type
- Computes and compares the SHA256 hash

The broad exception handling is acceptable here since the goal is to report issues, not crash.
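That per-blob check can be pictured as the sketch below, assuming zstd compression and the service API names used elsewhere in this review (download_blob, decrypt); it is not the command's literal code.

```python
import hashlib

import pyzstd

from core.enums import CompressionTypeChoices


def blob_hash_matches(service, blob) -> bool:
    """Download, decrypt, decompress, and compare against the stored SHA-256."""
    encrypted = service.download_blob(blob)
    data = service.decrypt(encrypted, blob.encryption_key_id)
    if blob.compression == CompressionTypeChoices.ZSTD:  # member name assumed
        data = pyzstd.decompress(data)
    return hashlib.sha256(data).digest() == bytes(blob.sha256)
```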
307-402: Re-encryption workflow with proper validation. The method:
- Validates encryption configuration before starting
- Uses a smaller chunk size (100) appropriate for the heavier re-encryption operation
- Supports --dry-run for safe preview
- Provides clear progress reporting and a final summary
src/backend/core/models.py (6)
31-33: No review needed for this import change.
1472-1487: Encryption is integrated cleanly into blob creation.
Storing encrypted bytes and the key id alongside compression keeps retrieval consistent.
1595-1599: Conditional size_compressed update makes sense.
This avoids clobbering the stored size after offload clears raw_content.
1601-1609: No issues spotted.
1624-1643: Content retrieval flow reads cleanly.
The PostgreSQL vs object-storage branches and decryption/decompression steps are clear.
1645-1660: Deletion flow aligns with object-storage cleanup.
src/backend/core/services/tiered_storage.py (8)
1-24: No review needed for the module header/imports.
44-49: Lazy storage initialization looks good.
51-66: Storage key sharding is straightforward.
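For readers unfamiliar with the pattern, prefix sharding of storage keys typically looks like the sketch below; the real compute_storage_key may use a different prefix depth or path layout.

```python
def compute_storage_key(sha256: bytes) -> str:
    """Derive an object-storage key from a blob's SHA-256 digest."""
    digest = sha256.hex()
    # Two 2-character prefixes spread ~1B objects across 65,536 "directories",
    # keeping any single listing small.
    return f"{digest[:2]}/{digest[2:4]}/{digest}"
```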
68-117: Encryption/decryption flow is consistent with key_id semantics.
Please double-check that configured key formats align with Fernet requirements.
119-155: DB-backed dedup lookups are clear.
208-243: Download path is clear and well-scoped.
282-296: Existence check is straightforward.
157-207: Return value is correctly handled in production code. The call site in src/backend/core/services/tiered_storage_tasks.py (line 113) properly captures the returned key_id and updates blob.encryption_key_id accordingly. Dedup-specific tests also confirm this pattern works correctly when the same content is uploaded with different encryption keys.
    def __init__(self):
        """Initialize the service, checking if object storage is configured."""
        # Check if message-blobs storage has endpoint_url configured
        self._storage = None
        self.enabled = bool(
            settings.STORAGES.get("message-blobs", {})
            .get("OPTIONS", {})
            .get("endpoint_url")
        )
        # encryption_keys is a dict: {"1": "key1", "2": "key2"}
        self.encryption_keys = settings.MESSAGES_BLOB_ENCRYPTION_KEYS or {}
        self.active_key_id = settings.MESSAGES_BLOB_ENCRYPTION_ACTIVE_KEY_ID
🧩 Analysis chain
🏁 Scripts executed:
- rg -n "message-blobs" -C 5 src/backend/
- find src/backend -name "settings.py" -o -name "settings*.py" | head -20
- rg -n "STORAGES\s*=" -C 3 src/backend/
- rg -n "endpoint_url" -C 3 src/backend/
- cat -n src/backend/core/services/tiered_storage.py | head -80
- cat -n src/backend/core/management/commands/verify_tiered_storage.py | head -85
Fix enabled gate to check storage config presence, not endpoint_url.
Lines 35-39 gate on endpoint_url presence, which silently disables valid AWS S3 configurations (AWS S3 doesn't use endpoint_url; that field is optional for custom endpoints like MinIO). The condition should check for the presence of the message-blobs storage config itself.
Suggested fix
- self.enabled = bool(
- settings.STORAGES.get("message-blobs", {})
- .get("OPTIONS", {})
- .get("endpoint_url")
- )
+ storage_cfg = settings.STORAGES.get("message-blobs")
+ self.enabled = bool(storage_cfg)
🤖 Prompt for AI Agents
In `@src/backend/core/services/tiered_storage.py` around lines 31 - 43, In
__init__, the enabled gate currently checks for an OPTIONS.endpoint_url which
wrongly disables valid S3 setups; instead set self.enabled based on presence of
the "message-blobs" storage config itself (e.g. check that settings.STORAGES
contains a non-empty "message-blobs" entry). Update the assignment to
self.enabled to use settings.STORAGES.get("message-blobs") (or "message-blobs"
in settings.STORAGES and truthy) rather than digging for OPTIONS.endpoint_url so
AWS S3 configs without endpoint_url remain enabled.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/backend/core/management/commands/verify_tiered_storage.py`:
- Around line 449-455: When re-encrypting Postgres blobs in
verify_tiered_storage.py (using self.service.decrypt and self.service.encrypt)
ensure you persist the recomputed size_compressed: after setting
blob.raw_content = encrypted and blob.encryption_key_id = new_key_id, call
blob.save(update_fields=["raw_content", "encryption_key_id", "size_compressed"])
so the recalculated size_compressed is written to the DB (match the approach
used in 0007_blob_size_compressed.py).
- Around line 280-282: Move the local imports of pyzstd and
CompressionTypeChoices out of the function and place them at the top of the
module alongside the existing BlobStorageLocationChoices import so they are
module-level imports; update the import section to include "import pyzstd" and
"from core.enums import CompressionTypeChoices" and remove the in-function
imports where pyzstd and CompressionTypeChoices are currently referenced in
verify_tiered_storage logic.
In `@src/backend/core/signals.py`:
- Around line 109-119: The post_delete signal handler cleanup_blob_storage
currently calls TieredStorageService().delete_if_orphaned() immediately, which
can run inside a transaction that may roll back; wrap the deletion call in
transaction.on_commit to defer it until after successful commit: import
django.db.transaction if needed, capture the instance.sha256 (or
bytes(instance.sha256)) and any required storage_location check inside
cleanup_blob_storage, create a small closure or lambda that calls
TieredStorageService().delete_if_orphaned(...) and register it with
transaction.on_commit, and ensure the existing guard checks
(BlobStorageLocationChoices.OBJECT_STORAGE and service.enabled) are preserved
before scheduling the on_commit callback.
In `@src/backend/core/tests/services/test_tiered_storage.py`:
- Line 114: Several intentional local test imports (e.g. "from
cryptography.fernet import InvalidToken" and the other test-only imports flagged
at the comment's listed locations) are meant to remain inside test functions;
add a trailing "# noqa: PLC0415" to each of those local import lines instead of
moving them to module level so Ruff/Pylint stops reporting
import-outside-toplevel while keeping the imports local to their tests. Ensure
you update each flagged import line (the ones referenced in the review) by
appending the exact comment "# noqa: PLC0415".
🧹 Nitpick comments (2)
src/backend/core/tests/services/test_tiered_storage.py (1)
265-563: Consider skipping E2E tests when object storage isn't configured.
The docstring says "when available," but these tests hard-fail if storage isn't configured. A skipif makes local/dev runs more resilient.
🔧 Suggested refactor
+# At module level
+_STORAGE_ENABLED = TieredStorageService().enabled
@@
-@pytest.mark.django_db
-class TestTieredStorageE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestTieredStorageE2E:
src/backend/core/tests/tasks/test_tiered_storage_tasks.py (1)
60-194: Consider skipping E2E task tests when object storage isn't configured.
These E2E tests will fail in environments without MinIO. A skipif keeps dev/test runs resilient while still exercising coverage in CI.
🔧 Suggested refactor
+# At module level
+_STORAGE_ENABLED = TieredStorageService().enabled
@@
-@pytest.mark.django_db
-class TestOffloadBlobsTaskE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestOffloadBlobsTaskE2E:
@@
-@pytest.mark.django_db
-class TestOffloadSingleBlobTaskE2E:
+@pytest.mark.django_db
+@pytest.mark.skipif(
+    not _STORAGE_ENABLED, reason="Object storage not configured"
+)
+class TestOffloadSingleBlobTaskE2E:
Also applies to: 221-399
        import pyzstd

        from core.enums import CompressionTypeChoices
🧩 Analysis chain
🏁 Scripts executed:
- cat -n src/backend/core/management/commands/verify_tiered_storage.py | sed -n '270,295p'
- head -n 50 src/backend/core/management/commands/verify_tiered_storage.py | cat -n
- rg "pyzstd|CompressionTypeChoices" src/backend/core/management/commands/verify_tiered_storage.py -n
- cat -n src/backend/core/management/commands/verify_tiered_storage.py | sed -n '250,310p'
- rg "pyzstd" src/backend/core/management/commands/verify_tiered_storage.py -B 3 -A 3
- rg "from core.enums import" src/backend/core/management/commands/verify_tiered_storage.py
- fd "requirements" src/backend/ | head -20
- find . -maxdepth 3 -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" -o -name "Pipfile" \) 2>/dev/null | head -20
- cat src/backend/pyproject.toml | grep -A 100 "dependencies\|pyzstd"
- rg "CompressionTypeChoices" src/backend/core/management/commands/verify_tiered_storage.py -n
- cat -n src/backend/core/management/commands/verify_tiered_storage.py | sed -n '1,30p'
Move pyzstd and CompressionTypeChoices imports to module level.
Both imports are required dependencies and should follow PEP 8 conventions at the module level. CompressionTypeChoices already shares the same source as BlobStorageLocationChoices (which is imported at module level), and pyzstd is a direct project dependency. Keeping them local to the method is unnecessary and inconsistent with the codebase's import pattern.
🧰 Tools
🪛 Ruff (0.14.14)
280-280: import should be at the top-level of a file
(PLC0415)
282-282: import should be at the top-level of a file
(PLC0415)
🤖 Prompt for AI Agents
In `@src/backend/core/management/commands/verify_tiered_storage.py` around lines
280 - 282, Move the local imports of pyzstd and CompressionTypeChoices out of
the function and place them at the top of the module alongside the existing
BlobStorageLocationChoices import so they are module-level imports; update the
import section to include "import pyzstd" and "from core.enums import
CompressionTypeChoices" and remove the in-function imports where pyzstd and
CompressionTypeChoices are currently referenced in verify_tiered_storage logic.
decrypted = self.service.decrypt(bytes(blob.raw_content), old_key_id)
encrypted, new_key_id = self.service.encrypt(decrypted)

blob.raw_content = encrypted
blob.encryption_key_id = new_key_id
blob.save(update_fields=["raw_content", "encryption_key_id"])
🧩 Analysis chain
🏁 Scripts executed:
- find . -name "*.py" -type f | xargs grep -l "class Blob" | head -20
- cat -n src/backend/core/models.py | grep -A 50 "class Blob"
- grep -n "def save" src/backend/core/models.py | grep -A 20 "class Blob" | head -50
- sed -n '1501,1700p' src/backend/core/models.py | grep -n "def save"
- sed -n '1590,1650p' src/backend/core/models.py
- sed -n '440,465p' src/backend/core/management/commands/verify_tiered_storage.py
- rg "post_save.*Blob|receiver.*Blob" src/backend/core/ -A 5
- rg "blob\.save\(update_fields" src/backend/ -B 3 -A 1
- rg "def encrypt|def decrypt" src/backend/core/services/ -A 10 | head -80
Include size_compressed in the update set when re-encrypting Postgres blobs.
When blob.raw_content is changed to newly encrypted content, Blob.save() recalculates size_compressed from the new content length. However, using update_fields without size_compressed prevents this recalculated value from being persisted to the database, leaving the field stale. This is demonstrated in migration 0007_blob_size_compressed.py, which explicitly includes size_compressed in update_fields when modifying raw content.
🔧 Suggested fix
- blob.save(update_fields=["raw_content", "encryption_key_id"])
+ blob.save(
+ update_fields=["raw_content", "encryption_key_id", "size_compressed"]
+ )
🤖 Prompt for AI Agents
In `@src/backend/core/management/commands/verify_tiered_storage.py` around lines
449 - 455, When re-encrypting Postgres blobs in verify_tiered_storage.py (using
self.service.decrypt and self.service.encrypt) ensure you persist the recomputed
size_compressed: after setting blob.raw_content = encrypted and
blob.encryption_key_id = new_key_id, call
blob.save(update_fields=["raw_content", "encryption_key_id", "size_compressed"])
so the recalculated size_compressed is written to the DB (match the approach
used in 0007_blob_size_compressed.py).
@receiver(post_delete, sender=models.Blob)
def cleanup_blob_storage(sender, instance, **kwargs):
    """Clean up object storage when a blob is deleted.

    Uses a post_delete signal instead of a model delete() override to ensure
    cleanup also runs during CASCADE and QuerySet bulk deletes.
    """
    if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
        service = TieredStorageService()
        if service.enabled:
            service.delete_if_orphaned(bytes(instance.sha256))
🧩 Analysis chain
🏁 Scripts executed:
- find src/backend -name "signals.py" -type f
- cat -n src/backend/core/signals.py | head -130
- rg "from django.db import" src/backend/core/signals.py
- rg "transaction\.on_commit" src/backend --type py
- rg "post_delete" src/backend --type py -A 10
🌐 Web query:
Django post_delete signal transaction context behavior
💡 Result:
- When post_delete fires: Django sends post_delete "at the end" of a model instance's delete() and a queryset's delete() — i.e., after the SQL DELETE for that object has been executed (within Django's deletion flow). The instance is no longer in the DB at that point. [1]
- Transaction context: post_delete is sent inside the same DB transaction used for the deletion (including the implicit atomic() Django uses around collected deletes). This is intentional so any DB writes done by signal receivers participate in the same transaction and will be committed/rolled back together. If the surrounding transaction later rolls back, the signal has already run, but all DB changes (the delete and receiver writes) roll back too. [2]
- If you need "only after commit" behavior: for non-DB side effects (send email, enqueue Celery task, invalidate cache, call external API), don't do them directly in post_delete. Register them with transaction.on_commit(...) so they run only if the transaction actually commits; otherwise they're discarded. [2][3]
- Bulk / cascades: QuerySet.delete() does not call each model's delete() method, but it does emit pre_delete/post_delete for each deleted object, including cascaded deletions. [1][4]
Sources: [1] Django signals docs (post_delete) https://docs.djangoproject.com/en/2.0/ref/signals/ ; [2] Django ticket discussion explaining post_delete is inside the delete transaction and recommending on_commit https://code.djangoproject.com/ticket/30835 ; [3] Django transactions docs (transaction.on_commit) https://docs.djangoproject.com/en/3.2/topics/db/transactions/ ; [4] StackOverflow quoting Django docs on bulk delete and signals https://stackoverflow.com/questions/28896237/override-djangos-model-delete-method-for-bulk-deletion
Defer object-storage deletion until the DB commit.
post_delete runs inside the surrounding transaction. If it rolls back, the S3 object deletion has already completed, leaving an orphaned object while the DB row is restored. Use transaction.on_commit() to ensure cleanup only happens after a successful commit.
🔧 Suggested fix
-from django.db.models.signals import post_delete, post_save, pre_delete
+from django.db import transaction
+from django.db.models.signals import post_delete, post_save, pre_delete
@@
def cleanup_blob_storage(sender, instance, **kwargs):
@@
- if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
- service = TieredStorageService()
- if service.enabled:
- service.delete_if_orphaned(bytes(instance.sha256))
+ if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
+ service = TieredStorageService()
+ if service.enabled:
+ sha256 = bytes(instance.sha256)
+ transaction.on_commit(
+ lambda: service.delete_if_orphaned(sha256)
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from django.db import transaction
from django.db.models.signals import post_delete, post_save, pre_delete


@receiver(post_delete, sender=models.Blob)
def cleanup_blob_storage(sender, instance, **kwargs):
    """Clean up object storage when a blob is deleted.

    Uses a post_delete signal instead of a model delete() override to ensure
    cleanup also runs during CASCADE and QuerySet bulk deletes.
    """
    if instance.storage_location == BlobStorageLocationChoices.OBJECT_STORAGE:
        service = TieredStorageService()
        if service.enabled:
            sha256 = bytes(instance.sha256)
            transaction.on_commit(
                lambda: service.delete_if_orphaned(sha256)
            )
🤖 Prompt for AI Agents
In `@src/backend/core/signals.py` around lines 109 - 119, The post_delete signal
handler cleanup_blob_storage currently calls
TieredStorageService().delete_if_orphaned() immediately, which can run inside a
transaction that may roll back; wrap the deletion call in transaction.on_commit
to defer it until after successful commit: import django.db.transaction if
needed, capture the instance.sha256 (or bytes(instance.sha256)) and any required
storage_location check inside cleanup_blob_storage, create a small closure or
lambda that calls TieredStorageService().delete_if_orphaned(...) and register it
with transaction.on_commit, and ensure the existing guard checks
(BlobStorageLocationChoices.OBJECT_STORAGE and service.enabled) are preserved
before scheduling the on_commit callback.
    def test_decrypt_with_corrupted_data(self):
        """Test that decrypt fails gracefully with corrupted Fernet data."""
        from cryptography.fernet import InvalidToken
🧩 Analysis chain
🏁 Scripts executed:
- find . -name "test_tiered_storage.py" | head -5
- wc -l src/backend/core/tests/services/test_tiered_storage.py
- head -130 src/backend/core/tests/services/test_tiered_storage.py | tail -30
- sed -n over each flagged range ('110,120p' through '709,719p') of src/backend/core/tests/services/test_tiered_storage.py
- head -50 src/backend/core/tests/services/test_tiered_storage.py
Add # noqa: PLC0415 markers to intentional local imports instead of moving them to module level.
The file already disables import-outside-toplevel via pylint, indicating these are intentional. In test files, local imports are a standard practice to keep test methods focused on their specific dependencies and avoid polluting the module namespace. Rather than moving these to the top-level, add # noqa: PLC0415 to each flagged import (lines 114, 129, 271, 296, 362, 396, 596, 630, 673, 713) to suppress the Ruff warning while preserving the intended structure.
🧰 Tools
🪛 Ruff (0.14.14)
114-114: import should be at the top-level of a file
(PLC0415)
🤖 Prompt for AI Agents
In `@src/backend/core/tests/services/test_tiered_storage.py` at line 114, Several
intentional local test imports (e.g. "from cryptography.fernet import
InvalidToken" and the other test-only imports flagged at the comment's listed
locations) are meant to remain inside test functions; add a trailing "# noqa:
PLC0415" to each of those local import lines instead of moving them to module
level so Ruff/Pylint stops reporting import-outside-toplevel while keeping the
imports local to their tests. Ensure you update each flagged import line (the
ones referenced in the review) by appending the exact comment "# noqa: PLC0415".
This allows using S3-compatible object storage to offload blobs, making Postgres much lighter. We design for storing ~1B emails on a single instance.
Fixes #185.