[WEB-4720] chore: refactor and extend cleanup tasks for logs and versions #7604
Conversation
- Consolidate API log deletion into cleanup_task.py
- Add tasks to delete old email logs, page versions, and issue description versions
- Update Celery schedule and imports for new tasks
Walkthrough
Replaces a single legacy API log deletion task with a new batch cleanup pipeline that optionally archives records to MongoDB and performs batched deletions; updates Celery schedules/imports, adds a Mongo connection manager and logger, and adds pymongo to base requirements.
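The four tasks described above share one generic archive-then-delete flow. Below is a minimal sketch of that flow, assuming a `process_cleanup_task(queryset, transform, collection_name)` helper, a `flush_to_mongo_and_delete` step, and a batch size of 1000 as shown in the sequence diagram; names, signatures, and error handling are illustrative rather than the project's actual implementation.

```python
# Hedged sketch of the batched archive-then-delete loop summarized in the walkthrough.
# BATCH_SIZE, get_mongo_collection, and the transform/queryset signatures are
# assumptions for illustration, not the project's actual code.
from typing import Callable

from pymongo import InsertOne
from pymongo.errors import BulkWriteError

BATCH_SIZE = 1000


def get_mongo_collection(name: str):
    # Stand-in for the MongoConnection helper; returns None when Mongo is not configured.
    return None


def flush_to_mongo_and_delete(queryset, collection, docs, ids_to_delete) -> int:
    # Archive the batch if a Mongo collection is available, then delete the source rows.
    if collection is not None:
        try:
            collection.bulk_write([InsertOne(doc) for doc in docs])
        except BulkWriteError:
            pass  # archival failure handling is elided in this sketch
    deleted, _ = queryset.filter(id__in=ids_to_delete).delete()
    return deleted


def process_cleanup_task(queryset, transform: Callable[[dict], dict], collection_name: str) -> int:
    collection = get_mongo_collection(collection_name)
    docs, ids, total_deleted = [], [], 0

    # Stream rows in batches to keep memory bounded.
    for record in queryset.values().iterator(chunk_size=BATCH_SIZE):
        docs.append(transform(record))
        ids.append(record["id"])
        if len(ids) >= BATCH_SIZE:
            total_deleted += flush_to_mongo_and_delete(queryset, collection, docs, ids)
            docs, ids = [], []

    if ids:
        total_deleted += flush_to_mongo_and_delete(queryset, collection, docs, ids)
    return total_deleted
```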
Sequence Diagram(s)
```mermaid
sequenceDiagram
autonumber
actor Beat as Celery Beat
participant Worker as Celery Worker
participant Cleanup as cleanup_task.process_cleanup_task
participant Mongo as MongoConnection / MongoDB
participant DB as PostgreSQL (Django ORM)
Beat->>Worker: Schedule delete_* tasks (02:30–04:00 UTC)
Worker->>Cleanup: Start task (e.g., delete_api_logs)
Cleanup->>Mongo: get_collection(collection_name)
alt Mongo configured
Mongo-->>Cleanup: Collection (available)
else Not configured or error
Mongo-->>Cleanup: None
end
loop For each batch (size=1000)
Cleanup->>DB: Fetch next batch (iterator)
Cleanup->>Cleanup: Transform records, collect ids & docs
alt Mongo available
Cleanup->>Mongo: Bulk insert transformed docs
Mongo-->>Cleanup: Ack or BulkWriteError
end
Cleanup->>DB: Delete rows by collected ids
end
Cleanup-->>Worker: Log totals and finish
```

```mermaid
sequenceDiagram
autonumber
actor Beat as Celery Beat
participant API as delete_api_logs
participant Email as delete_email_notification_logs
participant Page as delete_page_versions
participant Issue as delete_issue_description_versions
Beat->>API: 02:30 UTC daily
Beat->>Email: 03:00 UTC daily
Beat->>Page: 03:30 UTC daily
Beat->>Issue: 04:00 UTC daily
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
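For reference, a hedged sketch of what corresponding static `beat_schedule` entries could look like; the task paths and key names are taken from the diagram and the review comments below, and the actual schedule in `apps/api/plane/celery.py` (driven by django-celery-beat, as discussed later in the review) may differ.

```python
from celery import Celery
from celery.schedules import crontab

app = Celery("plane")  # stands in for the existing app in apps/api/plane/celery.py

# Hedged sketch only; times mirror the sequence diagram above.
app.conf.beat_schedule = {
    "check-every-day-to-delete-api-logs": {
        "task": "plane.bgtasks.cleanup_task.delete_api_logs",
        "schedule": crontab(hour=2, minute=30),
    },
    "check-every-day-to-delete-email-notification-logs": {
        "task": "plane.bgtasks.cleanup_task.delete_email_notification_logs",
        "schedule": crontab(hour=3, minute=0),
    },
    "check-every-day-to-delete-page-versions": {
        "task": "plane.bgtasks.cleanup_task.delete_page_versions",
        "schedule": crontab(hour=3, minute=30),
    },
    "check-every-day-to-delete-issue-description-versions": {
        "task": "plane.bgtasks.cleanup_task.delete_issue_description_versions",
        "schedule": crontab(hour=4, minute=0),
    },
}
```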
Pull Request Linked with Plane Work Items (comment automatically generated by Plane)
Actionable comments posted: 2
🧹 Nitpick comments (6)
apps/api/plane/celery.py (1)
56-59: Nit: rename schedule key for clarity.
`check-every-day-to-delete-logs` is ambiguous now that there are multiple log cleanups. Consider making it explicit.
Apply:
- "check-every-day-to-delete-logs": { + "check-every-day-to-delete-email-notification-logs": {apps/api/plane/bgtasks/cleanup_task.py (5)
22-33: Return deleted count and fix misleading “30 days” comment; compute threshold once.
Minor cleanup and better observability. Returning the count helps monitoring.
Apply:
```diff
 @shared_task
 def delete_api_logs():
-    # Get the logs older than 30 days to delete
-    logs_to_delete = APIActivityLog.all_objects.filter(
-        created_at__lte=(
-            timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS)
-        )
-    )
-
-    # Delete the logs
-    logs_to_delete.delete()
+    # Delete logs older than HARD_DELETE_AFTER_DAYS
+    threshold = timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS)
+    logs_to_delete = APIActivityLog.all_objects.filter(created_at__lte=threshold)
+    deleted, _ = logs_to_delete.delete()
+    return deleted
```
35-44: Same as above: clarify comment and return deleted count.
Keeps both log cleaners consistent and easier to observe.
Apply:
```diff
 @shared_task
 def delete_email_notification_logs():
-    # Get the logs older than 30 days to delete
-    logs_to_delete = EmailNotificationLog.all_objects.filter(
-        sent_at__lte=(timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS))
-    )
-
-    # Delete the logs
-    logs_to_delete.delete()
+    # Delete email notification logs older than HARD_DELETE_AFTER_DAYS
+    threshold = timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS)
+    logs_to_delete = EmailNotificationLog.all_objects.filter(sent_at__lte=threshold)
+    deleted, _ = logs_to_delete.delete()
+    return deleted
```
13-19: Parameterize “20” via settings for easier tuning.
Avoid magic numbers; makes rollout safer without code changes.
Apply:
```diff
 from django.conf import settings

 # Third party imports
 from celery import shared_task

 # Module imports
@@
 from plane.db.models import (
     EmailNotificationLog,
     PageVersion,
     APIActivityLog,
     IssueDescriptionVersion,
 )
+
+# Configuration
+MAX_VERSIONS_TO_KEEP = getattr(settings, "VERSION_HISTORY_TO_KEEP", 20)
```
48-64: Make version limit configurable & exclude soft-deleted rows when counting latest versions.
PageVersion uses soft-deletion (`objects = SoftDeletionManager()`, `all_objects = models.Manager()`), so annotating `all_objects` will let deleted rows consume slots. Instead:
• Define a constant (e.g. MAX_VERSIONS_TO_KEEP) or pull from settings
• Annotate only active rows via PageVersion.objects
• Still delete via `all_objects`
Apply:
```diff
 @shared_task
 def delete_page_versions():
-    # Here we have to keep maximum 20 versions of a page delete all versions that are greater than 20 for an issue
+    # Keep at most MAX_VERSIONS_TO_KEEP versions per page; delete older ones beyond the limit
     subq = (
-        PageVersion.all_objects.annotate(
+        PageVersion.objects.annotate(
             row_num=Window(
                 expression=RowNumber(),
                 partition_by=[F("page_id")],
                 order_by=F("created_at").desc(),
             )
         )
         .filter(
-            row_num__gt=20,
+            row_num__gt=MAX_VERSIONS_TO_KEEP,
         )
         .values("id")
     )
     PageVersion.all_objects.filter(id__in=Subquery(subq)).delete()
```
66-83: Make keep limit configurable and exclude soft-deleted versions
Use the `.objects` manager (which filters out soft-deleted rows) instead of `all_objects`, replace the magic number `20` with a configurable constant, and, if you ever need to include soft-deleted rows, allow toggling back to `all_objects`.
File: `apps/api/plane/bgtasks/cleanup_task.py` (around lines 66–83)
Apply for example:
```diff
 @shared_task
 def delete_issue_description_versions(ignore_soft_deleted: bool = True):
-    # Keep at most 20 description versions per issue; delete older ones
-    subq = (
-        IssueDescriptionVersion.all_objects.annotate(
+    # Keep at most MAX_VERSIONS_TO_KEEP description versions per issue; delete older ones
+    # By default excludes soft-deleted rows; set ignore_soft_deleted=False to include them.
+    manager = (
+        IssueDescriptionVersion.objects
+        if ignore_soft_deleted
+        else IssueDescriptionVersion.all_objects
+    )
+    subq = (
+        manager.annotate(
             row_num=Window(
                 expression=RowNumber(),
                 partition_by=[F("issue_id")],
                 order_by=F("created_at").desc(),
             )
         )
         .filter(
-            row_num__gt=MAX_VERSIONS_TO_KEEP,
+            row_num__gt=MAX_VERSIONS_TO_KEEP,
         )
         .values("id")
     )
-    IssueDescriptionVersion.all_objects.filter(id__in=Subquery(subq)).delete()
+    # Always hard-delete by ID, regardless of soft-delete status
+    IssueDescriptionVersion.all_objects.filter(id__in=Subquery(subq)).delete()
```
• Define `MAX_VERSIONS_TO_KEEP` (e.g. in `settings.py`)
• Import it at the top of this file:
```python
from django.conf import settings

MAX_VERSIONS_TO_KEEP = settings.MAX_ISSUE_DESCRIPTION_VERSIONS
```
• Optionally adjust the task signature to accept a flag when you need to include soft-deleted rows.
📜 Review details
📒 Files selected for processing (4)
- `apps/api/plane/bgtasks/api_logs_task.py` (0 hunks)
- `apps/api/plane/bgtasks/cleanup_task.py` (1 hunks)
- `apps/api/plane/celery.py` (1 hunks)
- `apps/api/plane/settings/common.py` (1 hunks)
💤 Files with no reviewable changes (1)
- apps/api/plane/bgtasks/api_logs_task.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (1)
apps/api/plane/db/models/issue.py (1)
IssueDescriptionVersion(795-837)
🔇 Additional comments (2)
apps/api/plane/settings/common.py (1)
287-287: Verified CELERY_IMPORTS update; no stale references detected.
- apps/api/plane/settings/common.py (line 287) now imports `"plane.bgtasks.cleanup_task"`
- `rg -nP 'plane\.bgtasks\.api_logs_task|api_logs_task\.delete_api_logs'` returned no matches

Everything looks good; approving these changes.
apps/api/plane/celery.py (1)
53-55: Good: the beat task now points to `cleanup_task.delete_api_logs`. The schedule keeps the same time slot; only the task path changed. No issues spotted.
Actionable comments posted: 3
♻️ Duplicate comments (2)
apps/api/plane/bgtasks/cleanup_task.py (2)
284-317: Performance: add composite index to support windowed prune (PageVersion).
The window partition/order on (page_id, created_at DESC) benefits from a composite index. Without it, daily scans can be expensive.
- models.Index(fields=["page", "-created_at"], name="page_created_at_idx")
This was flagged earlier; re-iterating here for this code path.
I can generate the migration for you.
320-353: Performance: add composite index to support windowed prune (IssueDescriptionVersion).
Add:
- models.Index(fields=["issue", "-created_at"], name="issue_created_at_idx")
This aligns with the delete query pattern.
Happy to generate the migration.
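If it helps, a hedged sketch of the migration such a change might produce; the app label, migration dependency, and model names are assumptions, not a generated file.

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    # "db" and the previous migration name are placeholders; adjust to the actual app.
    dependencies = [
        ("db", "XXXX_previous_migration"),
    ]

    operations = [
        migrations.AddIndex(
            model_name="pageversion",
            index=models.Index(fields=["page", "-created_at"], name="page_created_at_idx"),
        ),
        migrations.AddIndex(
            model_name="issuedescriptionversion",
            index=models.Index(fields=["issue", "-created_at"], name="issue_created_at_idx"),
        ),
    ]
```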
🧹 Nitpick comments (5)
apps/api/plane/settings/mongo.py (1)
31-44: Optional: add a simple init lock to make the singleton thread-safe.
Concurrent `__new__` calls could race in threaded contexts. A lightweight module-level Lock prevents double init.
Example:
```diff
+import threading
@@
-    _instance: Optional["MongoConnection"] = None
+    _instance: Optional["MongoConnection"] = None
     _client: Optional[MongoClient] = None
     _db: Optional[Database] = None
+    _lock = threading.Lock()
@@
-        if cls._instance is None:
-            cls._instance = super(MongoConnection, cls).__new__(cls)
+        if cls._instance is None:
+            with cls._lock:
+                if cls._instance is None:
+                    cls._instance = super(MongoConnection, cls).__new__(cls)
+                    # init continues...
```
apps/api/plane/bgtasks/cleanup_task.py (4)
33-47: Tone down noisy "MongoDB not configured" logs and guard success message.
- The task will log "MongoDB not configured" at INFO every run; this is noisy when Mongo is intentionally disabled. Prefer DEBUG.
- After get_collection(...), log success only if a collection is actually returned.
```diff
-    if not MongoConnection.is_configured():
-        logger.info("MongoDB not configured")
+    if not MongoConnection.is_configured():
+        logger.debug("MongoDB not configured")
         return None
@@
-    logger.info(f"MongoDB collection '{collection_name}' connected successfully")
-    return mongo_collection
+    if mongo_collection is not None:
+        logger.info(f"MongoDB collection '{collection_name}' connected successfully")
+    return mongo_collection
```
82-142: Type hints for IDs should allow UUIDs.
Many Plane models use UUID primary keys. `ids_to_delete: List[int]` is misleading and could confuse future readers or static analysis.
```diff
-    ids_to_delete: List[int] = []
+    from uuid import UUID
+    ids_to_delete: List[UUID | int | str] = []
```
Also adjust the function signature of `flush_to_mongo_and_delete` accordingly if you want to be precise.
145-162: Potential PII in headers/body; confirm redaction policy before archiving.
APIActivityLog may include sensitive data in headers (e.g., Authorization), body, or response_body. Before inserting into Mongo, consider redacting sensitive fields, or dropping them entirely for long-term storage.
If desired, I can add a redact_api_log(record) step that strips tokens and known secrets.
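A hedged sketch of what such a `redact_api_log(record)` step could look like; the key list and the choice to drop bodies entirely are illustrative assumptions, not the reviewer's or project's actual code.

```python
SENSITIVE_HEADER_KEYS = {"authorization", "cookie", "x-api-key"}  # illustrative list


def redact_api_log(record: dict) -> dict:
    # Shallow-copy so the original ORM-derived dict is left untouched.
    redacted = dict(record)
    headers = redacted.get("headers")
    if isinstance(headers, dict):
        redacted["headers"] = {
            k: ("[REDACTED]" if k.lower() in SENSITIVE_HEADER_KEYS else v)
            for k, v in headers.items()
        }
    # Drop request/response bodies entirely before long-term storage.
    redacted["body"] = None
    redacted["response_body"] = None
    return redacted
```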
185-205: Binary fields in Mongo: ensure compatibility and size limits.
`description_binary` (bytes) will be stored as BSON Binary. This is fine, but:
- Large binaries can bloat the collection; consider dropping or compressing if archival is optional.
- If consumers expect text-only payloads, consider omitting binary.
If needed, I can add a toggle to exclude binary fields from archival.
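A hedged sketch of such a toggle; the environment variable name and the transform's field names are illustrative assumptions.

```python
import os

# When false, omit the raw binary payload from the archived document.
ARCHIVE_DESCRIPTION_BINARY = os.environ.get(
    "ARCHIVE_DESCRIPTION_BINARY", "false"
).lower() in ("1", "true", "yes")


def transform_issue_description_version(record: dict) -> dict:
    # Field names here are examples; only description_binary is named in the review.
    doc = {
        "issue_id": str(record.get("issue_id")),
        "description_html": record.get("description_html"),
        "created_at": record.get("created_at"),
    }
    if ARCHIVE_DESCRIPTION_BINARY:
        doc["description_binary"] = record.get("description_binary")
    return doc
```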
📜 Review details
📒 Files selected for processing (5)
- `apps/api/plane/bgtasks/cleanup_task.py` (1 hunks)
- `apps/api/plane/settings/local.py` (1 hunks)
- `apps/api/plane/settings/mongo.py` (1 hunks)
- `apps/api/plane/settings/production.py` (1 hunks)
- `apps/api/requirements/base.txt` (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (3)
apps/api/plane/db/models/issue.py (1)
`IssueDescriptionVersion` (795-837)
apps/api/plane/settings/mongo.py (3)
`MongoConnection` (18-118), `is_configured` (111-118), `get_collection` (88-108)
apps/api/plane/utils/exception_logger.py (1)
log_exception(9-20)
🪛 GitHub Actions: Build and lint API
apps/api/requirements/base.txt
[error] 8-8: Could not find a version that satisfies the requirement psycopg-binary==3.1.18 (available: 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6, 3.2.7, 3.2.8, 3.2.9) during 'cd apps/api && pip install -r requirements.txt'.
🔇 Additional comments (2)
apps/api/plane/settings/production.py (1)
86-90: LGTM: Adds dedicated plane.mongo logger for production.
Consistent with local settings and the new Mongo integration; routing to console with INFO level is appropriate.
apps/api/plane/settings/local.py (1)
76-80: LGTM: Adds plane.mongo logger for local dev.
Matches production logger shape; good for observing Mongo lifecycle locally.
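For context, a hedged sketch of the kind of `plane.mongo` logger entry both settings files are described as adding; the handler name and surrounding LOGGING structure follow standard Django conventions and are assumptions, not the project's exact settings.

```python
# Minimal dictConfig-style sketch: route the "plane.mongo" logger to the console at INFO.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "plane.mongo": {
            "handlers": ["console"],
            "level": "INFO",
            "propagate": False,
        },
    },
}
```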
Actionable comments posted: 0
♻️ Duplicate comments (2)
apps/api/plane/celery.py (2)
52-55: Critical: Update the DB PeriodicTask to point to the new API logs task path, or the job will break under DatabaseScheduler.
Because `app.conf.beat_scheduler` is set to `django_celery_beat.schedulers.DatabaseScheduler` (see line 95), the DB's PeriodicTask entry still referencing `plane.bgtasks.api_logs_task.delete_api_logs` will not be auto-updated. If the old module has been removed, the beat run will fail with a NotRegistered/ImportError at runtime. Upsert/patch the PeriodicTask row to use `plane.bgtasks.cleanup_task.delete_api_logs`.
Run this to confirm there’s a migration/startup sync that updates the PeriodicTask for API logs to the new path:
```bash
#!/bin/bash
# Show that DatabaseScheduler is active
rg -n -C1 -P 'beat_scheduler\s*=\s*"django_celery_beat\.schedulers\.DatabaseScheduler"'

# Look for any code/migrations that create or update PeriodicTasks for delete_api_logs
rg -n -C3 -P '\bPeriodicTask\b.*(update_or_create|get_or_create|filter|create)\(' --type=py | rg -n -C2 -P 'delete_api_logs|cleanup_task\.delete_api_logs'

# Also check references by name
rg -n -C2 -P 'check-every-day-to-delete-api-logs' --type=py
```
Apply a data migration like this to safely update the task path (adjust dependency as needed):
```python
from django.db import migrations


def update_api_logs_task_path(apps, schema_editor):
    PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
    # Update by name (preferred, stable)
    PeriodicTask.objects.filter(
        name="check-every-day-to-delete-api-logs"
    ).update(task="plane.bgtasks.cleanup_task.delete_api_logs")
    # Fallback: update any lingering old-path rows
    PeriodicTask.objects.filter(
        task="plane.bgtasks.api_logs_task.delete_api_logs"
    ).update(task="plane.bgtasks.cleanup_task.delete_api_logs")


class Migration(migrations.Migration):
    dependencies = [
        # add dependency on latest django_celery_beat migration and your app's latest migration
    ]

    operations = [migrations.RunPython(update_api_logs_task_path, migrations.RunPython.noop)]
```
56-59: Critical: With DatabaseScheduler enabled, these beat_schedule entries won't run unless mirrored in django-celery-beat's DB.
Since line 95 configures DatabaseScheduler, Celery Beat reads schedules from the DB, not from `app.conf.beat_schedule`. You need a data migration or startup sync to upsert CrontabSchedule and PeriodicTask rows for the three new tasks (and ensure the API logs task is updated to the new path).
Use this script to verify whether such a migration/upsert exists:
```bash
#!/bin/bash
# Verify the scheduler mode
rg -n -C1 -P 'beat_scheduler\s*=\s*"django_celery_beat\.schedulers\.DatabaseScheduler"'

# Search for code/migrations that upsert PeriodicTask/CrontabSchedule for the new tasks
rg -n -C3 -P '(CrontabSchedule|PeriodicTask).*delete_(email_notification_logs|page_versions|issue_description_versions)' --type=py

# Broad search for these task names/paths anywhere
rg -n -C2 -P 'delete_(email_notification_logs|page_versions|issue_description_versions)|cleanup_task\.(delete_email_notification_logs|delete_page_versions|delete_issue_description_versions)' --type=py
```
Example migration to upsert all four tasks:
```python
from django.db import migrations


def upsert_cleanup_tasks(apps, schema_editor):
    CrontabSchedule = apps.get_model("django_celery_beat", "CrontabSchedule")
    PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")

    def ensure_task(name, task, hour, minute):
        cs, _ = CrontabSchedule.objects.get_or_create(
            minute=str(minute),
            hour=str(hour),
            day_of_week="*",
            day_of_month="*",
            month_of_year="*",
        )
        PeriodicTask.objects.update_or_create(
            name=name,
            defaults={
                "task": task,
                "crontab": cs,
                "enabled": True,
                "one_off": False,
            },
        )

    ensure_task(
        "check-every-day-to-delete-api-logs",
        "plane.bgtasks.cleanup_task.delete_api_logs",
        2,
        30,
    )
    ensure_task(
        "check-every-day-to-delete-email-notification-logs",
        "plane.bgtasks.cleanup_task.delete_email_notification_logs",
        3,
        0,
    )
    ensure_task(
        "check-every-day-to-delete-page-versions",
        "plane.bgtasks.cleanup_task.delete_page_versions",
        3,
        30,
    )
    ensure_task(
        "check-every-day-to-delete-issue-description-versions",
        "plane.bgtasks.cleanup_task.delete_issue_description_versions",
        4,
        0,
    )


class Migration(migrations.Migration):
    dependencies = [
        # add dependency on the latest django_celery_beat migration and your app's latest migration
    ]

    operations = [migrations.RunPython(upsert_cleanup_tasks, migrations.RunPython.noop)]
```
Also applies to: 60-63, 64-67
🧹 Nitpick comments (1)
apps/api/plane/celery.py (1)
25-68: Add an inline note so future readers don’t assume this dict drives Beat in prod.
Given DatabaseScheduler is enforced (line 95), this beat_schedule dict is reference-only. A brief comment helps prevent drift and confusion.
Apply this diff to add a clarifying comment:
```diff
 app.conf.beat_schedule = {
-    # Intra day recurring jobs
+    # NOTE: Celery Beat uses django_celery_beat's DatabaseScheduler (see line 95).
+    # The entries below are documentation/reference only. Ensure matching PeriodicTask rows exist in the DB.
+    # Intra day recurring jobs
```
📜 Review details
📒 Files selected for processing (1)
- `apps/api/plane/celery.py` (1 hunks)
🔇 Additional comments (1)
apps/api/plane/celery.py (1)
56-59: Naming looks good (includes “notification” as requested).
The schedule key/task label is explicit and consistent with prior feedback. No changes needed.
- Added a parameter to check MongoDB availability in the flush_to_mongo_and_delete function.
- Implemented error logging for MongoDB archival failures.
- Updated calls to flush_to_mongo_and_delete to include the new parameter.
Actionable comments posted: 1
♻️ Duplicate comments (2)
apps/api/plane/bgtasks/cleanup_task.py (2)
306-337: Create composite DB indexes to support the windowed deletes.
These queries will scan heavily without `(page_id, created_at DESC)` and `(issue_id, created_at DESC)` indexes. This was previously flagged; still applies here.
I can open a migration to add:
- PageVersion: Index(fields=["page", "-created_at"], name="page_created_at_idx")
- IssueDescriptionVersion: Index(fields=["issue", "-created_at"], name="issue_created_at_idx")
Verification script to check if such indexes already exist:
```bash
#!/bin/bash
# Search model and migration files for the composite indexes
rg -nP 'class\s+PageVersion\b|class\s+IssueDescriptionVersion\b' -C3 apps
rg -nP 'models\.Index\(|index_together|indexes\s*=' -n apps
rg -nP "page_created_at_idx|issue_created_at_idx" -n apps
```
If you want, I’ll generate the migration file as well.
Also applies to: 342-373
14-16: Prevent archival data loss; harden Mongo writes and gate deletion.
- Use unordered bulk writes for better throughput and resilience.
- Catch broader Mongo exceptions to avoid accidental deletion after a failed archival attempt.
- Gate the “skip deletion on archival failure” behavior behind `MONGO_ARCHIVE_REQUIRED` to make it configurable (defaults to proceed with deletion if archival fails). This aligns with prior feedback.
Apply these diffs:
```diff
-from pymongo.errors import BulkWriteError
+from pymongo.errors import BulkWriteError, PyMongoError
```
```diff
-    mongo_archival_failed = False
+    mongo_archival_failed = False
     # Try to insert into MongoDB if available
     if mongo_collection and mongo_available:
         try:
-            mongo_collection.bulk_write([InsertOne(doc) for doc in buffer])
-        except BulkWriteError as bwe:
+            mongo_collection.bulk_write(
+                [InsertOne(doc) for doc in buffer],
+                ordered=False,
+            )
+        except (BulkWriteError, PyMongoError) as bwe:
             logger.error(f"MongoDB bulk write error: {str(bwe)}")
             log_exception(bwe)
             mongo_archival_failed = True
+        except Exception as e:
+            logger.error(f"Unexpected MongoDB error: {str(e)}")
+            log_exception(e)
+            mongo_archival_failed = True

-    # If MongoDB is available and archival failed, log the error and return
-    if mongo_available and mongo_archival_failed:
-        logger.error(f"MongoDB archival failed for {len(buffer)} records")
-        return
+    # If archival is required and failed, skip deletion for this batch
+    archive_required = os.environ.get("MONGO_ARCHIVE_REQUIRED", "false").lower() in ("1", "true", "yes")
+    if mongo_available and mongo_archival_failed and archive_required:
+        logger.warning(
+            f"Skipping deletion of {len(ids_to_delete)} records because Mongo archival failed and MONGO_ARCHIVE_REQUIRED=true"
+        )
+        return
```
Also applies to: 67-82
🧹 Nitpick comments (6)
apps/api/plane/bgtasks/cleanup_task.py (6)
30-30: Make batch size configurable via env.
Allow tuning without code changes; large documents (e.g., HTML/JSON) may benefit from smaller batches.
```diff
-BATCH_SIZE = 1000
+BATCH_SIZE = int(os.environ.get("CLEANUP_BATCH_SIZE", "1000"))
```
309-314: Add a deterministic tie-breaker to window ordering.
If multiple rows share the same `created_at`, deletions can be non-deterministic across runs. Add `id DESC` as a secondary order key.
```diff
-                order_by=F("created_at").desc(),
+                order_by=[F("created_at").desc(), F("id").desc()],
```
```diff
-                order_by=F("created_at").desc(),
+                order_by=[F("created_at").desc(), F("id").desc()],
```
Also applies to: 345-350
166-183: Redact secrets and PII in logs before archiving.
`headers`, `query_params`, and `body` can contain tokens, cookies, emails, etc. Add sanitization helpers and invoke them in transforms to minimize exposure.
Apply this diff to the API log transform:
- "query_params": record.get("query_params"), - "headers": record.get("headers"), - "body": record.get("body"), + "query_params": sanitize_query_params(record.get("query_params")), + "headers": sanitize_headers(record.get("headers")), + "body": sanitize_body(record.get("body")),Optionally also sanitize email log payload:
- "data": record["data"], + "data": sanitize_email_data(record["data"]),Add these helpers (place anywhere above the transform functions):
```python
REDACT_KEYS = {
    "authorization", "cookie", "set-cookie", "x-api-key", "x-api-token",
    "x-auth-token", "x-authorization", "proxy-authorization",
    "api_key", "access_token", "refresh_token", "jwt", "bearer",
}


def _redact_value(v: Any) -> Any:
    if v is None:
        return None
    s = str(v)
    if len(s) <= 8:
        return "****"
    return s[:4] + "****" + s[-4:]


def sanitize_headers(headers: Any) -> Any:
    if not isinstance(headers, dict):
        return headers
    return {
        k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
        for k, v in headers.items()
    }


def sanitize_query_params(qp: Any) -> Any:
    if isinstance(qp, dict):
        return {
            k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
            for k, v in qp.items()
        }
    return qp


def sanitize_body(body: Any) -> Any:
    # Best-effort: redact obvious tokens in flat JSON dicts; leave others as-is
    if isinstance(body, dict):
        return {
            k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
            for k, v in body.items()
        }
    return body


def sanitize_email_data(data: Any) -> Any:
    # Optionally trim oversized payloads or remove sensitive fields
    return data
```
Optionally, add env flags to drop large bodies entirely (e.g., `ARCHIVE_API_BODIES=false`).
169-171: Prefer native datetimes over strings for Mongo.
Avoid `str()` around datetimes; PyMongo will store timezone-aware `datetime` as BSON `Date`, enabling TTL/indexed queries later.
Example for one field (apply across all transforms):
```diff
-            "created_at": str(record["created_at"]) if record.get("created_at") else None,
+            "created_at": record.get("created_at"),
```
Also applies to: 189-191, 210-211, 232-233
9-9: Include unsent email logs in cutoff via fallback to created_at.
Using only `sent_at` excludes failed/unsent logs. Consider deleting when either `sent_at` or `created_at` is older than the cutoff.
Apply:
```diff
-from django.db.models import F, Window, Subquery
+from django.db.models import F, Window, Subquery, Q
```
```diff
-        EmailNotificationLog.all_objects.filter(sent_at__lte=cutoff_time)
+        EmailNotificationLog.all_objects.filter(
+            Q(sent_at__lte=cutoff_time) | Q(created_at__lte=cutoff_time)
+        )
```
If current behavior is intentional (only sent emails), ignore this.
Also applies to: 285-286
33-46: Reduce log verbosity for Mongo collection “connected successfully”.
At INFO level this fires on each task run; consider DEBUG to keep worker logs cleaner.
```diff
-    logger.info(f"MongoDB collection '{collection_name}' connected successfully")
+    logger.debug(f"MongoDB collection '{collection_name}' connected successfully")
```
📜 Review details
📒 Files selected for processing (1)
- `apps/api/plane/bgtasks/cleanup_task.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (3)
apps/api/plane/db/models/issue.py (1)
`IssueDescriptionVersion` (795-837)
apps/api/plane/settings/mongo.py (3)
`MongoConnection` (18-118), `is_configured` (111-118), `get_collection` (88-108)
apps/api/plane/utils/exception_logger.py (1)
log_exception(9-20)
- Updated the parameter name from 'mode' to 'model' in the process_cleanup_task function to ensure consistency and clarity in the code.
… class
- Replaced direct access to settings with getattr for MONGO_DB_URL and MONGO_DB_DATABASE to enhance robustness.
- Added warning logging for missing MongoDB connection parameters.
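A hedged sketch of the getattr-based access and warning described in this commit; only `MONGO_DB_URL` and `MONGO_DB_DATABASE` come from the commit message, the rest is illustrative.

```python
import logging

from django.conf import settings

logger = logging.getLogger("plane.mongo")

# Fall back to None instead of raising AttributeError when the settings are absent.
mongo_url = getattr(settings, "MONGO_DB_URL", None)
mongo_database = getattr(settings, "MONGO_DB_DATABASE", None)

if not mongo_url or not mongo_database:
    logger.warning("MongoDB connection parameters are missing; skipping Mongo archival")
```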
Description
Type of Change
Test Scenarios
References
WEB-4720
Summary by CodeRabbit
New Features
Chores