[WEB-4720] chore: refactor and extend cleanup tasks for logs and versions #7604
Conversation
- Consolidate API log deletion into cleanup_task.py
- Add tasks to delete old email logs, page versions, and issue description versions
- Update Celery schedule and imports for new tasks
Walkthrough
Replaces a single legacy API log deletion task with a new batch cleanup pipeline that optionally archives records to MongoDB and performs batched deletions; updates Celery schedules/imports, adds a Mongo connection manager and logger, and adds pymongo to base requirements.
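The four tasks described above share one generic archive-then-delete flow. Below is a minimal sketch of that flow, assuming a `process_cleanup_task(queryset, transform, collection_name)` helper, a `flush_to_mongo_and_delete` step, and a batch size of 1000 as shown in the sequence diagram; names, signatures, and error handling are illustrative rather than the project's actual implementation.

```python
# Hedged sketch of the batched archive-then-delete loop summarized in the walkthrough.
# BATCH_SIZE, get_mongo_collection, and the transform/queryset signatures are
# assumptions for illustration, not the project's actual code.
from typing import Callable

from pymongo import InsertOne
from pymongo.errors import BulkWriteError

BATCH_SIZE = 1000


def get_mongo_collection(name: str):
    # Stand-in for the MongoConnection helper; returns None when Mongo is not configured.
    return None


def flush_to_mongo_and_delete(queryset, collection, docs, ids_to_delete) -> int:
    # Archive the batch if a Mongo collection is available, then delete the source rows.
    if collection is not None:
        try:
            collection.bulk_write([InsertOne(doc) for doc in docs])
        except BulkWriteError:
            pass  # archival failure handling is elided in this sketch
    deleted, _ = queryset.filter(id__in=ids_to_delete).delete()
    return deleted


def process_cleanup_task(queryset, transform: Callable[[dict], dict], collection_name: str) -> int:
    collection = get_mongo_collection(collection_name)
    docs, ids, total_deleted = [], [], 0

    # Stream rows in batches to keep memory bounded.
    for record in queryset.values().iterator(chunk_size=BATCH_SIZE):
        docs.append(transform(record))
        ids.append(record["id"])
        if len(ids) >= BATCH_SIZE:
            total_deleted += flush_to_mongo_and_delete(queryset, collection, docs, ids)
            docs, ids = [], []

    if ids:
        total_deleted += flush_to_mongo_and_delete(queryset, collection, docs, ids)
    return total_deleted
```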
Sequence Diagram(s)
```mermaid
sequenceDiagram
autonumber
actor Beat as Celery Beat
participant Worker as Celery Worker
participant Cleanup as cleanup_task.process_cleanup_task
participant Mongo as MongoConnection / MongoDB
participant DB as PostgreSQL (Django ORM)
Beat->>Worker: Schedule delete_* tasks (02:30–04:00 UTC)
Worker->>Cleanup: Start task (e.g., delete_api_logs)
Cleanup->>Mongo: get_collection(collection_name)
alt Mongo configured
Mongo-->>Cleanup: Collection (available)
else Not configured or error
Mongo-->>Cleanup: None
end
loop For each batch (size=1000)
Cleanup->>DB: Fetch next batch (iterator)
Cleanup->>Cleanup: Transform records, collect ids & docs
alt Mongo available
Cleanup->>Mongo: Bulk insert transformed docs
Mongo-->>Cleanup: Ack or BulkWriteError
end
Cleanup->>DB: Delete rows by collected ids
end
Cleanup-->>Worker: Log totals and finish
```

```mermaid
sequenceDiagram
autonumber
actor Beat as Celery Beat
participant API as delete_api_logs
participant Email as delete_email_notification_logs
participant Page as delete_page_versions
participant Issue as delete_issue_description_versions
Beat->>API: 02:30 UTC daily
Beat->>Email: 03:00 UTC daily
Beat->>Page: 03:30 UTC daily
Beat->>Issue: 04:00 UTC daily
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
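For reference, a hedged sketch of what corresponding static `beat_schedule` entries could look like; the task paths and key names are taken from the diagram and the review comments below, and the actual schedule in `apps/api/plane/celery.py` (driven by django-celery-beat, as discussed later in the review) may differ.

```python
from celery import Celery
from celery.schedules import crontab

app = Celery("plane")  # stands in for the existing app in apps/api/plane/celery.py

# Hedged sketch only; times mirror the sequence diagram above.
app.conf.beat_schedule = {
    "check-every-day-to-delete-api-logs": {
        "task": "plane.bgtasks.cleanup_task.delete_api_logs",
        "schedule": crontab(hour=2, minute=30),
    },
    "check-every-day-to-delete-email-notification-logs": {
        "task": "plane.bgtasks.cleanup_task.delete_email_notification_logs",
        "schedule": crontab(hour=3, minute=0),
    },
    "check-every-day-to-delete-page-versions": {
        "task": "plane.bgtasks.cleanup_task.delete_page_versions",
        "schedule": crontab(hour=3, minute=30),
    },
    "check-every-day-to-delete-issue-description-versions": {
        "task": "plane.bgtasks.cleanup_task.delete_issue_description_versions",
        "schedule": crontab(hour=4, minute=0),
    },
}
```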
Pull Request Linked with Plane Work Items (comment automatically generated by Plane)
Actionable comments posted: 2
🧹 Nitpick comments (6)
apps/api/plane/celery.py (1)
56-59: Nit: rename schedule key for clarity.
`check-every-day-to-delete-logs` is ambiguous now that there are multiple log cleanups. Consider making it explicit.
Apply:
- "check-every-day-to-delete-logs": { + "check-every-day-to-delete-email-notification-logs": {apps/api/plane/bgtasks/cleanup_task.py (5)
22-33: Return deleted count and fix misleading “30 days” comment; compute threshold once.
Minor cleanup and better observability. Returning the count helps monitoring.
Apply:
```diff
 @shared_task
 def delete_api_logs():
-    # Get the logs older than 30 days to delete
-    logs_to_delete = APIActivityLog.all_objects.filter(
-        created_at__lte=(
-            timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS)
-        )
-    )
-
-    # Delete the logs
-    logs_to_delete.delete()
+    # Delete logs older than HARD_DELETE_AFTER_DAYS
+    threshold = timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS)
+    logs_to_delete = APIActivityLog.all_objects.filter(created_at__lte=threshold)
+    deleted, _ = logs_to_delete.delete()
+    return deleted
```
35-44: Same as above: clarify comment and return deleted count.
Keeps both log cleaners consistent and easier to observe.
Apply:
```diff
 @shared_task
 def delete_email_notification_logs():
-    # Get the logs older than 30 days to delete
-    logs_to_delete = EmailNotificationLog.all_objects.filter(
-        sent_at__lte=(timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS))
-    )
-
-    # Delete the logs
-    logs_to_delete.delete()
+    # Delete email notification logs older than HARD_DELETE_AFTER_DAYS
+    threshold = timezone.now() - timedelta(days=settings.HARD_DELETE_AFTER_DAYS)
+    logs_to_delete = EmailNotificationLog.all_objects.filter(sent_at__lte=threshold)
+    deleted, _ = logs_to_delete.delete()
+    return deleted
```
13-19: Parameterize “20” via settings for easier tuning.
Avoid magic numbers; makes rollout safer without code changes.
Apply:
```diff
 from django.conf import settings

 # Third party imports
 from celery import shared_task

 # Module imports
@@
 from plane.db.models import (
     EmailNotificationLog,
     PageVersion,
     APIActivityLog,
     IssueDescriptionVersion,
 )
+
+# Configuration
+MAX_VERSIONS_TO_KEEP = getattr(settings, "VERSION_HISTORY_TO_KEEP", 20)
```
48-64: Make version limit configurable & exclude soft-deleted rows when counting latest versions.
PageVersion uses soft-deletion (`objects = SoftDeletionManager()`, `all_objects = models.Manager()`), so annotating `all_objects` will let deleted rows consume slots. Instead:
• Define a constant (e.g. MAX_VERSIONS_TO_KEEP) or pull from settings
• Annotate only active rows via PageVersion.objects
• Still delete via `all_objects`
Apply:
```diff
 @shared_task
 def delete_page_versions():
-    # Here we have to keep maximum 20 versions of a page delete all versions that are greater than 20 for an issue
+    # Keep at most MAX_VERSIONS_TO_KEEP versions per page; delete older ones beyond the limit
     subq = (
-        PageVersion.all_objects.annotate(
+        PageVersion.objects.annotate(
             row_num=Window(
                 expression=RowNumber(),
                 partition_by=[F("page_id")],
                 order_by=F("created_at").desc(),
             )
         )
         .filter(
-            row_num__gt=20,
+            row_num__gt=MAX_VERSIONS_TO_KEEP,
         )
         .values("id")
     )
     PageVersion.all_objects.filter(id__in=Subquery(subq)).delete()
```
66-83: Make keep limit configurable and exclude soft-deleted versions
Use the `.objects` manager (which filters out soft-deleted rows) instead of `all_objects`, replace the magic number `20` with a configurable constant, and, if you ever need to include soft-deleted rows, allow toggling back to `all_objects`.
File: `apps/api/plane/bgtasks/cleanup_task.py` (around lines 66–83)
Apply for example:
```diff
 @shared_task
 def delete_issue_description_versions(ignore_soft_deleted: bool = True):
-    # Keep at most 20 description versions per issue; delete older ones
-    subq = (
-        IssueDescriptionVersion.all_objects.annotate(
+    # Keep at most MAX_VERSIONS_TO_KEEP description versions per issue; delete older ones
+    # By default excludes soft-deleted rows; set ignore_soft_deleted=False to include them.
+    manager = (
+        IssueDescriptionVersion.objects
+        if ignore_soft_deleted
+        else IssueDescriptionVersion.all_objects
+    )
+    subq = (
+        manager.annotate(
             row_num=Window(
                 expression=RowNumber(),
                 partition_by=[F("issue_id")],
                 order_by=F("created_at").desc(),
             )
         )
         .filter(
-            row_num__gt=MAX_VERSIONS_TO_KEEP,
+            row_num__gt=MAX_VERSIONS_TO_KEEP,
         )
         .values("id")
     )
-    IssueDescriptionVersion.all_objects.filter(id__in=Subquery(subq)).delete()
+    # Always hard-delete by ID, regardless of soft-delete status
+    IssueDescriptionVersion.all_objects.filter(id__in=Subquery(subq)).delete()
```
• Define `MAX_VERSIONS_TO_KEEP` (e.g. in `settings.py`)
• Import it at the top of this file:
```python
from django.conf import settings

MAX_VERSIONS_TO_KEEP = settings.MAX_ISSUE_DESCRIPTION_VERSIONS
```
• Optionally adjust the task signature to accept a flag when you need to include soft-deleted rows.
📜 Review details
📒 Files selected for processing (4)
- `apps/api/plane/bgtasks/api_logs_task.py` (0 hunks)
- `apps/api/plane/bgtasks/cleanup_task.py` (1 hunks)
- `apps/api/plane/celery.py` (1 hunks)
- `apps/api/plane/settings/common.py` (1 hunks)
💤 Files with no reviewable changes (1)
- apps/api/plane/bgtasks/api_logs_task.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (1)
apps/api/plane/db/models/issue.py (1)
IssueDescriptionVersion(795-837)
🔇 Additional comments (2)
apps/api/plane/settings/common.py (1)
287-287: Verified CELERY_IMPORTS update; no stale references detected.
- apps/api/plane/settings/common.py (line 287) now imports `"plane.bgtasks.cleanup_task"`
- `rg -nP 'plane\.bgtasks\.api_logs_task|api_logs_task\.delete_api_logs'` returned no matches

Everything looks good; approving these changes.
apps/api/plane/celery.py (1)
53-55: Good: the beat task now points to `cleanup_task.delete_api_logs`. The schedule keeps the same time slot; only the task path changed. No issues spotted.
Actionable comments posted: 3
♻️ Duplicate comments (2)
apps/api/plane/bgtasks/cleanup_task.py (2)
284-317: Performance: add composite index to support windowed prune (PageVersion).
The window partition/order on (page_id, created_at DESC) benefits from a composite index. Without it, daily scans can be expensive.
- models.Index(fields=["page", "-created_at"], name="page_created_at_idx")
This was flagged earlier; re-iterating here for this code path.
I can generate the migration for you.
320-353: Performance: add composite index to support windowed prune (IssueDescriptionVersion).
Add:
- models.Index(fields=["issue", "-created_at"], name="issue_created_at_idx")
This aligns with the delete query pattern.
Happy to generate the migration.
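If it helps, a hedged sketch of the migration such a change might produce; the app label, migration dependency, and model names are assumptions, not a generated file.

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    # "db" and the previous migration name are placeholders; adjust to the actual app.
    dependencies = [
        ("db", "XXXX_previous_migration"),
    ]

    operations = [
        migrations.AddIndex(
            model_name="pageversion",
            index=models.Index(fields=["page", "-created_at"], name="page_created_at_idx"),
        ),
        migrations.AddIndex(
            model_name="issuedescriptionversion",
            index=models.Index(fields=["issue", "-created_at"], name="issue_created_at_idx"),
        ),
    ]
```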
🧹 Nitpick comments (5)
apps/api/plane/settings/mongo.py (1)
31-44: Optional: add a simple init lock to make the singleton thread-safe.
Concurrent `__new__` calls could race in threaded contexts. A lightweight module-level Lock prevents double init.
Example:
```diff
+import threading
@@
-    _instance: Optional["MongoConnection"] = None
+    _instance: Optional["MongoConnection"] = None
     _client: Optional[MongoClient] = None
     _db: Optional[Database] = None
+    _lock = threading.Lock()
@@
-        if cls._instance is None:
-            cls._instance = super(MongoConnection, cls).__new__(cls)
+        if cls._instance is None:
+            with cls._lock:
+                if cls._instance is None:
+                    cls._instance = super(MongoConnection, cls).__new__(cls)
+                    # init continues...
```
apps/api/plane/bgtasks/cleanup_task.py (4)
33-47: Tone down noisy "MongoDB not configured" logs and guard success message.
- The task will log "MongoDB not configured" at INFO every run; this is noisy when Mongo is intentionally disabled. Prefer DEBUG.
- After get_collection(...), log success only if a collection is actually returned.
```diff
-    if not MongoConnection.is_configured():
-        logger.info("MongoDB not configured")
+    if not MongoConnection.is_configured():
+        logger.debug("MongoDB not configured")
         return None
@@
-    logger.info(f"MongoDB collection '{collection_name}' connected successfully")
-    return mongo_collection
+    if mongo_collection is not None:
+        logger.info(f"MongoDB collection '{collection_name}' connected successfully")
+    return mongo_collection
```
82-142: Type hints for IDs should allow UUIDs.
Many Plane models use UUID primary keys. `ids_to_delete: List[int]` is misleading and could confuse future readers or static analysis.
```diff
-    ids_to_delete: List[int] = []
+    from uuid import UUID
+    ids_to_delete: List[UUID | int | str] = []
```
Also adjust the function signature of `flush_to_mongo_and_delete` accordingly if you want to be precise.
145-162: Potential PII in headers/body; confirm redaction policy before archiving.
APIActivityLog may include sensitive data in headers (e.g., Authorization), body, or response_body. Before inserting into Mongo, consider redacting sensitive fields, or dropping them entirely for long-term storage.
If desired, I can add a redact_api_log(record) step that strips tokens and known secrets.
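A hedged sketch of what such a `redact_api_log(record)` step could look like; the key list and the choice to drop bodies entirely are illustrative assumptions, not the reviewer's or project's actual code.

```python
SENSITIVE_HEADER_KEYS = {"authorization", "cookie", "x-api-key"}  # illustrative list


def redact_api_log(record: dict) -> dict:
    # Shallow-copy so the original ORM-derived dict is left untouched.
    redacted = dict(record)
    headers = redacted.get("headers")
    if isinstance(headers, dict):
        redacted["headers"] = {
            k: ("[REDACTED]" if k.lower() in SENSITIVE_HEADER_KEYS else v)
            for k, v in headers.items()
        }
    # Drop request/response bodies entirely before long-term storage.
    redacted["body"] = None
    redacted["response_body"] = None
    return redacted
```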
185-205: Binary fields in Mongo: ensure compatibility and size limits.
`description_binary` (bytes) will be stored as BSON Binary. This is fine, but:
- Large binaries can bloat the collection; consider dropping or compressing if archival is optional.
- If consumers expect text-only payloads, consider omitting binary.
If needed, I can add a toggle to exclude binary fields from archival.
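A hedged sketch of such a toggle; the environment variable name and the transform's field names are illustrative assumptions.

```python
import os

# When false, omit the raw binary payload from the archived document.
ARCHIVE_DESCRIPTION_BINARY = os.environ.get(
    "ARCHIVE_DESCRIPTION_BINARY", "false"
).lower() in ("1", "true", "yes")


def transform_issue_description_version(record: dict) -> dict:
    # Field names here are examples; only description_binary is named in the review.
    doc = {
        "issue_id": str(record.get("issue_id")),
        "description_html": record.get("description_html"),
        "created_at": record.get("created_at"),
    }
    if ARCHIVE_DESCRIPTION_BINARY:
        doc["description_binary"] = record.get("description_binary")
    return doc
```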
📜 Review details
📒 Files selected for processing (5)
- `apps/api/plane/bgtasks/cleanup_task.py` (1 hunks)
- `apps/api/plane/settings/local.py` (1 hunks)
- `apps/api/plane/settings/mongo.py` (1 hunks)
- `apps/api/plane/settings/production.py` (1 hunks)
- `apps/api/requirements/base.txt` (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (3)
apps/api/plane/db/models/issue.py (1)
`IssueDescriptionVersion` (795-837)
apps/api/plane/settings/mongo.py (3)
`MongoConnection` (18-118), `is_configured` (111-118), `get_collection` (88-108)
apps/api/plane/utils/exception_logger.py (1)
log_exception(9-20)
🪛 GitHub Actions: Build and lint API
apps/api/requirements/base.txt
[error] 8-8: Could not find a version that satisfies the requirement psycopg-binary==3.1.18 (available: 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6, 3.2.7, 3.2.8, 3.2.9) during 'cd apps/api && pip install -r requirements.txt'.
🔇 Additional comments (2)
apps/api/plane/settings/production.py (1)
86-90: LGTM: Adds dedicated plane.mongo logger for production.
Consistent with local settings and the new Mongo integration; routing to console with INFO level is appropriate.
apps/api/plane/settings/local.py (1)
76-80: LGTM: Adds plane.mongo logger for local dev.
Matches production logger shape; good for observing Mongo lifecycle locally.
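For context, a hedged sketch of the kind of `plane.mongo` logger entry both settings files are described as adding; the handler name and surrounding LOGGING structure follow standard Django conventions and are assumptions, not the project's exact settings.

```python
# Minimal dictConfig-style sketch: route the "plane.mongo" logger to the console at INFO.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "plane.mongo": {
            "handlers": ["console"],
            "level": "INFO",
            "propagate": False,
        },
    },
}
```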
Actionable comments posted: 0
♻️ Duplicate comments (2)
apps/api/plane/celery.py (2)
52-55: Critical: Update the DB PeriodicTask to point to the new API logs task path, or the job will break under DatabaseScheduler.
Because `app.conf.beat_scheduler` is set to `django_celery_beat.schedulers.DatabaseScheduler` (see line 95), the DB's PeriodicTask entry still referencing `plane.bgtasks.api_logs_task.delete_api_logs` will not be auto-updated. If the old module has been removed, the beat run will fail with a NotRegistered/ImportError at runtime. Upsert/patch the PeriodicTask row to use `plane.bgtasks.cleanup_task.delete_api_logs`.
Run this to confirm there’s a migration/startup sync that updates the PeriodicTask for API logs to the new path:
```bash
#!/bin/bash
# Show that DatabaseScheduler is active
rg -n -C1 -P 'beat_scheduler\s*=\s*"django_celery_beat\.schedulers\.DatabaseScheduler"'

# Look for any code/migrations that create or update PeriodicTasks for delete_api_logs
rg -n -C3 -P '\bPeriodicTask\b.*(update_or_create|get_or_create|filter|create)\(' --type=py | rg -n -C2 -P 'delete_api_logs|cleanup_task\.delete_api_logs'

# Also check references by name
rg -n -C2 -P 'check-every-day-to-delete-api-logs' --type=py
```
Apply a data migration like this to safely update the task path (adjust dependency as needed):
```python
from django.db import migrations


def update_api_logs_task_path(apps, schema_editor):
    PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
    # Update by name (preferred, stable)
    PeriodicTask.objects.filter(
        name="check-every-day-to-delete-api-logs"
    ).update(task="plane.bgtasks.cleanup_task.delete_api_logs")
    # Fallback: update any lingering old-path rows
    PeriodicTask.objects.filter(
        task="plane.bgtasks.api_logs_task.delete_api_logs"
    ).update(task="plane.bgtasks.cleanup_task.delete_api_logs")


class Migration(migrations.Migration):
    dependencies = [
        # add dependency on latest django_celery_beat migration and your app's latest migration
    ]

    operations = [migrations.RunPython(update_api_logs_task_path, migrations.RunPython.noop)]
```
56-59: Critical: With DatabaseScheduler enabled, these beat_schedule entries won't run unless mirrored in django-celery-beat's DB.
Since line 95 configures DatabaseScheduler, Celery Beat reads schedules from the DB, not from `app.conf.beat_schedule`. You need a data migration or startup sync to upsert CrontabSchedule and PeriodicTask rows for the three new tasks (and ensure the API logs task is updated to the new path).
Use this script to verify whether such a migration/upsert exists:
```bash
#!/bin/bash
# Verify the scheduler mode
rg -n -C1 -P 'beat_scheduler\s*=\s*"django_celery_beat\.schedulers\.DatabaseScheduler"'

# Search for code/migrations that upsert PeriodicTask/CrontabSchedule for the new tasks
rg -n -C3 -P '(CrontabSchedule|PeriodicTask).*delete_(email_notification_logs|page_versions|issue_description_versions)' --type=py

# Broad search for these task names/paths anywhere
rg -n -C2 -P 'delete_(email_notification_logs|page_versions|issue_description_versions)|cleanup_task\.(delete_email_notification_logs|delete_page_versions|delete_issue_description_versions)' --type=py
```
Example migration to upsert all four tasks:
```python
from django.db import migrations


def upsert_cleanup_tasks(apps, schema_editor):
    CrontabSchedule = apps.get_model("django_celery_beat", "CrontabSchedule")
    PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")

    def ensure_task(name, task, hour, minute):
        cs, _ = CrontabSchedule.objects.get_or_create(
            minute=str(minute),
            hour=str(hour),
            day_of_week="*",
            day_of_month="*",
            month_of_year="*",
        )
        PeriodicTask.objects.update_or_create(
            name=name,
            defaults={
                "task": task,
                "crontab": cs,
                "enabled": True,
                "one_off": False,
            },
        )

    ensure_task(
        "check-every-day-to-delete-api-logs",
        "plane.bgtasks.cleanup_task.delete_api_logs",
        2,
        30,
    )
    ensure_task(
        "check-every-day-to-delete-email-notification-logs",
        "plane.bgtasks.cleanup_task.delete_email_notification_logs",
        3,
        0,
    )
    ensure_task(
        "check-every-day-to-delete-page-versions",
        "plane.bgtasks.cleanup_task.delete_page_versions",
        3,
        30,
    )
    ensure_task(
        "check-every-day-to-delete-issue-description-versions",
        "plane.bgtasks.cleanup_task.delete_issue_description_versions",
        4,
        0,
    )


class Migration(migrations.Migration):
    dependencies = [
        # add dependency on the latest django_celery_beat migration and your app's latest migration
    ]

    operations = [migrations.RunPython(upsert_cleanup_tasks, migrations.RunPython.noop)]
```
Also applies to: 60-63, 64-67
🧹 Nitpick comments (1)
apps/api/plane/celery.py (1)
25-68: Add an inline note so future readers don’t assume this dict drives Beat in prod.
Given DatabaseScheduler is enforced (line 95), this beat_schedule dict is reference-only. A brief comment helps prevent drift and confusion.
Apply this diff to add a clarifying comment:
```diff
 app.conf.beat_schedule = {
-    # Intra day recurring jobs
+    # NOTE: Celery Beat uses django_celery_beat's DatabaseScheduler (see line 95).
+    # The entries below are documentation/reference only. Ensure matching PeriodicTask rows exist in the DB.
+    # Intra day recurring jobs
```
📜 Review details
📒 Files selected for processing (1)
- `apps/api/plane/celery.py` (1 hunks)
🔇 Additional comments (1)
apps/api/plane/celery.py (1)
56-59: Naming looks good (includes “notification” as requested).
The schedule key/task label is explicit and consistent with prior feedback. No changes needed.
- Added a parameter to check MongoDB availability in the flush_to_mongo_and_delete function.
- Implemented error logging for MongoDB archival failures.
- Updated calls to flush_to_mongo_and_delete to include the new parameter.
Actionable comments posted: 1
♻️ Duplicate comments (2)
apps/api/plane/bgtasks/cleanup_task.py (2)
306-337: Create composite DB indexes to support the windowed deletes.
These queries will scan heavily without `(page_id, created_at DESC)` and `(issue_id, created_at DESC)` indexes. This was previously flagged; still applies here.
I can open a migration to add:
- PageVersion: Index(fields=["page", "-created_at"], name="page_created_at_idx")
- IssueDescriptionVersion: Index(fields=["issue", "-created_at"], name="issue_created_at_idx")
Verification script to check if such indexes already exist:
```bash
#!/bin/bash
# Search model and migration files for the composite indexes
rg -nP 'class\s+PageVersion\b|class\s+IssueDescriptionVersion\b' -C3 apps
rg -nP 'models\.Index\(|index_together|indexes\s*=' -n apps
rg -nP "page_created_at_idx|issue_created_at_idx" -n apps
```
If you want, I’ll generate the migration file as well.
Also applies to: 342-373
14-16: Prevent archival data loss; harden Mongo writes and gate deletion.
- Use unordered bulk writes for better throughput and resilience.
- Catch broader Mongo exceptions to avoid accidental deletion after a failed archival attempt.
- Gate the “skip deletion on archival failure” behavior behind `MONGO_ARCHIVE_REQUIRED` to make it configurable (defaults to proceed with deletion if archival fails). This aligns with prior feedback.
Apply these diffs:
```diff
-from pymongo.errors import BulkWriteError
+from pymongo.errors import BulkWriteError, PyMongoError
```
```diff
-    mongo_archival_failed = False
+    mongo_archival_failed = False
     # Try to insert into MongoDB if available
     if mongo_collection and mongo_available:
         try:
-            mongo_collection.bulk_write([InsertOne(doc) for doc in buffer])
-        except BulkWriteError as bwe:
+            mongo_collection.bulk_write(
+                [InsertOne(doc) for doc in buffer],
+                ordered=False,
+            )
+        except (BulkWriteError, PyMongoError) as bwe:
             logger.error(f"MongoDB bulk write error: {str(bwe)}")
             log_exception(bwe)
             mongo_archival_failed = True
+        except Exception as e:
+            logger.error(f"Unexpected MongoDB error: {str(e)}")
+            log_exception(e)
+            mongo_archival_failed = True

-    # If MongoDB is available and archival failed, log the error and return
-    if mongo_available and mongo_archival_failed:
-        logger.error(f"MongoDB archival failed for {len(buffer)} records")
-        return
+    # If archival is required and failed, skip deletion for this batch
+    archive_required = os.environ.get("MONGO_ARCHIVE_REQUIRED", "false").lower() in ("1", "true", "yes")
+    if mongo_available and mongo_archival_failed and archive_required:
+        logger.warning(
+            f"Skipping deletion of {len(ids_to_delete)} records because Mongo archival failed and MONGO_ARCHIVE_REQUIRED=true"
+        )
+        return
```
Also applies to: 67-82
🧹 Nitpick comments (6)
apps/api/plane/bgtasks/cleanup_task.py (6)
30-30: Make batch size configurable via env.
Allow tuning without code changes; large documents (e.g., HTML/JSON) may benefit from smaller batches.
```diff
-BATCH_SIZE = 1000
+BATCH_SIZE = int(os.environ.get("CLEANUP_BATCH_SIZE", "1000"))
```
309-314: Add a deterministic tie-breaker to window ordering.
If multiple rows share the same `created_at`, deletions can be non-deterministic across runs. Add `id DESC` as a secondary order key.
```diff
-                order_by=F("created_at").desc(),
+                order_by=[F("created_at").desc(), F("id").desc()],
```
```diff
-                order_by=F("created_at").desc(),
+                order_by=[F("created_at").desc(), F("id").desc()],
```
Also applies to: 345-350
166-183: Redact secrets and PII in logs before archiving.
`headers`, `query_params`, and `body` can contain tokens, cookies, emails, etc. Add sanitization helpers and invoke them in transforms to minimize exposure.
Apply this diff to the API log transform:
- "query_params": record.get("query_params"), - "headers": record.get("headers"), - "body": record.get("body"), + "query_params": sanitize_query_params(record.get("query_params")), + "headers": sanitize_headers(record.get("headers")), + "body": sanitize_body(record.get("body")),Optionally also sanitize email log payload:
- "data": record["data"], + "data": sanitize_email_data(record["data"]),Add these helpers (place anywhere above the transform functions):
```python
REDACT_KEYS = {
    "authorization", "cookie", "set-cookie", "x-api-key", "x-api-token",
    "x-auth-token", "x-authorization", "proxy-authorization",
    "api_key", "access_token", "refresh_token", "jwt", "bearer",
}


def _redact_value(v: Any) -> Any:
    if v is None:
        return None
    s = str(v)
    if len(s) <= 8:
        return "****"
    return s[:4] + "****" + s[-4:]


def sanitize_headers(headers: Any) -> Any:
    if not isinstance(headers, dict):
        return headers
    return {
        k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
        for k, v in headers.items()
    }


def sanitize_query_params(qp: Any) -> Any:
    if isinstance(qp, dict):
        return {
            k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
            for k, v in qp.items()
        }
    return qp


def sanitize_body(body: Any) -> Any:
    # Best-effort: redact obvious tokens in flat JSON dicts; leave others as-is
    if isinstance(body, dict):
        return {
            k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
            for k, v in body.items()
        }
    return body


def sanitize_email_data(data: Any) -> Any:
    # Optionally trim oversized payloads or remove sensitive fields
    return data
```
Optionally, add env flags to drop large bodies entirely (e.g., `ARCHIVE_API_BODIES=false`).
169-171: Prefer native datetimes over strings for Mongo.
Avoid `str()` around datetimes; PyMongo will store timezone-aware `datetime` as BSON `Date`, enabling TTL/indexed queries later.
Example for one field (apply across all transforms):
```diff
-            "created_at": str(record["created_at"]) if record.get("created_at") else None,
+            "created_at": record.get("created_at"),
```
Also applies to: 189-191, 210-211, 232-233
9-9: Include unsent email logs in cutoff via fallback to created_at.
Using only `sent_at` excludes failed/unsent logs. Consider deleting when either `sent_at` or `created_at` is older than the cutoff.
Apply:
```diff
-from django.db.models import F, Window, Subquery
+from django.db.models import F, Window, Subquery, Q
```
```diff
-        EmailNotificationLog.all_objects.filter(sent_at__lte=cutoff_time)
+        EmailNotificationLog.all_objects.filter(
+            Q(sent_at__lte=cutoff_time) | Q(created_at__lte=cutoff_time)
+        )
```
If current behavior is intentional (only sent emails), ignore this.
Also applies to: 285-286
33-46: Reduce log verbosity for Mongo collection “connected successfully”.
At INFO level this fires on each task run; consider DEBUG to keep worker logs cleaner.
```diff
-    logger.info(f"MongoDB collection '{collection_name}' connected successfully")
+    logger.debug(f"MongoDB collection '{collection_name}' connected successfully")
```
📜 Review details
📒 Files selected for processing (1)
- `apps/api/plane/bgtasks/cleanup_task.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (3)
apps/api/plane/db/models/issue.py (1)
`IssueDescriptionVersion` (795-837)
apps/api/plane/settings/mongo.py (3)
`MongoConnection` (18-118), `is_configured` (111-118), `get_collection` (88-108)
apps/api/plane/utils/exception_logger.py (1)
log_exception(9-20)
- Updated the parameter name from 'mode' to 'model' in the process_cleanup_task function to ensure consistency and clarity in the code.
… class
- Replaced direct access to settings with getattr for MONGO_DB_URL and MONGO_DB_DATABASE to enhance robustness.
- Added warning logging for missing MongoDB connection parameters.
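A hedged sketch of the getattr-based access and warning described in this commit; only `MONGO_DB_URL` and `MONGO_DB_DATABASE` come from the commit message, the rest is illustrative.

```python
import logging

from django.conf import settings

logger = logging.getLogger("plane.mongo")

# Fall back to None instead of raising AttributeError when the settings are absent.
mongo_url = getattr(settings, "MONGO_DB_URL", None)
mongo_database = getattr(settings, "MONGO_DB_DATABASE", None)

if not mongo_url or not mongo_database:
    logger.warning("MongoDB connection parameters are missing; skipping Mongo archival")
```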
Description
Type of Change
Test Scenarios
References
WEB-4720
Summary by CodeRabbit
New Features
Chores