feat: add script to delete data older than 90 days#2693

Open
wilsonrivera wants to merge 6 commits into main from wilson/eng-7755-delete-old-data-from-the-database
Conversation


@wilsonrivera wilsonrivera commented Mar 25, 2026

Summary by CodeRabbit

  • Chores
    • Added database indexes to schema-check related tables to improve query performance; migration entry recorded.
    • Introduced a new automated data-retention cleanup utility that removes records older than 90 days (schema checks, schema versions, audit logs, webhook deliveries) with batched, logged deletions.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/docs-website.
  • I have read the Contributors Guide.

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.


coderabbitai bot commented Mar 25, 2026

Walkthrough

Adds five PostgreSQL indexes via a migration and journal entry, updates Drizzle schema declarations to declare those indexes, and introduces a new TypeScript cleanup script that deletes old rows from several tables in batched transactions.

Changes

Cohort / File(s) Summary
Migration SQL & Journal
controlplane/migrations/0137_certain_monster_badoon.sql, controlplane/migrations/meta/_journal.json
Adds five CREATE INDEX IF NOT EXISTS statements for: schema_check_change_action(schema_check_subgraph_id), schema_check_graph_pruning_action(schema_check_subgraph_id), schema_check_lint_action(schema_check_subgraph_id), schema_check_subgraphs(namespace_id), and schema_checks(check_extension_delivery_id). Appends migration entry to journal.
Schema declarations
controlplane/src/db/schema.ts
Declares matching Drizzle ORM indexes: sceWebhookDeliveryIdIndex on schemaChecks.checkExtensionDeliveryId, namespaceIdIndex on schemaCheckSubgraphs.namespaceId, and schemaCheckSubgraphIdIndex on schemaCheckChangeAction, schemaCheckLintAction, and schemaCheckGraphPruningAction.
Data cleanup script
controlplane/src/bin/cleanup-old-data.ts
Adds a new TypeScript script that computes a 90-day cutoff and performs batched, chunked, transactional deletes (with parallel groups) for schemaChecks, schemaVersion, auditLogs, and webhookDeliveries, preserving referenced schema versions via a keep-set.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: adding a cleanup script to delete data older than 90 days, which is the primary purpose evidenced by the new cleanup-old-data.ts script and supporting database migrations.



Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (5)
controlplane/src/bin/cleanup-old-data.ts (5)

105-145: Add explicit return type annotation.

Per TypeScript coding guidelines, functions should have explicit return type annotations.

Proposed fix
 async function deleteOldDataInChunks(
   db: PostgresJsDatabase<typeof schema>,
   table:
     | typeof schema.schemaChecks
     | typeof schema.graphCompositions
     | typeof schema.auditLogs
     | typeof schema.webhookDeliveries,
-) {
+): Promise<number> {

As per coding guidelines: "Use explicit type annotations for function parameters and return types in TypeScript".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 105 - 145, The
function deleteOldDataInChunks is missing an explicit return type; add a
TypeScript return annotation so the signature declares it returns
Promise<number> (keep the existing parameters like db: PostgresJsDatabase<typeof
schema> and the union table type unchanged) to satisfy the project's explicit
type annotation guideline.

178-234: Add explicit return type annotation.

Proposed fix
-async function deleteOldSchemaVersions(db: PostgresJsDatabase<typeof schema>) {
+async function deleteOldSchemaVersions(db: PostgresJsDatabase<typeof schema>): Promise<number> {

As per coding guidelines: "Use explicit type annotations for function parameters and return types in TypeScript".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 178 - 234, The
function deleteOldSchemaVersions is missing an explicit return type; add an
explicit Promise<number> return annotation to its signature (the db parameter
can remain typed as PostgresJsDatabase<typeof schema>) so the declaration
becomes deleteOldSchemaVersions(db: PostgresJsDatabase<typeof schema>):
Promise<number>, ensuring the async function's return type is explicit and
matches the returned deleteCount number.

12-14: Consider making the retention period configurable.

The 90-day cutoff is hardcoded. Consider making this configurable via environment variable or command-line argument for operational flexibility.

Proposed approach
+const DEFAULT_RETENTION_DAYS = 90;
+const RETENTION_DAYS = Number(process.env.CLEANUP_RETENTION_DAYS) || DEFAULT_RETENTION_DAYS;
+
-const CUTOFF_DATE = startOfDay(subDays(new Date(), 90));
+const CUTOFF_DATE = startOfDay(subDays(new Date(), RETENTION_DAYS));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 12 - 14, CUTOFF_DATE
is hardcoded to 90 days; make the retention period configurable by reading a
RETENTION_DAYS environment variable or a command-line flag and fall back to 90
if not provided/invalid. Update the code that defines CUTOFF_DATE to parse and
validate the variable (e.g., Number(process.env.RETENTION_DAYS) or a parsed CLI
arg) and then compute startOfDay(subDays(new Date(), retentionDays)); leave
ITEMS_PER_CHUNK and NUMBER_OF_PARALLEL_TRANSACTIONS as-is but consider adding
similar env/CLI overrides if desired.

32-94: Consider adding a dry-run mode for safety.

For a destructive operation like this, a --dry-run flag that logs what would be deleted without actually deleting would help operators validate the cleanup scope before execution, especially in production environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 32 - 94, Add a
--dry-run option to the cleanup script so the process logs what would be removed
without performing deletes: parse process.argv for a dryRun boolean at script
startup and thread it into the deletion calls used in this file (pass to
deleteOldDataInChunks and deleteOldSchemaVersions), then modify those functions
to accept a dryRun flag and short-circuit actual DELETE operations — instead run
the same SELECT/count logic and log the IDs/counts and messages using
CUTOFF_DATE and the existing console messages; keep the same timing and final
summary but ensure no DB mutations happen when dryRun is true and still close
queryConnection in finally.

161-176: Add explicit return type annotation.

Proposed fix
-async function getFederatedGraphsFlags(db: PostgresJsDatabase<typeof schema>) {
+async function getFederatedGraphsFlags(db: PostgresJsDatabase<typeof schema>): Promise<string[]> {

As per coding guidelines: "Use explicit type annotations for function parameters and return types in TypeScript".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 161 - 176, Add an
explicit return type to the getFederatedGraphsFlags function signature (e.g.
Promise<Array<number | string>> or the precise id type used by your schema) so
the function signature declares its return type instead of relying on inference;
update the signature for getFederatedGraphsFlags(db: PostgresJsDatabase<typeof
schema>): Promise<Array<number | string>> (or replace number | string with the
exact type of baseCompositionSchemaVersionId/composedSchemaVersionId from
schema) to satisfy the coding guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@controlplane/src/bin/cleanup-old-data.ts`:
- Around line 105-145: The function deleteOldDataInChunks is missing an explicit
return type; add a TypeScript return annotation so the signature declares it
returns Promise<number> (keep the existing parameters like db:
PostgresJsDatabase<typeof schema> and the union table type unchanged) to satisfy
the project's explicit type annotation guideline.
- Around line 178-234: The function deleteOldSchemaVersions is missing an
explicit return type; add an explicit Promise<number> return annotation to its
signature (the db parameter can remain typed as PostgresJsDatabase<typeof
schema>) so the declaration becomes deleteOldSchemaVersions(db:
PostgresJsDatabase<typeof schema>): Promise<number>, ensuring the async
function's return type is explicit and matches the returned deleteCount number.
- Around line 12-14: CUTOFF_DATE is hardcoded to 90 days; make the retention
period configurable by reading a RETENTION_DAYS environment variable or a
command-line flag and fall back to 90 if not provided/invalid. Update the code
that defines CUTOFF_DATE to parse and validate the variable (e.g.,
Number(process.env.RETENTION_DAYS) or a parsed CLI arg) and then compute
startOfDay(subDays(new Date(), retentionDays)); leave ITEMS_PER_CHUNK and
NUMBER_OF_PARALLEL_TRANSACTIONS as-is but consider adding similar env/CLI
overrides if desired.
- Around line 32-94: Add a --dry-run option to the cleanup script so the process
logs what would be removed without performing deletes: parse process.argv for a
dryRun boolean at script startup and thread it into the deletion calls used in
this file (pass to deleteOldDataInChunks and deleteOldSchemaVersions), then
modify those functions to accept a dryRun flag and short-circuit actual DELETE
operations — instead run the same SELECT/count logic and log the IDs/counts and
messages using CUTOFF_DATE and the existing console messages; keep the same
timing and final summary but ensure no DB mutations happen when dryRun is true
and still close queryConnection in finally.
- Around line 161-176: Add an explicit return type to the
getFederatedGraphsFlags function signature (e.g. Promise<Array<number | string>>
or the precise id type used by your schema) so the function signature declares
its return type instead of relying on inference; update the signature for
getFederatedGraphsFlags(db: PostgresJsDatabase<typeof schema>):
Promise<Array<number | string>> (or replace number | string with the exact type
of baseCompositionSchemaVersionId/composedSchemaVersionId from schema) to
satisfy the coding guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a25c1271-e9ba-410c-8f4e-5e72d87b484a

📥 Commits

Reviewing files that changed from the base of the PR and between f9a7513 and 57f4c87.

📒 Files selected for processing (6)
  • controlplane/migrations/0137_certain_monster_badoon.sql
  • controlplane/migrations/meta/0137_snapshot.json
  • controlplane/migrations/meta/_journal.json
  • controlplane/package.json
  • controlplane/src/bin/cleanup-old-data.ts
  • controlplane/src/db/schema.ts


codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 2.27273% with 215 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.13%. Comparing base (9a303d1) to head (02a6c63).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
controlplane/src/bin/cleanup-old-data.ts 0.00% 214 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (2.27%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2693      +/-   ##
==========================================
+ Coverage   63.10%   64.13%   +1.02%     
==========================================
  Files         245      306      +61     
  Lines       26273    43633   +17360     
  Branches        0     4666    +4666     
==========================================
+ Hits        16580    27983   +11403     
- Misses       8352    15627    +7275     
+ Partials     1341       23    -1318     
Files with missing lines Coverage Δ
controlplane/src/db/schema.ts 100.00% <100.00%> (ø)
controlplane/src/bin/cleanup-old-data.ts 0.00% <0.00%> (ø)

... and 549 files with indirect coverage changes


// Delete compositions
deleteStart = performance.now();
console.log(`Deleting compositions created before ${CUTOFF_DATE.toDateString()}...`);
const compositionsDeleted = await deleteOldDataInChunks(db, schema.graphCompositions);
Member

why delete graph compositions? on deletion of schema versions the graph compositions get deleted right?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
controlplane/src/bin/cleanup-old-data.ts (1)

107-114: Add explicit return types to the remaining helpers.

deleteOldDataInChunks, getFederatedGraphsFlags, and deleteOldCompositions still rely on inference. Please declare their Promise<number> / Promise<string[]> return types so this file matches the repo TypeScript rule.

As per coding guidelines, "Use explicit type annotations for function parameters and return types in TypeScript".

Also applies to: 163-163, 180-180

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 107 - 114, Add
explicit return type annotations to the helper functions to satisfy the repo
TypeScript rule: declare deleteOldDataInChunks as returning Promise<number>,
getFederatedGraphsFlags as returning Promise<string[]>, and
deleteOldCompositions as returning Promise<number>. Update the function
signatures (for the symbols deleteOldDataInChunks, getFederatedGraphsFlags,
deleteOldCompositions) to include these return types while leaving the internal
logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controlplane/src/bin/cleanup-old-data.ts`:
- Around line 247-248: The current logic pushes schemaVersionIds from
graphCompositions and graphCompositionSubgraphs into the global
idsOfSchemaVersionsToKeep via getSchemaVersionIds (the two all.push lines),
which incorrectly treats every old composition as a version to retain and
prevents deleteOldSchemaVersions()/deleteOldCompositions() from cleaning up;
change the logic so the keep-set is built only from rows you actually intend to
retain (e.g., only compositions marked active/retained) or maintain separate
keep-sets (e.g., compositionSchemaVersionIds vs retainedSchemaVersionIds) and
use the appropriate set when calling deleteOldSchemaVersions and
deleteOldCompositions; locate and update the code paths around
getSchemaVersionIds, idsOfSchemaVersionsToKeep, deleteOldSchemaVersions, and
deleteOldCompositions to either filter
graphCompositions/graphCompositionSubgraphs rows by intended-retention criteria
or split them into distinct keep sets.
- Around line 183-190: The code currently selects all expired graphCompositions
into memory (variable graphCompositions) — change this to page through results
using a loop that re-queries with a limit and an orderBy (e.g., order by
createdAt and use a BATCH_SIZE) and delete each page before fetching the next,
following the pattern in deleteOldDataInChunks(); apply the same
pagination/fetch-by-page fix for the schemaVersion selection block referenced in
the comment (the block at ~228-232) so no full retention set is materialized in
memory and each iteration re-queries with lte(schemaX.createdAt, CUTOFF_DATE),
orderBy createdAt and limit BATCH_SIZE until no rows remain.

---

Nitpick comments:
In `@controlplane/src/bin/cleanup-old-data.ts`:
- Around line 107-114: Add explicit return type annotations to the helper
functions to satisfy the repo TypeScript rule: declare deleteOldDataInChunks as
returning Promise<number>, getFederatedGraphsFlags as returning
Promise<string[]>, and deleteOldCompositions as returning Promise<number>.
Update the function signatures (for the symbols deleteOldDataInChunks,
getFederatedGraphsFlags, deleteOldCompositions) to include these return types
while leaving the internal logic unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9d402453-a2ee-46bd-a570-4a87854eeb1d

📥 Commits

Reviewing files that changed from the base of the PR and between 57f4c87 and 02a6c63.

📒 Files selected for processing (1)
  • controlplane/src/bin/cleanup-old-data.ts

Comment on lines +183 to +190
const graphCompositions = await db
.select({
id: schema.graphCompositions.id,
schemaVersionId: schema.graphCompositions.schemaVersionId,
})
.from(schema.graphCompositions)
.where(lte(schema.graphCompositions.createdAt, CUTOFF_DATE))
.execute();

⚠️ Potential issue | 🟠 Major

Page these candidates instead of loading the full retention set into memory.

Both helpers materialize every expired graphCompositions / schemaVersion row before batching deletes. On a retention run, that can create a large memory spike and a long initial scan. Re-query each batch with limit/orderBy, like deleteOldDataInChunks() does.

Also applies to: 228-232

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 183 - 190, The code
currently selects all expired graphCompositions into memory (variable
graphCompositions) — change this to page through results using a loop that
re-queries with a limit and an orderBy (e.g., order by createdAt and use a
BATCH_SIZE) and delete each page before fetching the next, following the pattern
in deleteOldDataInChunks(); apply the same pagination/fetch-by-page fix for the
schemaVersion selection block referenced in the comment (the block at ~228-232)
so no full retention set is materialized in memory and each iteration re-queries
with lte(schemaX.createdAt, CUTOFF_DATE), orderBy createdAt and limit BATCH_SIZE
until no rows remain.
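The paged-scan pattern suggested above can be sketched with a keyset cursor. This is a hypothetical illustration: `Row`, `fetchPage`, and the page size are assumptions, not the script's actual API; when each page is deleted immediately, simply re-querying from the start without a cursor also works:

```typescript
// Hypothetical keyset pagination over expired rows: fetch a bounded page
// ordered by (createdAt, id), process it, then advance past the last row,
// so the full candidate set is never materialized in memory.
interface Row {
  id: string;
  createdAt: Date;
}

const PAGE_SIZE = 500; // assumed batch size

async function forEachExpiredPage(
  // Returns up to `limit` rows strictly after the cursor, ordered by
  // (createdAt, id) so pagination is stable across ties.
  fetchPage: (after: Row | undefined, limit: number) => Promise<Row[]>,
  handlePage: (rows: Row[]) => Promise<void>,
): Promise<number> {
  let cursor: Row | undefined;
  let total = 0;
  for (;;) {
    const rows = await fetchPage(cursor, PAGE_SIZE);
    if (rows.length === 0) break;
    await handlePage(rows);
    total += rows.length;
    cursor = rows[rows.length - 1];
  }
  return total;
}
```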

@wilsonrivera wilsonrivera requested a review from JivusAyrus March 30, 2026 14:48
console.log(` ${schemaVersionsDeleted} schema versions deleted in ${deleteDurationInSeconds.toFixed(3)} seconds`);

// Delete compositions
if (schemaVersionsDeleted > 0) {
Member

if the graph compositions are getting deleted after the schema versions, then we need not do them at all right? as they get cascaded on deleting schema versions

Contributor Author

You are correct, I was overcomplicating it

@wilsonrivera wilsonrivera requested a review from JivusAyrus April 9, 2026 21:54
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
controlplane/src/bin/cleanup-old-data.ts (2)

192-193: ⚠️ Potential issue | 🟠 Major

Composition-derived IDs in the keep-set block old schema-version cleanup.

Line 192-193 adds graphCompositions / graphCompositionSubgraphs schemaVersion IDs to the global keep-set, which can keep old schema versions alive solely because old compositions still reference them. That defeats the intended cascade-based cleanup path for old compositions.

💡 Suggested fix
-    all.push(...(await getSchemaVersionIds(db, schema.graphCompositions, 'schemaVersionId')));
-    all.push(...(await getSchemaVersionIds(db, schema.graphCompositionSubgraphs, 'schemaVersionId')));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 192 - 193, The
keep-set population is incorrectly adding schemaVersion IDs derived from
compositions which prevents cascade cleanup; remove the two pushes that call
getSchemaVersionIds for schema.graphCompositions and
schema.graphCompositionSubgraphs (the lines referencing schema.graphCompositions
and schema.graphCompositionSubgraphs and getSchemaVersionIds) so
composition-derived schemaVersionIds are not added to the global keep-set and
allow compositions to be cleaned up via the intended cascade path.

173-177: ⚠️ Potential issue | 🟠 Major

Avoid materializing all expired schema versions in memory.

Line 173-177 loads the entire expired schemaVersion set before chunking with splice (Line 201-205). On large retention runs, this can create a big memory spike and long initial scan; page candidates in a loop (limit/order/cursor) instead.

Also applies to: 201-205

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 173 - 177, The code
currently materializes all expired schema versions into schemaVersions via
db.select(...).from(schema.schemaVersion).where(lte(schema.schemaVersion.createdAt,
CUTOFF_DATE)).execute(), then chunks with splice which causes large memory
spikes; change this to a paginated query loop that repeatedly fetches a bounded
page (e.g., limit N) ordered by schema.schemaVersion.createdAt (and tie-breaker
id), process each page, and advance via a cursor/last-seen key until no more
rows, and remove the splice-based chunking logic (references: schemaVersions,
schema.schemaVersion, CUTOFF_DATE). Ensure each page is processed and deleted
before fetching the next to keep memory constant.
🧹 Nitpick comments (1)
controlplane/src/bin/cleanup-old-data.ts (1)

97-104: Add explicit return types to async helper functions.

deleteOldDataInChunks and getFederatedGraphsFlags currently rely on inferred return types. Please annotate them explicitly for consistency and safer API contracts.

💡 Suggested fix
 async function deleteOldDataInChunks(
   db: PostgresJsDatabase<typeof schema>,
   table:
     | typeof schema.schemaChecks
     | typeof schema.graphCompositions
     | typeof schema.auditLogs
     | typeof schema.webhookDeliveries,
-) {
+): Promise<number> {
@@
-async function getFederatedGraphsFlags(db: PostgresJsDatabase<typeof schema>) {
+async function getFederatedGraphsFlags(db: PostgresJsDatabase<typeof schema>): Promise<string[]> {

As per coding guidelines, "**/*.{ts,tsx}: Use explicit type annotations for function parameters and return types in TypeScript."

Also applies to: 153-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 97 - 104, Annotate the
async helper functions with explicit return types: add ": Promise<void>" to the
deleteOldDataInChunks function declaration so its contract is explicit, and add
an explicit Promise return type to getFederatedGraphsFlags (e.g. ":
Promise<Record<string, boolean>>" or the appropriate typed interface if it
returns a more specific shape) to replace inferred types; update the function
signatures for deleteOldDataInChunks and getFederatedGraphsFlags accordingly so
TypeScript enforces the return contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controlplane/src/bin/cleanup-old-data.ts`:
- Around line 49-60: The cleanup deletes schema versions via
deleteOldSchemaVersions (returning idsOfSchemaVersionsToKeep) before removing
audit logs, which can violate the FK on audit_logs.schemaVersionId and abort;
reorder or adjust logic so audit logs are deleted before schema versions (call
deleteOldDataInChunks for schema.auditLogs prior to invoking
deleteOldSchemaVersions), or alternatively ensure deleteOldSchemaVersions uses
audit log data to populate idsOfSchemaVersionsToKeep (include schemaVersionIds
referenced by old audit logs) so referenced schema versions are retained; update
the sequence around deleteOldSchemaVersions, deleteOldDataInChunks, and the
idsOfSchemaVersionsToKeep handling to enforce FK-safe deletion.

---

Duplicate comments:
In `@controlplane/src/bin/cleanup-old-data.ts`:
- Around line 192-193: The keep-set population is incorrectly adding
schemaVersion IDs derived from compositions which prevents cascade cleanup;
remove the two pushes that call getSchemaVersionIds for schema.graphCompositions
and schema.graphCompositionSubgraphs (the lines referencing
schema.graphCompositions and schema.graphCompositionSubgraphs and
getSchemaVersionIds) so composition-derived schemaVersionIds are not added to
the global keep-set and allow compositions to be cleaned up via the intended
cascade path.
- Around line 173-177: The code currently materializes all expired schema
versions into schemaVersions via
db.select(...).from(schema.schemaVersion).where(lte(schema.schemaVersion.createdAt,
CUTOFF_DATE)).execute(), then chunks with splice which causes large memory
spikes; change this to a paginated query loop that repeatedly fetches a bounded
page (e.g., limit N) ordered by schema.schemaVersion.createdAt (and tie-breaker
id), process each page, and advance via a cursor/last-seen key until no more
rows, and remove the splice-based chunking logic (references: schemaVersions,
schema.schemaVersion, CUTOFF_DATE). Ensure each page is processed and deleted
before fetching the next to keep memory constant.

---

Nitpick comments:
In `@controlplane/src/bin/cleanup-old-data.ts`:
- Around line 97-104: Annotate the async helper functions with explicit return
types: add ": Promise<void>" to the deleteOldDataInChunks function declaration
so its contract is explicit, and add an explicit Promise return type to
getFederatedGraphsFlags (e.g. ": Promise<Record<string, boolean>>" or the
appropriate typed interface if it returns a more specific shape) to replace
inferred types; update the function signatures for deleteOldDataInChunks and
getFederatedGraphsFlags accordingly so TypeScript enforces the return contract.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9ed15ce-c8fc-40a1-885e-1c7f2d0ba137

📥 Commits

Reviewing files that changed from the base of the PR and between 02a6c63 and 7e2484d.

📒 Files selected for processing (1)
  • controlplane/src/bin/cleanup-old-data.ts

Comment on lines +49 to +60
// Delete schema versions
deleteStart = performance.now();
console.log(`Deleting schema versions created before ${CUTOFF_DATE.toDateString()}...`);
const [schemaVersionsDeleted, idsOfSchemaVersionsToKeep] = await deleteOldSchemaVersions(db);

deleteDurationInSeconds = (performance.now() - deleteStart) / 1000;
console.log(` ${schemaVersionsDeleted} schema versions deleted in ${deleteDurationInSeconds.toFixed(3)} seconds`);

// Delete audit logs
deleteStart = performance.now();
console.log(`Deleting audit logs created before ${CUTOFF_DATE.toDateString()}...`);
const auditLogsDeleted = await deleteOldDataInChunks(db, schema.auditLogs);

⚠️ Potential issue | 🔴 Critical

Delete order can violate FK constraints and stop the cleanup job.

At Line 52, schema versions are deleted before audit logs (Line 60), but audit_logs.schemaVersionId has onDelete: 'no action' (controlplane/src/db/schema.ts:2030-2040). Since audit-log schemaVersion IDs are not included in the keep-set (Line 187-194), a schema version referenced only by old audit logs can fail deletion and abort the script.

💡 Suggested fix
-  // Delete schema versions
-  deleteStart = performance.now();
-  console.log(`Deleting schema versions created before ${CUTOFF_DATE.toDateString()}...`);
-  const [schemaVersionsDeleted, idsOfSchemaVersionsToKeep] = await deleteOldSchemaVersions(db);
-
-  deleteDurationInSeconds = (performance.now() - deleteStart) / 1000;
-  console.log(`  ${schemaVersionsDeleted} schema versions deleted in ${deleteDurationInSeconds.toFixed(3)} seconds`);
-
   // Delete audit logs
   deleteStart = performance.now();
   console.log(`Deleting audit logs created before ${CUTOFF_DATE.toDateString()}...`);
   const auditLogsDeleted = await deleteOldDataInChunks(db, schema.auditLogs);

   deleteDurationInSeconds = (performance.now() - deleteStart) / 1000;
   console.log(`  ${auditLogsDeleted} audit logs deleted in ${deleteDurationInSeconds.toFixed(3)} seconds`);
+
+  // Delete schema versions
+  deleteStart = performance.now();
+  console.log(`Deleting schema versions created before ${CUTOFF_DATE.toDateString()}...`);
+  const [schemaVersionsDeleted, idsOfSchemaVersionsToKeep] = await deleteOldSchemaVersions(db);
+
+  deleteDurationInSeconds = (performance.now() - deleteStart) / 1000;
+  console.log(`  ${schemaVersionsDeleted} schema versions deleted in ${deleteDurationInSeconds.toFixed(3)} seconds`);
     all.push(...(await getFederatedGraphsFlags(db)));
+    all.push(...(await getSchemaVersionIds(db, schema.auditLogs, 'schemaVersionId')));
     all.push(...(await getSchemaVersionIds(db, schema.pluginImageVersions, 'schemaVersionId')));

Also applies to: 183-194

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/bin/cleanup-old-data.ts` around lines 49 - 60, The cleanup
deletes schema versions via deleteOldSchemaVersions (returning
idsOfSchemaVersionsToKeep) before removing audit logs, which can violate the FK
on audit_logs.schemaVersionId and abort; reorder or adjust logic so audit logs
are deleted before schema versions (call deleteOldDataInChunks for
schema.auditLogs prior to invoking deleteOldSchemaVersions), or alternatively
ensure deleteOldSchemaVersions uses audit log data to populate
idsOfSchemaVersionsToKeep (include schemaVersionIds referenced by old audit
logs) so referenced schema versions are retained; update the sequence around
deleteOldSchemaVersions, deleteOldDataInChunks, and the
idsOfSchemaVersionsToKeep handling to enforce FK-safe deletion.
