Conversation

@akshayutture-augment akshayutture-augment commented Nov 14, 2025

Test 7

Summary by CodeRabbit

  • Bug Fixes

    • Improved annotation cleanup reliability by implementing a batching mechanism to prevent deadlocks during bulk deletion operations.
  • Chores

    • Increased cleanup task frequency from 10-minute to 1-minute intervals for faster maintenance of stale annotations.



Replicated from ai-code-review-evaluation/grafana-coderabbit#7

… deadlocks on MySQL (#80329)

* Split subquery when cleaning annotations

* update comment

* Raise batch size, now that we pay attention to it

* Iterate in batches

* Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use

* Use split-out utility in outer batching loop so it respects context cancellation

* guard against empty queries

* Use SQL parameters

* Use same approach for tags

* drop unused function

* Work around parameter limit on sqlite for large batches

* Bulk insert test data in DB

* Refactor test to customise test data creation

* Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit

* Turn annotation cleanup test to integration tests

* lint

---------

Co-authored-by: Sofia Papagiannaki <[email protected]>
@akshayutture-augment (Author)

@coderabbitai review

@akshayutture-augment (Author)

@coderabbitai review


coderabbitai bot commented Nov 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


coderabbitai bot commented Nov 22, 2025

Walkthrough

The changes refactor annotation cleanup operations to perform batched deletions instead of single-query operations, introduce integration-style test scaffolding with real database writes, and increase cleanup task execution frequency from every 10 minutes to every 1 minute.

Changes

  • Annotation cleanup batching mechanism (pkg/services/annotations/annotationsimpl/xorm_store.go): Replaces single-statement DELETE operations with iterative batched deletions. Adds helpers: fetchIDs (load IDs by condition), deleteByIDs (delete by ID list with SQLite parameter-limit handling), asAny (type conversion), and untilDoneOrCancelled (generic batching loop). Reworks CleanAnnotations (both the MaxAge and MaxCount variants) and CleanOrphanedAnnotationTags to use the batched fetch-delete pattern, including SQLite-specific parameter-limit optimizations.
  • Annotation cleanup test conversion (pkg/services/annotations/annotationsimpl/cleanup_test.go): Converts the unit tests to integration-style tests with real test-database initialization. Renames TestAnnotationCleanUp to TestIntegrationAnnotationCleanUp and adds a short-mode skip. Replaces the pre-initialized fake SQL DB with a real test DB, switches to createAnnotationsNum and createOldAnnotationsNum count variables, and introduces a per-test annotationCleanupJobBatchSize parameter. Replaces the single-insert flow with batched InsertMulti (batch size 500) for annotations and tags, adds combined per-test cleanup via error-join deletion, updates expectations for annotation type and tag counts, and adds a test case for batch sizes exceeding the SQLite variable limit.
  • Cleanup service frequency adjustment (pkg/services/cleanup/cleanup.go): Increases cleanup task execution frequency by reducing the ticker interval in CleanUpService.Run from 10 minutes to 1 minute.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Test
    participant Store as CleanAnnotations
    participant DB as Database
    
    rect rgb(240, 248, 255)
    note over Store: Old approach (single query)
    Test->>Store: CleanAnnotations(cutoff)
    Store->>DB: DELETE (single query with subquery)
    DB-->>Store: result
    Store-->>Test: affected count
    end
    
    rect rgb(255, 250, 240)
    note over Store: New approach (batched)
    Test->>Store: CleanAnnotations(cutoff)
    loop untilDoneOrCancelled
        Store->>DB: SELECT IDs WHERE created < cutoff LIMIT batchSize
        DB-->>Store: IDs
        alt IDs empty
            Store->>Store: break loop
        else IDs found
            Store->>DB: DELETE WHERE ID IN (IDs)
            DB-->>Store: affected count
            Store->>Store: accumulate affected, continue
        end
    end
    Store-->>Test: total affected count
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • xorm_store.go: Introduces new batching patterns with untilDoneOrCancelled, fetchIDs, and deleteByIDs helpers; SQLite-specific parameter limit handling requires careful validation
  • cleanup_test.go: Extensive structural changes from unit to integration tests; batched inserts and per-test cleanup patterns need verification for correctness
  • cleanup.go: Trivial constant change; review the rationale for 10x frequency increase (deadlock prevention impact)

Poem

🐰 A rabbit's hop through batch deletes,
No more deadlocks in our little heats—
By ten's the minute now we clean,
The smartest schema you've e'er seen! ✨

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 16.67%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check (❓ Inconclusive): The title 'Database Performance Optimizations' is vague and generic, describing only the high-level outcome rather than the specific changes made. Consider a more specific title, such as 'Implement batched deletion cleanup operations' or 'Add batched cleanup with interval tuning to reduce deadlocks'.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.


coderabbitai bot commented Nov 22, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.


coderabbitai bot commented Nov 22, 2025

UTG Post-Process Complete

No new issues were detected in the generated code and all check runs have completed. The unit test generation process has completed successfully.


coderabbitai bot commented Nov 22, 2025

Creating a PR to put the unit tests in...

The changes have been created in this pull request: View PR

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (9)
pkg/services/annotations/annotationsimpl/xorm_store.go (4)

523-563: Batched cleanup loop looks correct, but logging is too heavy/noisy for normal operation

The new untilDoneOrCancelled usage for both MaxAge and MaxCount paths looks logically sound: you batch by IDs, accumulate totalAffected, and terminate cleanly on affected == 0 or error/cancellation.

However, the logging inside each batch is aggressive and potentially expensive:

  • Uses r.log.Error for routine, successful batches ("Annotations to clean by time/count" and "cleaned annotations by time/count").
  • Logs the full ids slice and the cond string every batch, which can be large and high‑volume under load.

This will generate a lot of error‑level noise and can impact performance and log storage.

Suggest:

  • Downgrade these to Debug (or at most Info) and drop ids from the log fields, keeping only counts and perhaps a short summary of the condition (see the sketch below).
  • Optionally, log only once per cleanup run (e.g., before/after the untilDoneOrCancelled loop) instead of per batch.
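
For illustration, a minimal sketch of the first suggestion, assuming Grafana's log15-style logger behind r.log; the field names are illustrative, not the PR's actual log lines:

```go
// Per-batch: debug level, counts only; no ids slice, no cond string.
r.log.Debug("cleaning annotations batch", "batchSize", len(ids))

// Once per cleanup run, after the untilDoneOrCancelled loop finishes.
r.log.Info("annotation cleanup finished", "totalAffected", totalAffected)
```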

584-595: fetchIDs is safe for current call sites but tightly couples condition shape to callers

fetchIDs builds SELECT id FROM %s WHERE %s with a raw condition string. Given that condition is composed from internal constants (annotationType) and fixed patterns, it’s reasonably safe here, but it means:

  • Callers must always remember to include any ORDER BY / LIMIT in the condition string.
  • The helper is only suitable for tables with an id column.

This is fine for current usage, but if you plan to reuse fetchIDs more broadly, consider:

  • Splitting where, orderBy, and limit into separate arguments, or
  • Accepting a full SQL fragment (including ORDER BY/LIMIT) and making it explicit in the name, e.g., fetchIDsWithClause. A sketch of the split-argument variant follows.
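
A hypothetical shape for the split-argument variant; the name fetchIDsWhere, the receiver type, and the session helper are assumptions for illustration, not the actual Grafana code:

```go
// fetchIDsWhere makes ORDER BY and LIMIT explicit arguments instead of
// requiring callers to embed them in the condition string.
func (r *xormRepositoryImpl) fetchIDsWhere(ctx context.Context, table, where, orderBy string, limit int) ([]int64, error) {
	var ids []int64
	query := fmt.Sprintf("SELECT id FROM %s WHERE %s", table, where)
	if orderBy != "" {
		query += " ORDER BY " + orderBy
	}
	if limit > 0 {
		query += fmt.Sprintf(" LIMIT %d", limit)
	}
	err := r.db.WithDbSession(ctx, func(sess *db.Session) error {
		return sess.SQL(query).Find(&ids)
	})
	return ids, err
}
```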

597-630: SQLite parameter-limit handling should ideally be based on len(ids), not configured batch size

The SQLite path in deleteByIDs uses:

```go
const sqliteParameterLimit = 999
if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit {
    // inline IDs
} else {
    // use placeholders
}
```

This works for current callers because len(ids) is bounded by AnnotationCleanupJobBatchSize, but it couples the behavior to config rather than the actual number of IDs:

  • If another caller ever reuses deleteByIDs with a different batch size, len(ids) could exceed 999 even when AnnotationCleanupJobBatchSize does not, or vice versa.

To make this helper robust in isolation, it would be better to switch on len(ids):

```go
if r.db.GetDBType() == migrator.SQLite && len(ids) > sqliteParameterLimit {
    // inline IDs
} else {
    // placeholders
}
```

This keeps the behavior correct regardless of how callers choose batch sizes.
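
An alternative that makes the helper safe without any coupling to configuration is to chunk the ID list inside deleteByIDs so every DELETE stays under the limit. This is not what the PR does, just a self-contained sketch of the idea:

```go
// chunkIDs splits ids into slices of at most size elements, so each
// DELETE ... WHERE id IN (?, ...) stays under SQLite's parameter limit.
func chunkIDs(ids []int64, size int) [][]int64 {
	var chunks [][]int64
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		chunks = append(chunks, ids[start:end])
	}
	return chunks
}
```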


632-638: asAny helper is fine; consider inlining if it stays single-use

asAny cleanly converts []int64 to []any for the Exec call and is correct as written. If it remains used only in deleteByIDs, you might consider inlining it there to keep the helper surface small, but that’s purely cosmetic.

pkg/services/cleanup/cleanup.go (1)

77-85: More frequent cleanup cadence looks OK, but consider stopping the ticker and validating load impact

Changing the ticker to run every minute will cause cleanup (including DB work) to be attempted much more frequently. Since Run calls srv.clean(ctx) synchronously, runs won’t overlap, but you may:

  • Do significantly more frequent annotation and snapshot cleanups.
  • Hit the 9‑minute per-run timeout on long runs; note that time.Ticker drops ticks it cannot deliver, so at most one tick is pending when a run finishes.

Two suggestions:

  • Add defer ticker.Stop() after creating the ticker to avoid leaking resources when Run exits (see the sketch below).
  • Confirm via metrics/load tests that a 1‑minute cadence doesn’t cause undue DB or IO pressure compared to the previous 10‑minute interval.
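
A sketch of the first suggestion; the Run body below is simplified from the description above rather than copied from cleanup.go:

```go
func (srv *CleanUpService) Run(ctx context.Context) error {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop() // release the ticker when Run exits

	for {
		select {
		case <-ticker.C:
			srv.clean(ctx) // synchronous, so cleanup runs never overlap
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```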
pkg/services/annotations/annotationsimpl/cleanup_test.go (4)

18-37: Integration test setup is good; config usage is a bit split but coherent

Renaming to TestIntegrationAnnotationCleanUp, gating with testing.Short(), and switching to db.InitTestDB(t) makes these behave like proper integration tests. The table‑driven structure with createAnnotationsNum / createOldAnnotationsNum and explicit expected counts per annotation type matches the new batched cleanup semantics.

You’re using two configs per test (cfg for batch size via AnnotationCleanupJobBatchSize, and test.cfg for the per‑type MaxAge/MaxCount settings). As long as ProvideCleanupService reads batch size from the first and uses the second only inside Run, this is fine, but it’s worth keeping in mind to avoid confusion if more knobs are added later.


97-111: Large SQLite batch-size test is valuable but quite heavy; consider trimming if runtime becomes an issue

The "should not fail if batch size is larger than SQLITE_MAX_VARIABLE_NUMBER..." case effectively validates:

  • AnnotationCleanupJobBatchSize much larger than SQLite’s parameter limit (32767 vs 999).
  • Cleanup correctly deletes 40000 of 40003 annotations while keeping one per annotation type.

This meaningfully exercises the new deleteByIDs SQLite path. The trade‑off is that inserting and cleaning 40k+ annotations plus tags makes this test relatively expensive.

If CI time ever becomes a concern, you could:

  • Reduce createAnnotationsNum while still exceeding the parameter limit (e.g., ~2× or 3× 1000), and
  • Keep AnnotationCleanupJobBatchSize just over the limit (e.g., 1500) so the special code path is still covered.

For now, it’s acceptable, but it’s something to monitor.
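
Concretely, a trimmed table entry might look like the following; the field names mirror the test structure described above, and the exact values are only a suggestion:

```go
{
	name: "should not fail if batch size is larger than SQLITE_MAX_VARIABLE_NUMBER",
	// ~3x the 999-parameter limit instead of 40k rows: the SQLite
	// inline-ID path is still exercised, but setup is much cheaper.
	createAnnotationsNum:          3000,
	annotationCleanupJobBatchSize: 1500, // just over the limit
},
```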


114-147: Per-test cleanup via t.Cleanup is correct, but you can simplify error handling

The t.Cleanup added for each subtest:

  • Clears both annotation and annotation_tag tables.
  • Uses errors.Join to combine potential errors and asserts NoError.

This is functionally correct and keeps the shared test DB clean across subtests. If you want to simplify slightly, you could:

  • Short-circuit on the first error instead of joining, since a single failure will already fail the test, or
  • Log which DELETE failed to aid debugging.

Not required, but might make debugging a bit cleaner if cleanup ever fails.

If your Go toolchain is older, ensure errors.Join is available (Go 1.20+).
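
A sketch of the short-circuiting variant, which also records which DELETE failed; the testDB variable and the session helper are assumptions based on the test setup described above:

```go
t.Cleanup(func() {
	err := testDB.WithDbSession(context.Background(), func(sess *db.Session) error {
		if _, err := sess.Exec("DELETE FROM annotation"); err != nil {
			return fmt.Errorf("failed to clear annotation table: %w", err)
		}
		if _, err := sess.Exec("DELETE FROM annotation_tag"); err != nil {
			return fmt.Errorf("failed to clear annotation_tag table: %w", err)
		}
		return nil
	})
	require.NoError(t, err)
})
```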


232-292: Batched insert helper looks correct; explicit IDs and tags are consistent with cleanup expectations

In createTestAnnotations:

  • Preallocating newAnnotations and newAnnotationTags and inserting in batches of 500 via InsertMulti is a good way to keep test setup fast, even with tens of thousands of rows.
  • Assigning ID: int64(i + 1) ensures deterministic ordering for the ORDER BY id DESC logic under test, and the annotation_tag rows reference those IDs consistently.
  • Distribution of alert/dashboard/API annotations and Created timestamps matches the table-driven expectations for MaxAge/MaxCount tests.

The overall approach is sound for an integration test fixture. The only minor consideration is that this helper assumes an empty or test-local DB; if more shared state is introduced later, reusing explicit IDs from 1..N could conflict and may need revisiting.
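
For reference, the batched-insert shape described above looks roughly like this; the slice and session names follow the review text, and the loop itself is a reconstruction rather than the exact test code:

```go
const insertBatchSize = 500

for start := 0; start < len(newAnnotations); start += insertBatchSize {
	end := start + insertBatchSize
	if end > len(newAnnotations) {
		end = len(newAnnotations)
	}
	// xorm's InsertMulti issues one multi-row INSERT per sub-slice.
	if _, err := sess.InsertMulti(newAnnotations[start:end]); err != nil {
		return err
	}
}
```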

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f84c8f6 and 81c45bf.

📒 Files selected for processing (3)
  • pkg/services/annotations/annotationsimpl/cleanup_test.go (9 hunks)
  • pkg/services/annotations/annotationsimpl/xorm_store.go (2 hunks)
  • pkg/services/cleanup/cleanup.go (1 hunks)
🔇 Additional comments (2)
pkg/services/annotations/annotationsimpl/xorm_store.go (2)

640-651: untilDoneOrCancelled behavior is sound and preserves partial progress on error/cancel

The loop correctly:

  • Checks ctx.Done() before each batch, returning the ctx.Err() together with totalAffected.
  • Calls batchWork() once per iteration, accumulates totalAffected, and exits when a batch returns affected == 0.

This ensures cleanup jobs report how many rows were processed even when they terminate early due to errors or timeouts, which aligns well with how CleanAnnotations and CleanOrphanedAnnotationTags are using it.
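
Reconstructed from that description, the loop's shape is roughly the following; this is a sketch, not the exact Grafana code:

```go
func untilDoneOrCancelled(ctx context.Context, batchWork func() (int64, error)) (int64, error) {
	var totalAffected int64
	for {
		select {
		case <-ctx.Done():
			// Report partial progress together with the cancellation error.
			return totalAffected, ctx.Err()
		default:
			affected, err := batchWork()
			totalAffected += affected
			if err != nil {
				return totalAffected, err
			}
			if affected == 0 {
				// Nothing left to clean; terminate normally.
				return totalAffected, nil
			}
		}
	}
}
```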


569-582: The annotation_tag schema includes an id column—the original concern is invalid

The migration file (pkg/services/sqlstore/migrations/annotation_mig.go) shows the schema evolution: v2 had only annotation_id and tag_id columns, but v3 (lines 92–102) explicitly adds id as an auto-incrementing primary key. The migration (lines 109–114) copies data from v2 to v3 and drops v2, making v3 the current canonical schema. The fetchIDs and deleteByIDs helper methods (lines 584, 597) correctly assume the presence of an id column. This cleanup functionality shipped in Grafana 7.4.0, confirming the schema migration is active in production deployments.
