Skip to content

Conversation

@everettbu
Copy link

Test 7

… deadlocks on MySQL (#80329)

* Split subquery when cleaning annotations

* update comment

* Raise batch size, now that we pay attention to it

* Iterate in batches

* Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use

* Use split-out utility in outer batching loop so it respects context cancellation

* guard against empty queries

* Use SQL parameters

* Use same approach for tags

* drop unused function

* Work around parameter limit on sqlite for large batches

* Bulk insert test data in DB

* Refactor test to customise test data creation

* Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit

* Turn annotation cleanup test to integration tests

* lint

---------

Co-authored-by: Sofia Papagiannaki <[email protected]>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR implements database performance optimizations focused on the cleanup and annotation systems in Grafana. The changes include three main components:

Cleanup Service Frequency Adjustment: The cleanup service interval has been reduced from 10 minutes to 1 minute, causing all cleanup jobs (temp files, expired snapshots, dashboard versions, images, annotations, user invites, short URLs, and query history) to run 10x more frequently. This change maintains the existing 9-minute timeout for all cleanup operations combined.

Annotation Cleanup Refactoring: The annotation cleanup implementation has been completely restructured to address MySQL deadlock issues. The previous single-statement DELETE operations with subqueries have been replaced with a two-phase approach: first fetching IDs into memory, then deleting by those IDs. This includes new helper functions (fetchIDs, deleteByIDs, untilDoneOrCancelled) and database-specific handling for SQLite's parameter limitations.

Test Infrastructure Improvements: The annotation cleanup tests have been significantly enhanced with proper integration test patterns, optimized test data creation using batch insertions instead of individual inserts, and comprehensive testing for SQLite variable limits. Tests now use testing.Short() checks, batch operations with pre-allocated slices, and InsertMulti for performance.

These changes fit into Grafana's broader architecture by optimizing the background cleanup processes that maintain database health and prevent data accumulation. The cleanup service is a critical system component that runs continuously to manage storage usage and maintain system performance.

PR Description Notes:

  • The PR body only contains "Test 7" which lacks detail about the changes and their purpose

Confidence score: 2/5

  • This PR introduces significant architectural changes that could cause performance degradation and security vulnerabilities in production
  • Multiple critical issues including potential SQL injection, excessive resource usage from 10x frequency increase, and inappropriate error logging levels
  • Files pkg/services/cleanup/cleanup.go and pkg/services/annotations/annotationsimpl/xorm_store.go need careful review before merging

3 files reviewed, 3 comments

Edit Code Review Bot Settings | Greptile

})
require.NoError(t, err)
newAnnotations = append(newAnnotations, a)
newAnnotationTags = append(newAnnotationTags, &annotationTag{AnnotationID: a.ID, TagID: 1}, &annotationTag{AnnotationID: a.ID, TagID: 2})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Missing definition for annotationTag struct - ensure this type is properly imported or defined

if err != nil {
return 0, err
}
r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using Error level for normal operations will spam logs. Consider using Debug level for routine cleanup activities

})
require.NoError(t, err)
newAnnotations = append(newAnnotations, a)
newAnnotationTags = append(newAnnotationTags, &annotationTag{AnnotationID: a.ID, TagID: 1}, &annotationTag{AnnotationID: a.ID, TagID: 2})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Missing definition for annotationTag struct - ensure this type is properly imported or defined

if err != nil {
return 0, err
}
r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using Error level for normal operations will spam logs. Consider using Debug level for routine cleanup activities

if err != nil {
return 0, err
}
r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using Error level for normal operations will spam logs. Consider using Debug level for routine cleanup activities

// SQLite has a parameter limit of 999.
// If the batch size is bigger than that, and we're on SQLite, we have to put the IDs directly into the statement.
const sqliteParameterLimit = 999
if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The batch size check should also validate against the actual number of IDs to prevent unnecessary string building when len(ids) <= sqliteParameterLimit

// SQLite has a parameter limit of 999.
// If the batch size is bigger than that, and we're on SQLite, we have to put the IDs directly into the statement.
const sqliteParameterLimit = 999
if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The batch size check should also validate against the actual number of IDs to prevent unnecessary string building when len(ids) <= sqliteParameterLimit

@github-actions
Copy link
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

@github-actions
Copy link
Contributor

This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions!

@github-actions github-actions bot closed this Sep 12, 2025
@everettbu everettbu reopened this Sep 12, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants