-
Notifications
You must be signed in to change notification settings - Fork 9
Database Performance Optimizations #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: db-cleanup-baseline
Are you sure you want to change the base?
Conversation
… deadlocks on MySQL (#80329) * Split subquery when cleaning annotations * update comment * Raise batch size, now that we pay attention to it * Iterate in batches * Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use * Use split-out utility in outer batching loop so it respects context cancellation * guard against empty queries * Use SQL parameters * Use same approach for tags * drop unused function * Work around parameter limit on sqlite for large batches * Bulk insert test data in DB * Refactor test to customise test data creation * Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit * Turn annotation cleanup test to integration tests * lint --------- Co-authored-by: Sofia Papagiannaki <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements database performance optimizations focused on the cleanup and annotation systems in Grafana. The changes include three main components:
Cleanup Service Frequency Adjustment: The cleanup service interval has been reduced from 10 minutes to 1 minute, causing all cleanup jobs (temp files, expired snapshots, dashboard versions, images, annotations, user invites, short URLs, and query history) to run 10x more frequently. This change maintains the existing 9-minute timeout for all cleanup operations combined.
Annotation Cleanup Refactoring: The annotation cleanup implementation has been completely restructured to address MySQL deadlock issues. The previous single-statement DELETE operations with subqueries have been replaced with a two-phase approach: first fetching IDs into memory, then deleting by those IDs. This includes new helper functions (fetchIDs, deleteByIDs, untilDoneOrCancelled) and database-specific handling for SQLite's parameter limitations.
Test Infrastructure Improvements: The annotation cleanup tests have been significantly enhanced with proper integration test patterns, optimized test data creation using batch insertions instead of individual inserts, and comprehensive testing for SQLite variable limits. Tests now use testing.Short() checks, batch operations with pre-allocated slices, and InsertMulti for performance.
These changes fit into Grafana's broader architecture by optimizing the background cleanup processes that maintain database health and prevent data accumulation. The cleanup service is a critical system component that runs continuously to manage storage usage and maintain system performance.
PR Description Notes:
- The PR body only contains "Test 7" which lacks detail about the changes and their purpose
Confidence score: 2/5
- This PR introduces significant architectural changes that could cause performance degradation and security vulnerabilities in production
- Multiple critical issues including potential SQL injection, excessive resource usage from 10x frequency increase, and inappropriate error logging levels
- Files
pkg/services/cleanup/cleanup.goandpkg/services/annotations/annotationsimpl/xorm_store.goneed careful review before merging
3 files reviewed, 3 comments
| }) | ||
| require.NoError(t, err) | ||
| newAnnotations = append(newAnnotations, a) | ||
| newAnnotationTags = append(newAnnotationTags, &annotationTag{AnnotationID: a.ID, TagID: 1}, &annotationTag{AnnotationID: a.ID, TagID: 2}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Missing definition for annotationTag struct - ensure this type is properly imported or defined
| if err != nil { | ||
| return 0, err | ||
| } | ||
| r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Using Error level for normal operations will spam logs. Consider using Debug level for routine cleanup activities
| }) | ||
| require.NoError(t, err) | ||
| newAnnotations = append(newAnnotations, a) | ||
| newAnnotationTags = append(newAnnotationTags, &annotationTag{AnnotationID: a.ID, TagID: 1}, &annotationTag{AnnotationID: a.ID, TagID: 2}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Missing definition for annotationTag struct - ensure this type is properly imported or defined
| if err != nil { | ||
| return 0, err | ||
| } | ||
| r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Using Error level for normal operations will spam logs. Consider using Debug level for routine cleanup activities
| if err != nil { | ||
| return 0, err | ||
| } | ||
| r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Using Error level for normal operations will spam logs. Consider using Debug level for routine cleanup activities
| // SQLite has a parameter limit of 999. | ||
| // If the batch size is bigger than that, and we're on SQLite, we have to put the IDs directly into the statement. | ||
| const sqliteParameterLimit = 999 | ||
| if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The batch size check should also validate against the actual number of IDs to prevent unnecessary string building when len(ids) <= sqliteParameterLimit
| // SQLite has a parameter limit of 999. | ||
| // If the batch size is bigger than that, and we're on SQLite, we have to put the IDs directly into the statement. | ||
| const sqliteParameterLimit = 999 | ||
| if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The batch size check should also validate against the actual number of IDs to prevent unnecessary string building when len(ids) <= sqliteParameterLimit
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Test 7