-
Notifications
You must be signed in to change notification settings - Fork 9
Dual Storage Architecture #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dual-storage-baseline
Are you sure you want to change the base?
Conversation
* Dual writer: mode 3 * Add integration tests for playlits in mode 3 * Remove todo * Update pkg/apiserver/rest/dualwriter_mode3.go Co-authored-by: Arati R. <[email protected]> * Admin: Fixes an issue where user accounts could not be enabled (#88117) Fix: unable to enable user * [REVIEW] FInish mode 3 and add tests * Improve logging * Update dependencies * Update pkg/apiserver/rest/dualwriter_mode3_test.go Co-authored-by: maicon <[email protected]> * remove test assertion * Use mode log when dual writer is initiated --------- Co-authored-by: Arati R. <[email protected]> Co-authored-by: gonvee <[email protected]> Co-authored-by: maicon <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements Mode 3 of the dual storage architecture in Grafana's playlist API. Mode 3 represents a critical migration phase where writes go to both legacy and new storage systems, but reads come exclusively from the new storage system - typically the final stage before complete cutover.
The changes add comprehensive test coverage for Mode 3 across all three supported storage backends (file, unified storage, and etcd) in the playlist test suite. Each test follows a consistent pattern: configuring GrafanaOpts with the appropriate storage type, enabling required feature toggles, setting the DualWriterDesiredModes map to Mode3, and running the standard test suite.
The core Mode 3 implementation in dualwriter_mode3.go has been significantly refactored to include proper metrics recording, consistent logging, and asynchronous operations for legacy storage writes. Legacy operations now run in goroutines with timeouts to prevent blocking the primary unified storage operations, which is essential for maintaining performance during migration.
Additional changes include standardizing prometheus registry usage across tests by moving from local to global registry instances, and updating Go workspace dependencies to include new Grafana Azure SDK, plugin SDK, Prometheus Alertmanager, and OpenTelemetry components required for the dual storage functionality.
Confidence score: 2/5
• This PR has critical implementation issues that will cause compilation failures and incorrect behavior
• The Mode 3 test file contains undefined variables and mock objects, the metrics recording has parameter inconsistencies, and there are missing context propagations in goroutines
• Files needing attention: pkg/apiserver/rest/dualwriter_mode3_test.go (undefined variables), pkg/apiserver/rest/dualwriter_mode3.go (metrics bugs and missing context)
5 files reviewed, 9 comments
| created, err := d.Storage.Create(ctx, obj, createValidation, options) | ||
| if err != nil { | ||
| log.Error(err, "unable to create object in storage") | ||
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Bug: calling recordLegacyDuration when storage operation fails should be recordStorageDuration
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) | |
| d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) |
| d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) | ||
| return res, async, err | ||
| } | ||
| d.recordStorageDuration(false, mode3Str, name, method, startStorage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Inconsistency: using name instead of options.Kind for metrics recording differs from other methods
| d.recordStorageDuration(false, mode3Str, name, method, startStorage) | |
| d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage) |
| upstream: objInfo, | ||
| updated: updated, | ||
| log.Error(err, "unable to update in storage") | ||
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Bug: calling recordLegacyDuration when storage operation fails should be recordStorageDuration
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) | |
| d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) |
| func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) { | ||
| var method = "delete" | ||
| log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method) | ||
| ctx = klog.NewContext(ctx, d.Log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Context issue: passing d.Log instead of the enriched log variable breaks logging consistency
| ctx = klog.NewContext(ctx, d.Log) | |
| ctx = klog.NewContext(ctx, log) |
| ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout")) | ||
| defer cancel() | ||
| _, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions) | ||
| d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Bug: calling recordStorageDuration in legacy operation goroutine should be recordLegacyDuration
| d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy) | |
| d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy) |
| tt.setupStorageFn(m) | ||
| } | ||
|
|
||
| dw := NewDualWriter(Mode3, ls, us, p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Variable p is undefined. Should be prometheus.NewRegistry() or moved to shared test setup
| dw := NewDualWriter(Mode3, ls, us, p) | |
| p := prometheus.NewRegistry() | |
| dw := NewDualWriter(Mode3, ls, us, p) |
| ls := legacyStoreMock{m, l} | ||
| us := storageMock{m, s} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Mock types legacyStoreMock and storageMock are undefined - need to be defined or imported
|
|
||
| dw := NewDualWriter(Mode3, ls, us, p) | ||
|
|
||
| obj, _, err := dw.Update(context.Background(), tt.input, updatedObjInfoObj{}, func(ctx context.Context, obj runtime.Object) error { return nil }, func(ctx context.Context, obj, old runtime.Object) error { return nil }, false, &metav1.UpdateOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Type updatedObjInfoObj is undefined - should be defined or use existing mock
| input: failingObj, | ||
| setupLegacyFn: func(m *mock.Mock, input runtime.Object) { | ||
| m.On("Create", mock.Anything, failingObj, mock.Anything, mock.Anything).Return(nil, errors.New("error")) | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Missing setupStorageFn call in error test case - unified storage should also be mocked
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Test 6