Skip to content

Conversation

@hussam789
Copy link

@hussam789 hussam789 commented Oct 30, 2025

User description

PR #6


PR Type

Enhancement, Tests


Description

  • Implement Mode 3 dual writer with async legacy storage writes

    • Primary reads/writes to unified storage with 10-second timeout for legacy writes
    • Adds comprehensive metrics recording for storage operations
  • Complete Mode 3 test coverage for all CRUD operations

    • Create, Get, List, Delete, DeleteCollection, and Update methods
  • Add Mode 3 integration tests for playlist resources

    • Tests with file, unified storage, and etcd backends
  • Improve logging with contextual method and kind information


Diagram Walkthrough

flowchart LR
  A["Request"] --> B["Storage Operation"]
  B --> C["Record Metrics"]
  C --> D["Return Result"]
  B --> E["Async Legacy Write"]
  E --> F["10s Timeout"]
  F --> G["Record Legacy Metrics"]
Loading

File Walkthrough

Relevant files
Enhancement
dualwriter_mode3.go
Mode 3 dual writer async architecture with metrics             

pkg/apiserver/rest/dualwriter_mode3.go

  • Refactored all CRUD methods to write to storage first, then async to
    legacy storage
  • Added comprehensive metrics recording for both storage and legacy
    operations
  • Implemented 10-second timeout context for legacy writes to prevent
    blocking
  • Added logging with contextual method, kind, and name information
  • Implemented List method (previously unimplemented)
  • Changed error handling to return storage errors immediately while
    logging legacy errors
+98/-60 
Tests
dualwriter_mode3_test.go
Complete Mode 3 test coverage for all operations                 

pkg/apiserver/rest/dualwriter_mode3_test.go

  • Uncommented and completely rewrote test suite with proper mock setup
  • Added test cases for Create, Get, List, Delete, DeleteCollection, and
    Update
  • Each test includes success and error scenarios
  • Uses mock.Mock for storage and legacy storage interactions
  • Tests verify correct object returns and error handling
+354/-70
playlist_test.go
Add Mode 3 playlist integration tests                                       

pkg/tests/apis/playlist/playlist_test.go

  • Added Mode 3 integration test with file storage backend
  • Added Mode 3 integration test with unified storage backend
  • Added Mode 3 integration test with etcd backend
  • Tests verify playlist operations work correctly across different
    storage configurations
+56/-0   
Miscellaneous
dualwriter_mode1_test.go
Remove redundant prometheus registry creation                       

pkg/apiserver/rest/dualwriter_mode1_test.go

  • Removed unused prometheus.NewRegistry() initialization
  • Relies on existing prometheus registry passed to NewDualWriter
+0/-1     
Dependencies
go.work.sum
Update dependencies for enhanced observability                     

go.work.sum

  • Added grafana-azure-sdk-go v2.1.0 dependency
  • Added grafana-plugin-sdk-go v0.235.0 dependency
  • Added prometheus-alertmanager v0.25.1 dependency
  • Added OpenTelemetry metric and trace exporters
+16/-0   

* Dual writer: mode 3

* Add integration tests for playlits in mode 3

* Remove todo

* Update pkg/apiserver/rest/dualwriter_mode3.go

Co-authored-by: Arati R. <[email protected]>

* Admin: Fixes an issue where user accounts could not be enabled (#88117)

Fix: unable to enable user

* [REVIEW] FInish mode 3 and add tests

* Improve logging

* Update dependencies

* Update pkg/apiserver/rest/dualwriter_mode3_test.go

Co-authored-by: maicon <[email protected]>

* remove test assertion

* Use mode log when dual writer is initiated

---------

Co-authored-by: Arati R. <[email protected]>
Co-authored-by: gonvee <[email protected]>
Co-authored-by: maicon <[email protected]>
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Context cancellation risk

Description: Asynchronous goroutine for legacy Create captures the parent context which may be canceled
early, potentially causing unintended cancellation of the legacy write; a detached
background context with explicit timeout would be safer.
dualwriter_mode3.go [50-58]

Referred Code
go func() {
	ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
	defer cancel()

	startLegacy := time.Now()
	_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
	d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
}()
Metrics label cardinality

Description: In Delete, storage duration metric uses 'name' for the 'kind' label (line 106), resulting
in high-cardinality metric labels that can degrade observability systems and expose object
names.
dualwriter_mode3.go [106-114]

Referred Code
d.recordStorageDuration(false, mode3Str, name, method, startStorage)

go func() {
	startLegacy := time.Now()
	ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy delete timeout"))
	defer cancel()
	_, _, err := d.Legacy.Delete(ctx, name, deleteValidation, options)
	d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
Incorrect metrics recording

Description: In DeleteCollection legacy async path records storage duration instead of legacy duration,
leading to misleading metrics and masking failures.
dualwriter_mode3.go [165-167]

Referred Code
	_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
	d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Metrics misuse: In DeleteCollection the legacy operation duration is recorded with recordStorageDuration
instead of recordLegacyDuration, leading to misleading error metrics.

Referred Code
go func() {
	startLegacy := time.Now()
	ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
	defer cancel()
	_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
	d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Audit context: New logging added around CRUD operations does not include authenticated user identity or
outcome fields required for audit trails, making it unclear if audit requirements are met.

Referred Code
func (d *DualWriterMode3) Create(ctx context.Context, obj runtime.Object, createValidation rest.ValidateObjectFunc, options *metav1.CreateOptions) (runtime.Object, error) {
	var method = "create"
	log := d.Log.WithValues("kind", options.Kind, "method", method)
	ctx = klog.NewContext(ctx, log)

	startStorage := time.Now()
	created, err := d.Storage.Create(ctx, obj, createValidation, options)
	if err != nil {
		log.Error(err, "unable to create object in storage")
		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
		return created, err
	}
	d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)

	go func() {
		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
		defer cancel()

		startLegacy := time.Now()
		_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
		d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)


 ... (clipped 110 lines)
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Naming clarity: Local variable name 'method' used as a string tag throughout logging is generic
and may reduce clarity compared to using constants or typed enums for operation names.

Referred Code
func (d *DualWriterMode3) Create(ctx context.Context, obj runtime.Object, createValidation rest.ValidateObjectFunc, options *metav1.CreateOptions) (runtime.Object, error) {
	var method = "create"
	log := d.Log.WithValues("kind", options.Kind, "method", method)
	ctx = klog.NewContext(ctx, log)

	startStorage := time.Now()
	created, err := d.Storage.Create(ctx, obj, createValidation, options)
	if err != nil {
		log.Error(err, "unable to create object in storage")
		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
		return created, err
	}
	d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)

	go func() {
		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
		defer cancel()

		startLegacy := time.Now()
		_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
		d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)


 ... (clipped 93 lines)
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Log content: Logging adds fields 'kind', 'name', and 'resourceVersion'
which could be sensitive depending on resource types; need confirmation that these are
safe to log at the chosen level.

Referred Code
	var method = "get"
	log := d.Log.WithValues("kind", options.Kind, "name", name, "method", method)
	ctx = klog.NewContext(ctx, log)

	startStorage := time.Now()
	res, err := d.Storage.Get(ctx, name, options)
	if err != nil {
		log.Error(err, "unable to get object in storage")
	}
	d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startStorage)

	return res, err
}

// List overrides the behavior of the generic DualWriter and reads only from Unified Store.
func (d *DualWriterMode3) List(ctx context.Context, options *metainternalversion.ListOptions) (runtime.Object, error) {
	var method = "list"
	log := d.Log.WithValues("kind", options.Kind, "resourceVersion", options.ResourceVersion, "method", method)
	ctx = klog.NewContext(ctx, log)

	startStorage := time.Now()


 ... (clipped 6 lines)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input validation: The CRUD methods rely on downstream storage to validate inputs and do not add explicit
validation or sanitization for externally provided fields like 'name' and
options, which may be acceptable but is not verifiable from this diff.

Referred Code
func (d *DualWriterMode3) Get(ctx context.Context, name string, options *metav1.GetOptions) (runtime.Object, error) {
	var method = "get"
	log := d.Log.WithValues("kind", options.Kind, "name", name, "method", method)
	ctx = klog.NewContext(ctx, log)

	startStorage := time.Now()
	res, err := d.Storage.Get(ctx, name, options)
	if err != nil {
		log.Error(err, "unable to get object in storage")
	}
	d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startStorage)

	return res, err
}

// List overrides the behavior of the generic DualWriter and reads only from Unified Store.
func (d *DualWriterMode3) List(ctx context.Context, options *metainternalversion.ListOptions) (runtime.Object, error) {
	var method = "list"
	log := d.Log.WithValues("kind", options.Kind, "resourceVersion", options.ResourceVersion, "method", method)
	ctx = klog.NewContext(ctx, log)



 ... (clipped 63 lines)
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Asynchronous legacy writes lack observability

The asynchronous legacy storage writes do not log errors or timeouts, only
recording them as metrics. This suggestion recommends adding structured logging
within the goroutines to capture error details and improve debuggability.

Examples:

pkg/apiserver/rest/dualwriter_mode3.go [50-57]
	go func() {
		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
		defer cancel()

		startLegacy := time.Now()
		_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
		d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
	}()
pkg/apiserver/rest/dualwriter_mode3.go [108-114]
	go func() {
		startLegacy := time.Now()
		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy delete timeout"))
		defer cancel()
		_, _, err := d.Legacy.Delete(ctx, name, deleteValidation, options)
		d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
	}()

Solution Walkthrough:

Before:

func (d *DualWriterMode3) Create(...) {
  // ... storage write ...
  go func() {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
    defer cancel()

    startLegacy := time.Now()
    _, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
    d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
  }()
  // ...
}

After:

func (d *DualWriterMode3) Create(...) {
  // ... storage write ...
  go func() {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
    defer cancel()

    startLegacy := time.Now()
    _, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
    d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
    if errObjectSt != nil {
        log.Error(errObjectSt, "unable to create object in legacy storage")
    }
  }()
  // ...
}
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a significant observability gap where failures in asynchronous legacy writes are not logged, making debugging difficult and potentially leading to silent data divergence.

Medium
Possible issue
Record correct metric on update failure

In the Update function's error handling block, call d.recordStorageDuration
instead of d.recordLegacyDuration to correctly record the metric for a failed
storage operation.

pkg/apiserver/rest/dualwriter_mode3.go [126-131]

 	res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options)
 	if err != nil {
 		log.Error(err, "unable to update in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return res, async, err
 	}
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a bug where the wrong metric is recorded on failure, which would lead to incorrect monitoring data for storage operations.

Medium
Record correct metric for legacy operation

In the DeleteCollection function's goroutine, call d.recordLegacyDuration
instead of d.recordStorageDuration to correctly record the metric for the legacy
delete operation.

pkg/apiserver/rest/dualwriter_mode3.go [161-167]

 	go func() {
 		startLegacy := time.Now()
 		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
 		defer cancel()
 		_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
-		d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
+		d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
 	}()
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a bug where the wrong metric is recorded for a legacy operation, which would lead to incorrect monitoring data.

Medium
Use enriched logger for context

In the Delete function, use the enriched log variable when creating a new
context with klog.NewContext instead of the base d.Log to preserve logging
details.

pkg/apiserver/rest/dualwriter_mode3.go [94-100]

 func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
 	var method = "delete"
 	log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
-	ctx = klog.NewContext(ctx, d.Log)
+	ctx = klog.NewContext(ctx, log)
 
 	startStorage := time.Now()
 	res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options)
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies a logging bug where an enriched logger is created but the base logger is added to the context, causing a loss of contextual information.

Low
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants