Skip to content

Conversation

@AlinsRan
Copy link
Contributor

@AlinsRan AlinsRan commented Jul 23, 2025

Type of change:

  • Bugfix
  • New feature provided
  • Improve performance
  • Backport patches
  • Documentation
  • Refactor
  • Chore
  • CI/CD or Tests

What this PR does / why we need it:

background

The controller watches CRDs and incrementally builds an in-memory cache of dataplane resources. It periodically performs a full (asynchronous) sync from this cache to the dataplane, without waiting for the cache to be fully populated.

On startup, if the number of CRDs is large, the first full sync may happen before all resources are processed, resulting in partial configuration being pushed to the dataplane. This can cause temporary 404 errors and traffic disruption.

Reproduction Steps:

  1. Apply 5,000 ApisixRoute to the kubernetes cluster.

  2. Restart the Ingress controller.

  3. While the Ingress controller is starting, attempt to access all routes via the data plane. Some routes will return 404 Not Found errors.

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

@AlinsRan AlinsRan marked this pull request as ready for review July 23, 2025 06:25
@AlinsRan AlinsRan requested review from Copilot and ronething July 23, 2025 07:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where full sync during controller restart could result in data plane traffic loss. The issue occurred when the controller performed full sync before all CRD resources were processed, potentially sending incomplete configuration to the data plane.

  • Implements a readiness manager to track when all relevant resources have been reconciled at least once
  • Integrates readiness tracking across all controllers to mark resources as processed
  • Delays full sync operations until initial reconciliation is complete

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/manager/readiness/manager.go New readiness manager implementation to track resource processing completion
internal/manager/run.go Integration of readiness manager with provider initialization and controller setup
internal/manager/controllers.go Registration of readiness GVKs and integration with all controllers
internal/provider/adc/adc.go Modified to wait for readiness before starting sync operations
internal/provider/provider.go Added NeedLeaderElection interface method
internal/types/k8s.go Added GvkOf function and renamed v1 import to netv1 to avoid conflicts
internal/controller/indexer/indexer.go Added ControllerName index and reordered indexer setup
test/e2e/gatewayapi/httproute.go Added E2E test for HTTPRoute sync during startup
test/e2e/crds/consumer.go Added E2E test for Consumer sync during startup
test/e2e/apisix/route.go Added E2E test for ApisixRoute sync during startup
Comments suppressed due to low confidence (2)

test/e2e/gatewayapi/httproute.go:945

  • The test description mentions 'ApisixRoute' but this is testing HTTPRoute. The description should be 'Should sync HTTPRoute during startup' to match the actual test behavior.
			Eventually(func() string {

test/e2e/gatewayapi/httproute.go:946

  • The test step description mentions 'ApisixRoute' but this is testing HTTPRoute. The description should be 'apply HTTPRoute' to match the actual resource being applied.
				_, err := s.GetResourceYaml("HTTPRoutePolicy", "http-route-policy-0")

@AlinsRan AlinsRan marked this pull request as draft July 23, 2025 07:37
@AlinsRan AlinsRan marked this pull request as ready for review July 23, 2025 08:09
// Reconcile FIXME: implement the reconcile logic (For now, it dose nothing other than directly accepting)
func (r *ApisixConsumerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
r.Log.Info("reconcile", "request", req.NamespacedName)
r.Readier.Done(&apiv2.ApisixConsumer{}, req.NamespacedName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more reasonable to execute the packaging within defer func

Copy link
Contributor

@ronething ronething left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

others lgtm.

@AlinsRan AlinsRan merged commit 66c2b0a into apache:master Jul 25, 2025
23 checks passed
@AlinsRan AlinsRan deleted the fix/ingress-restart branch July 25, 2025 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants