Skip to content

Conversation

@lindnerby
Copy link
Member

Description

Changes proposed in this pull request:

  • ...
  • ...
  • ...

Related issue(s)

@lindnerby lindnerby requested a review from a team as a code owner October 16, 2025 10:18
@lindnerby lindnerby added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2025
@lindnerby lindnerby changed the title Split kyma controller refactor: Split kyma controller Oct 16, 2025
@lindnerby lindnerby linked an issue Oct 21, 2025 that may be closed by this pull request
3 tasks
@c-pius
Copy link
Contributor

c-pius commented Nov 7, 2025

E2E module-without-default-cr

  • general, I think this test is obsolete and should be removed as of this decision: All Modules MUST Provide Default Module CR Data Except Mandatory Modules community#982
    • still trying to use it to find the general issue of why deletion reconciler is not triggered
    • when trying to prepare this test locally, we see issues that default CR data cannot be empty
      moduletemplates kubectl apply -f template-operator.yaml -f template-operator-0.0.1-catalog-meta.yaml
      modulereleasemeta.operator.kyma-project.io/template-operator created
      The ModuleTemplate "template-operator-0.0.1-catalog-meta" is invalid: 
      * spec.data.apiVersion: Required value: must not
      
    • looking into the logs, we can see that actually a MT with default CR data is deployed... Something is wrong in the test setup => this was wrong, we actually create a MT without default CR further below... Above error is probably because we need to exclude the attribute entirely and not provide an empty object as I did.
      c93c-478e-944f-286c8b37146c" />
  • test stales between those two steps: https://github.com/kyma-project/lifecycle-manager/blob/main/tests/e2e/no_default_cr_test.go#L96-L112
    • Kyma is expected to transition to Error state
    • test logs show Kyma.Status.State: Processing, Kyma.Status.Modules[].State: Deleting
      kyma (kyma-sample-ynumpwax) in cluster: Spec: {Channel:regular SkipMaintenanceWindows:false Modules:[]}, Status: {LastOperation:{Operation:waiting for all modules to become ready LastUpdateTime:2025-11-07 14:15:05 +0000 UTC} State:Processing Conditions:[{Type:Modules Status:False ObservedGeneration:1 LastTransitionTime:2025-11-07 14:15:05 +0000 UTC Reason:Ready Message:not all modules are in ready state} {Type:ModuleCatalog Status:True ObservedGeneration:1 LastTransitionTime:2025-11-07 14:15:05 +0000 UTC Reason:Ready Message:module templates are synchronized} {Type:SKRWebhook Status:True ObservedGeneration:1 LastTransitionTime:2025-11-07 14:15:05 +0000 UTC Reason:Ready Message:skrwebhook is synchronized}] Modules:[{Name:template-operator FQDN:kyma-project.io/module/template-operator Channel:regular Version:1.0.0-no-default-cr Message: State:Deleting Manifest:&TypeMeta{Kind:Manifest,APIVersion:operator.kyma-project.io/v1beta2,} Resource:<nil> Template:&TypeMeta{Kind:ModuleTemplate,APIVersion:operator.kyma-project.io/v1beta2,} Maintenance:false}] ActiveChannel:regular}
      
    • we see error logs from purge reconciler related to the failed connecton (expected)
    • we don't see logs from deletion controller
    • we would expect that this code is entered, which sets Kyma to ready state, but it seemingly doesn't: https://github.com/lindnerby/lifecycle-manager/blob/a7128c59616fe560fc2f1d5d5279f3caca360931/internal/controller/kyma/deletion_controller.go#L108-L111
      • => needs to be found out why
      • log Kyma deletion reconciliation started not found (we see other DEBUG level logs)
        • => looks like the deletion reconciler is not even triggered
          • func (r *DeletionReconciler) SetupWithManager(mgr ctrl.Manager, opts ctrlruntime.Options) error { is unused
            • probably got removed when merging the conflicts earlier
            • => re-add Setup DeletionReconciler 59f15bb
              • still can't see any logs from deletion controller
              • still stuck between the same test steps
              • kyma and module status is Ready now

@lindnerby
Copy link
Member Author

* => needs to be found out why
* log `Kyma deletion reconciliation started` not found (we see other DEBUG level logs)
  
  * => looks like the deletion reconciler is not even triggered
    
    * `func (r *DeletionReconciler) SetupWithManager(mgr ctrl.Manager, opts ctrlruntime.Options) error {` is unused
      
      * probably got removed when merging the conflicts earlier
      * => re-add Setup DeletionReconciler [59f15bb](https://github.com/kyma-project/lifecycle-manager/commit/59f15bbf2c33c21bb3d865055fea184f8dc5ef0f)

Was added to unreachable code location. Moved it out of the err branch.

@c-pius
Copy link
Contributor

c-pius commented Nov 11, 2025

watcher-enqueue

@c-pius
Copy link
Contributor

c-pius commented Nov 17, 2025

Proposal: Deletion Controller with Use Cases

  • we introduce a list of ordered use cases similar to the mandatory module deletion controller:
    // HandleDeletion processes the deletion of a ModuleReleaseMeta through a series of ordered use cases.
    // Returns deletion.ErrMrmNotInDeletingState error if the MRM is not in deleting state,
    // which indicates that the controller should not requeue.
    func (s *Service) HandleDeletion(ctx context.Context, mrm *v1beta2.ModuleReleaseMeta) error {
    // Find the first applicable step and execute it
    for _, step := range s.orderedSteps {
    isApplicable, err := step.IsApplicable(ctx, mrm)
    if err != nil {
    return err
    }
    if isApplicable {
    return step.Execute(ctx, mrm)
    }
    }
    return nil
    }
  • per reconciliation, exactly one use case will be performed
    • for kyma deletion controller, we expect we can cope with the performance penalty
    • determination of what use case to execute works on cached objects in KCP
    • determination of what use case to execute for SKR items requires API calls to the SKR, but that should be handlable by the individual API servers of the SKR
  • use cases return a structured result (useCase, error)
    • depending on the returned result, the controller determines
      • requeue behavior
      • metric written
      • event raised
        • we raise a "Normal" event for every successfully executed use case
        • we raise a "Warning" event for every use case where an error was returned
        • => the KymaCR can stay in Deleting state once deletion was triggered, and the event history for the object reveals the deletion progress and potential errors we are stuck with

The overview of:

deletion-new
  • note:
    • red use cases are probably not necessary (as opposed to in mandatory module deletion service), as here we can make it directly part of the reconciler setup
    • the kubeconfig secret exists predicate probably needs to be encoded in each SKR use cases predicate
    • we need to see if use cases like Delete SKR Kyma loop until the SKR Kyma is actually gone, or if we just consider it as "trigger deletion" where an eventual last step is verifying if everything is gone before removing finalizers
Use case brainstorming
  • Ensure Finalizer
    • condition: "deletion-finalizers" not in Kyma.metadata.finalizers
    • task: SSA finalizer onto Kyma
    • requeue: immediate?
  • Skip Non-Deleting
    • condition: deletionTimestampe != nil
    • task: no-op
    • requeue: no
  • Skip Reconciliation
    • condition: "skip-reconciliation" in Kyma.metadata.finalizers
    • task: no-op
    • requeue: no
  • Set Kyma Status Deleting
    • condition: Kyma.Status.State != Deleting"
    • task: SSA Kyma.Status.State = Deleting
    • requeue: immediate?
  • [Delete SKR Resources]
    • condition needs to be added to all the following tasks: Kubeconfig secret exists
    • tasks:
      • Set SKR Kyma Status Deleting
        • condition: if Kyma exists && Kyma.Status.State != Deleting
        • task: SSA Kyma.Status.State = Deleting
        • requeue: immediate
      • Delete SKR Kyma:
        • condition: if Kyma exists && Kyma.deletionTimestamp == nil
        • task: delete Kyma
        • requeue: immediate
      • Delete SKR Webhook
        • condition: if any of the webhook resources exists
        • task: delete ALL SKR webhook resources
        • requeue: immediate
      • Delete SKR MT and MRM
        • condition: if any synced MT or any synced MRM exists
        • task: delete all synced MTs and MRMs
        • requeue: immediate
      • Delete SKR CRDs
        • condition: if any of the sycned CRDs exists
        • task: delete synced CRDs
        • reqeue: immediate
      • Remove SKR Kyma finalizers
        • condition: if Kyma exists and "deletion-finalizers" not in Kyma.metadata.finalizers
        • task: remove "deletion-finalizer" from Kyma.metadata.finalizers
        • requeue: immediate
  • Delete SKR Webhook Cert and Secret on KCP
    • condition: if cert exists || secret exists
    • task: delete cert and secret
    • requeue: immediate
  • Delete Manifests
    • condition: if manifests for this Kyma exist
    • task: delete manifests
    • requeue: immediate
  • Cleanup Metrics
    • ???
  • Remove Kyma finalizers
    • condition: if Kyma exists && "deletion-finalizer" in Kyma.finalizers
    • task: remove Kyma.finalizers
    • requeue: no

@lindnerby
Copy link
Member Author

Closing this in favor for a more pragmatic approach of firstly extracting the deletion usecase for Kyma reconciliation into a designated service. Due to feasibility concerns and unforeseeable side-effects for reconciling the same resource in different control loops, while also not finding any good examples for operators doing that, we decide for a service level split.

@c-pius
Copy link
Contributor

c-pius commented Nov 28, 2025

Update: "Delete SKR MT and MRM Metadata" + "Delete SKR CRDs" are three individual use cases now: #2896

The "Remove SKR Kyma finalizers" one is obsolete. We don't have any finalizers on the SKR and also the ordering above doesn't make sense. Let's also re-iterate once the current behavior is established how to improve it in general. E.g., I think we should introduce a finalizer on the Kyma and first delete all modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants