
OpenSearch controller AutoTuneOptions MaintenanceSchedules desired state bug #2590

@ShacharMalachi


Describe the bug

TL;DR: After every OpenSearch configuration change, the controller resets the desired state of AutoTuneOptions.MaintenanceSchedules to its initial configuration. As a consequence, the controller prints the same log message in an endless loop, because the actual state is never equal to the desired state.

Background

  1. In AutoTuneOptions, we initially define maintenanceSchedules with a "startAt" date, and it works fine.
  2. After the initial configuration, we update maintenanceSchedules to an empty array or null to match the actual state returned by the "opensearch describe-domain" command (we don't want to configure a new "startAt" date every time, and maintenanceSchedules is relevant only for the initial configuration anyway).

Bug:

The bug occurs after maintenanceSchedules has already been successfully set to an empty array or null following the initial configuration.

When I update any other OpenSearch configuration, the controller restores the initial autoTuneOptions.maintenanceSchedules configuration (with the initial "startAt") and then prints this log line over and over again:

controller {"level":"info","ts":"2025-08-05T07:17:05.510Z","logger":"ackrt","msg":"desired resource state has changed","kind":"Domain","namespace":"ack-controllers","name":"opensearch-vpc-domain","account":"my-account","role":"","region":"eu-central-1","is_adopted":false,"generation":76,"diff":[{"Path":{"Parts":["Spec","AutoTuneOptions","MaintenanceSchedules"]},"A":[{"cronExpressionForRecurrence":"cron(0 0 ? * 1 *)","duration":{"unit":"HOURS","value":2},"startAt":"2025-07-29T10:38:44Z"}],"B":null}]}

The logs stop only when I manually apply maintenanceSchedules with null or an empty array again, but the loop restarts on every subsequent OpenSearch configuration change.
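For anyone triaging similar output, here is a minimal sketch of pulling the changed field path out of the log line above. It uses the exact JSON payload from this report (without the leading "controller" container prefix). The helper name changed_paths is hypothetical, and reading "A" as the desired side and "B" as the observed side is my assumption based on the values shown, not something confirmed from the controller source.

```python
import json

# JSON payload copied verbatim from the log line in this report.
LOG_LINE = r'{"level":"info","ts":"2025-08-05T07:17:05.510Z","logger":"ackrt","msg":"desired resource state has changed","kind":"Domain","namespace":"ack-controllers","name":"opensearch-vpc-domain","account":"my-account","role":"","region":"eu-central-1","is_adopted":false,"generation":76,"diff":[{"Path":{"Parts":["Spec","AutoTuneOptions","MaintenanceSchedules"]},"A":[{"cronExpressionForRecurrence":"cron(0 0 ? * 1 *)","duration":{"unit":"HOURS","value":2},"startAt":"2025-07-29T10:38:44Z"}],"B":null}]}'


def changed_paths(log_line: str) -> list:
    """Hypothetical helper: return the dotted field paths listed under "diff"."""
    entry = json.loads(log_line)
    return [".".join(d["Path"]["Parts"]) for d in entry.get("diff", [])]


entry = json.loads(LOG_LINE)
first_diff = entry["diff"][0]

print(changed_paths(LOG_LINE))
# Assumption: "A" is the desired side (stale schedule), "B" the observed side (null).
print("A startAt:", first_diff["A"][0]["startAt"])
print("B:", first_diff["B"])
```

The only differing path is the maintenance schedule list, which matches the behavior described above: one side still carries the initial "startAt", the other is null.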

Steps to reproduce

  1. Apply an OpenSearch domain with AutoTuneOptions that include a valid maintenanceSchedules.
  2. Apply an empty/null maintenanceSchedules after the initial configuration and wait for completion.
  3. Apply any other OpenSearch configuration change.
    =>
    Result: The OpenSearch controller restores the initial AutoTuneOptions.MaintenanceSchedules and prints the same log message in an endless loop, because the actual state of AutoTuneOptions.MaintenanceSchedules is not equal to the desired state.

The loop stops after applying empty maintenanceSchedules manually, but whenever a new OpenSearch configuration change is applied, the bug starts again.

Partial example:
Before:

  advancedSecurityOptions:
    anonymousAuthEnabled: false
    enabled: true
    internalUserDatabaseEnabled: true
    sAMLOptions:
      enabled: false
  autoTuneOptions:
    desiredState: ENABLED
    maintenanceSchedules: []
    useOffPeakWindow: false
status:
  conditions:
  - lastTransitionTime: "2025-08-05T08:52:26Z"
    status: "True"
    type: ACK.ResourceSynced

After changing only internalUserDatabaseEnabled to false (it's the same for any other configuration change):

  advancedSecurityOptions:
    anonymousAuthEnabled: false
    enabled: true
    internalUserDatabaseEnabled: false
    sAMLOptions:
      enabled: false
  autoTuneOptions:
    desiredState: ENABLED
    maintenanceSchedules:
    - cronExpressionForRecurrence: cron(0 0 ? * 1 *)
      duration:
        unit: HOURS
        value: 2
      startAt: "2025-07-29T10:38:44Z"
    useOffPeakWindow: false
status:
  conditions:
  - lastTransitionTime: "2025-08-05T08:54:24Z"
    status: "False"
    type: ACK.ResourceSynced

As you can see, startAt already holds an outdated value, and it wasn't me who set it (if it had been me, I would have gotten an error because startAt is outdated, but there is no such error in this case).

But the actual state (via the opensearch describe-domain command) looks like this after the initial configuration:

"AutoTuneOptions": {
    "State": "ENABLED",
    "UseOffPeakWindow": false
}

Expected outcome

After applying an empty maintenanceSchedules, I expect the controller to leave maintenanceSchedules unchanged.
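To make the expectation concrete, here is a minimal sketch of the comparison I would expect, using the values from this report. This is not the controller's actual code: schedules_equal is a hypothetical helper, and treating null, missing, and [] as equivalent is my assumption about what the intended semantics should be.

```python
from typing import Optional

# Observed state as returned by "opensearch describe-domain" in this report.
OBSERVED = {"AutoTuneOptions": {"State": "ENABLED", "UseOffPeakWindow": False}}

# The stale schedule that the controller restored into the spec.
STALE_SCHEDULE = [{
    "cronExpressionForRecurrence": "cron(0 0 ? * 1 *)",
    "duration": {"unit": "HOURS", "value": 2},
    "startAt": "2025-07-29T10:38:44Z",
}]


def schedules_equal(desired: Optional[list], observed: Optional[list]) -> bool:
    """Hypothetical helper: null, missing, and [] all count as "no schedules"."""
    return (desired or []) == (observed or [])


# The key is absent from the observed state, so .get() yields None.
observed_schedules = OBSERVED["AutoTuneOptions"].get("MaintenanceSchedules")

print(schedules_equal([], observed_schedules))              # spec with an empty array
print(schedules_equal(None, observed_schedules))            # spec with null
print(schedules_equal(STALE_SCHEDULE, observed_schedules))  # restored stale schedule
```

Under this assumption, a spec with an empty or null maintenanceSchedules is already in sync with the observed state, and only the restored stale schedule produces a difference.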

Environment

  • Kubernetes version v1.30.14
  • Using EKS (yes/no), if so version? no
  • AWS service targeted (S3, RDS, etc.): OpenSearch
  • OpenSearch Controller version: 1.0.15 (latest)

Labels

needs-investigation: Indicates an issue needs some investigation.
service/opensearchservice: Indicates issues or PRs that are related to opensearchservice-controller.
