Improve AutoOpsAgentPolicy Status Reporting by moukoublen · Pull Request #9095 · elastic/cloud-on-k8s

moukoublen · 2026-02-03T10:50:20Z

Note: This description was generated with AI assistance (Claude Opus 4.5)

Improve AutoOpsAgentPolicy Status Reporting

Summary

Add per-resource status tracking with detailed error information for AutoOpsAgentPolicy
Introduce human-readable status fields for better observability
Enhance error reporting with specific messages for RBAC and reconciliation failures
Change the default columns of kubectl get autoopsagentpolicies to show more information

New Status Fields

`Skipped` (int)

The number of Elasticsearch resources that are not monitored because of RBAC permission issues. When the operator is configured with --enforce-rbac-on-refs and the specified serviceAccountName lacks permission to access an Elasticsearch resource in a different namespace, that resource is counted as skipped rather than errored.

Example: skipped: 2 indicates 2 Elasticsearch clusters couldn't be monitored due to insufficient RBAC permissions.

`ReadyCount` (string)

A human-readable string showing the ratio of ready monitored resources to total monitored resources in the format Ready/Resources. This provides an at-a-glance view of the policy's health without needing to compare separate numeric fields.

Example: readyCount: "3/5" indicates 3 out of 5 matched Elasticsearch clusters have healthy AutoOps agents deployed.
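
As a rough illustration of the Ready/Resources format, the string could be produced as below. This is a minimal sketch: the helper name formatReadyCount and its signature are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// formatReadyCount renders the "Ready/Resources" ratio described above.
// Hypothetical helper: the name and signature are assumptions for illustration.
func formatReadyCount(ready, resources int) string {
	return fmt.Sprintf("%d/%d", ready, resources)
}

func main() {
	fmt.Println(formatReadyCount(3, 5)) // 3 of 5 matched clusters are ready
	fmt.Println(formatReadyCount(0, 0)) // no monitored resources at all
}
```

Keeping the ratio as a pre-formatted string lets a single printer column show it without any arithmetic on the client side.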

`Message` (string)

A human-readable summary of the current status that combines the counts of ready, errored, and skipped resources into a single descriptive string. Only non-zero counts appear in the message.

Examples:

"3 resource ready" - all resources healthy
"2 resource ready, 1 error" - partial success with errors
"1 resource ready, 1 error, 2 skipped due to RBAC" - mixed status with RBAC issues

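The message examples above suggest a simple "join the non-zero parts" construction. The following is a sketch under that assumption only: buildMessage is a hypothetical name, and the wording of each part (including the lack of pluralization) simply mirrors the example strings rather than the actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// buildMessage joins one fragment per non-zero count, in the order
// ready, errors, skipped. Hypothetical helper, not the PR's code.
func buildMessage(ready, errs, skipped int) string {
	var parts []string
	if ready > 0 {
		parts = append(parts, fmt.Sprintf("%d resource ready", ready))
	}
	if errs > 0 {
		parts = append(parts, fmt.Sprintf("%d error", errs))
	}
	if skipped > 0 {
		parts = append(parts, fmt.Sprintf("%d skipped due to RBAC", skipped))
	}
	// With all counts at zero the message stays empty.
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println(buildMessage(3, 0, 0))
	fmt.Println(buildMessage(2, 1, 0))
	fmt.Println(buildMessage(1, 1, 2))
}
```

An empty result for all-zero counts is consistent with the blank MESSAGE column in the kubectl output for policies that have no ready resources.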
`Details` (map[string]ResourceStatus)

A map providing per-resource status information, keyed by resource identifier in the format namespace/name. Only resources with non-ready states (errors or skipped) are included in this map to keep the status lightweight. Each entry contains:

Phase (ResourcePhase): Either "Error" or "Skipped"
Message (string): Human-readable explanation, set for skipped resources (e.g., "RBAC access denied for service account my-sa")
Error (string): Detailed error information, set only for error states (e.g., "Failed to create AutoOps ES CA secret: secret not found")

Example:

details:
  production/es-cluster-1:
    phase: Error
    error: "Failed to create AutoOps ES API key: connection refused"
  staging/es-cluster-2:
    phase: Skipped
    message: "RBAC access denied for service account autoops-sa"

New Types

`ResourceStatus` (struct)

A lightweight struct for per-resource status information:

Phase: The resource phase (Error or Skipped)
Message: Human-readable explanation (only for non-ready states)
Error: Error details (only for error states)

`ResourcePhase` (string)

An enumeration for resource-level phases:

ErrorResourcePhase ("Error"): Resource reconciliation failed
SkippedResourcePhase ("Skipped"): Resource skipped due to RBAC

Other Changes

Renamed Method

CalculateFinalPhase() → Finalize(): Now also generates the human-readable Message and ReadyCount fields at the end of reconciliation

Enhanced Error Tracking

Replaced generic MarkResourceError() calls with specific error methods that capture the failing operation:

ResourceRBACError(es): RBAC access denied
ResourceError(es, message, err): Captures specific failures for CA secret, API key, config map, and deployment operations

`kubectl get autoopsagentpolicies`

Examples:

NAMESPACE   NAME                  READY   PHASE                        MESSAGE   AGE
e2e-venus   autoops-policy-gx76   0/1     MonitoredResourcesNotReady             3s

---

NAMESPACE   NAME                  READY   PHASE                   MESSAGE   AGE
e2e-venus   autoops-policy-gx76   0/1     AutoOpsAgentsNotReady             82s

---

NAMESPACE   NAME                  READY   PHASE   MESSAGE            AGE
e2e-venus   autoops-policy-gx76   1/1     Ready   1 resource ready   109s

---

NAMESPACE   NAME                  READY   PHASE                  MESSAGE   AGE
e2e-venus   autoops-policy-gx76   0/0     NoMonitoredResources             114s

prodsecmachine · 2026-02-03T10:50:37Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues


moukoublen linked an issue Feb 3, 2026 that may be closed by this pull request

Improve AutoOpsAgentPolicy Status Reporting #8965

Open

10 tasks

botelastic bot added the triage label Feb 3, 2026

moukoublen force-pushed the improve_autoops_status_reporting branch from 161bc7c to c928cdc February 3, 2026 10:56

github-actions bot had a problem deploying to docs-preview February 3, 2026 10:57 Failure

moukoublen force-pushed the improve_autoops_status_reporting branch from c928cdc to 2c646f9 February 3, 2026 11:04

github-actions bot had a problem deploying to docs-preview February 3, 2026 11:05 Failure

moukoublen force-pushed the improve_autoops_status_reporting branch from 2c646f9 to 20a911b February 3, 2026 11:15

github-actions bot had a problem deploying to docs-preview February 3, 2026 11:15 Failure

moukoublen self-assigned this Feb 3, 2026

moukoublen added the v3.4.0 (next next) and >enhancement labels Feb 3, 2026

botelastic bot removed the triage label Feb 3, 2026

moukoublen force-pushed the improve_autoops_status_reporting branch from 20a911b to fdefc8c February 3, 2026 12:11

github-actions bot had a problem deploying to docs-preview February 3, 2026 12:11 Failure

moukoublen force-pushed the improve_autoops_status_reporting branch from fdefc8c to 62c9945 February 3, 2026 12:47

github-actions bot had a problem deploying to docs-preview February 3, 2026 12:48 Failure

Improve AutoOpsAgentPolicy Status Reporting

715dcb1

moukoublen force-pushed the improve_autoops_status_reporting branch from 62c9945 to 715dcb1 February 3, 2026 12:55

github-actions bot had a problem deploying to docs-preview February 3, 2026 12:55 Failure

moukoublen wants to merge 1 commit into elastic:main from moukoublen:improve_autoops_status_reporting
