[XPack][Watcher]: Support for Keystore Variables in Watch Definitions for Secure Secret Management

### Description

# Watcher: Support for keystore variables in watch definitions for secure secret management

### Summary

Watcher actions that call external systems such as Datadog, PagerDuty, Slack, or custom webhooks currently require plaintext credentials inside the watch JSON. Even with `xpack.watcher.encrypt_sensitive_data=true`, which only protects the values once the watch is stored in Elasticsearch, the secrets are still present in the definition that is kept in source control and passed through CI/CD.

This proposal requests **keystore variable support inside watch definitions**, so that headers, auth blocks, and similar fields can reference secure settings, aligned with how `elasticsearch.yml` already supports secure settings.

---

### Current limitation

Elasticsearch 8.17.1 example:

```json
{
  "actions": {
    "datadog_webhook": {
      "webhook": {
        "method": "post",
        "url": "https://api.datadoghq.eu/api/v2/incidents",
        "headers": {
          "Accept": "application/json",
          "Content-Type": "application/json",
          "DD-API-KEY": "plaintext-api-key-here",
          "DD-APPLICATION-KEY": "plaintext-app-key-here"
        },
        "body": "{{#toJson}}ctx.payload{{/toJson}}"
      }
    }
  }
}
```

Attempting to use a keystore reference fails:

```json
{
  "headers": {
    "DD-API-KEY": "{{_secrets.DD_API_KEY}}"
  }
}
```

Result:

```
unknown secure setting [DD_API_KEY]
```

Notes:
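- Elasticsearch only accepts keystore entries for secure settings that it or an installed plugin registers; arbitrary names such as `DD_API_KEY` are rejected.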
- There is no supported interpolation for secrets within watch definitions.

---

### Proposed solution

Enable **keystore variable interpolation** in Watcher definitions. Two compatible options are outlined below. Either would solve the problem.

**Option A. Inline interpolation using `${...}`**

```json
{
  "headers": {
    "DD-API-KEY": "${keystore.dd_api_key}",
    "DD-APPLICATION-KEY": "${keystore.dd_app_key}"
  }
}
```
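
As a rough illustration of Option A, the sketch below resolves `${keystore.<name>}` references inside a single string. It is written in Python rather than as Watcher code, and the regex, the `interpolate` helper, and the plain dict standing in for the node keystore are all assumptions made for this example, not an existing Elasticsearch API.

```python
import re

# Hypothetical pattern for the proposed `${keystore.<name>}` syntax.
KEYSTORE_REF = re.compile(r"\$\{keystore\.([A-Za-z0-9_.-]+)\}")


def interpolate(value: str, keystore: dict) -> str:
    """Replace every `${keystore.<name>}` in `value` with the keystore entry."""

    def lookup(match):
        name = match.group(1)
        if name not in keystore:
            # Fail loudly instead of sending a literal placeholder downstream.
            raise KeyError(f"unknown keystore entry [{name}]")
        return keystore[name]

    return KEYSTORE_REF.sub(lookup, value)


# A plain dict stands in for the node keystore in this sketch.
example_keystore = {"dd_api_key": "example-api-key"}
resolved_header = interpolate("${keystore.dd_api_key}", example_keystore)
```

Because the pattern can match anywhere in the string, the same approach would also cover values such as `Bearer ${keystore.token}`.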

**Option B. A dedicated `@secret:<name>` reference resolved at runtime**

```json
{
  "actions": {
    "datadog_webhook": {
      "webhook": {
        "url": "https://api.datadoghq.eu/api/v2/incidents",
        "headers": {
          "DD-API-KEY": "@secret:dd_api_key",
          "DD-APPLICATION-KEY": "@secret:dd_app_key"
        }
      }
    }
  }
}
```

- `@secret:<name>` is resolved from the Elasticsearch keystore on the node at execution time.
- Resolution should not leak values into stored watch source or logs.
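
A minimal sketch of how Option B could behave, assuming the stored watch is treated as a nested structure and resolved into a separate copy just before execution. The `resolve_secrets` helper and the dict-based keystore are hypothetical; only the `@secret:` prefix comes from the example above.

```python
from typing import Any

SECRET_PREFIX = "@secret:"  # prefix taken from the Option B example


def resolve_secrets(node: Any, keystore: dict) -> Any:
    """Return a new structure with every `@secret:<name>` string resolved.

    The input is never mutated, so the stored watch keeps its references.
    """
    if isinstance(node, dict):
        return {key: resolve_secrets(val, keystore) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(item, keystore) for item in node]
    if isinstance(node, str) and node.startswith(SECRET_PREFIX):
        name = node[len(SECRET_PREFIX):]
        return keystore[name]  # raises KeyError if the entry is missing
    return node


stored_watch = {
    "actions": {
        "datadog_webhook": {
            "webhook": {"headers": {"DD-API-KEY": "@secret:dd_api_key"}}
        }
    }
}
# A plain dict stands in for the node keystore in this sketch.
runtime_watch = resolve_secrets(stored_watch, {"dd_api_key": "example-api-key"})
```

Only `runtime_watch` carries the value; `stored_watch` still contains `@secret:dd_api_key`, which is what the Get Watch API would be expected to return.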

**Common expectations for both options**
- Resolution occurs only during execution.
- Values are never written back into the stored watch JSON.
- Values do not appear in logs, watch history, or error traces.
- Works in any string field of a watch definition where secrets typically appear, such as `headers`, `auth` fields, email credentials, and webhook tokens.
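
The "no values in logs, watch history, or error traces" expectation could be approached by masking resolved values before anything is written out. The sketch below is one hypothetical way to do that in Python; the `redact` helper and the `[REDACTED]` mask are assumptions for illustration, not existing Watcher behaviour.

```python
def redact(text: str, secret_values, mask: str = "[REDACTED]") -> str:
    """Mask every occurrence of a resolved secret value in `text`."""
    # Longest values first, so a secret that contains a shorter one
    # is masked as a whole rather than partially.
    for value in sorted(secret_values, key=len, reverse=True):
        if value:  # skip empty strings, which would match everywhere
            text = text.replace(value, mask)
    return text


log_line = redact(
    "webhook failed: DD-API-KEY=example-api-key status=403",
    ["example-api-key"],
)
```

String masking of this kind is a last line of defence; keeping resolved values out of the serialised watch record in the first place would be the stronger guarantee.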

---

### Why this matters

- Keeps credentials out of Git and code review systems.
- Avoids secret exposure in CI/CD pipelines and PR diffs.
- Aligns Watcher with existing secure settings usage patterns in `elasticsearch.yml`.
- Reduces risk and operational burden for common integrations.

---

### Use cases

- Datadog, PagerDuty, Slack webhooks.
- SMTP credentials for email actions.
- Custom webhook tokens and Basic Auth headers.
- Any third party API keys required by Watcher actions.

---

### Current workarounds and their gaps

1. **Encryption at rest**: `xpack.watcher.encrypt_sensitive_data=true`  
   - Does not prevent plaintext exposure in the definition itself.
2. **External templating**: inject secrets before deployment  
   - Adds bespoke tooling and risks accidental leakage in logs or artifacts.
3. **Environment variables in containers**  
   - Limited to specific deployments and still requires templating into the watch JSON.
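
For comparison, the external templating workaround (item 2) typically looks something like the sketch below, where a deployment script fills placeholders from environment-style variables before calling the Put Watch API. The `render_headers` helper and the `$DD_API_KEY` placeholder are hypothetical; the point is that the rendered output contains the plaintext secret.

```python
from string import Template
from typing import Mapping


def render_headers(headers: Mapping[str, str], env: Mapping[str, str]) -> dict:
    """Substitute `$NAME` placeholders in header values from `env`."""
    # Template.substitute raises KeyError if a placeholder has no value.
    return {name: Template(value).substitute(env) for name, value in headers.items()}


template_headers = {
    "Content-Type": "application/json",
    "DD-API-KEY": "$DD_API_KEY",
}
# In a pipeline, `env` would usually be os.environ.
rendered_headers = render_headers(template_headers, {"DD_API_KEY": "example-api-key"})
```

After rendering, the secret exists in plaintext in the pipeline's memory, in any saved artifact, and in the watch body sent to Elasticsearch, which is the gap described above.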

---

### Security and UX considerations

- Secrets should never be returned by the Get Watch API or appear in history JSON.
- Validation should fail fast if a referenced keystore key is missing.
- Add clear docs that show supported fields and example usage.
- Ensure audit logging records that a secret reference was used without revealing the value.
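
The fail-fast point could be sketched as a check run when a watch is created or updated: collect every secret reference and compare it with the names available in the keystore. The Python below assumes the `@secret:<name>` syntax from Option B; `collect_secret_refs` and `validate_watch` are hypothetical helpers, not part of any Elasticsearch API.

```python
from typing import Any, Iterable

SECRET_PREFIX = "@secret:"  # assumes the Option B reference syntax


def collect_secret_refs(node: Any) -> set:
    """Return the names of all `@secret:<name>` references under `node`."""
    refs = set()
    if isinstance(node, dict):
        for value in node.values():
            refs |= collect_secret_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= collect_secret_refs(item)
    elif isinstance(node, str) and node.startswith(SECRET_PREFIX):
        refs.add(node[len(SECRET_PREFIX):])
    return refs


def validate_watch(watch: Any, available_keys: Iterable[str]) -> None:
    """Raise ValueError naming every referenced key that is not available."""
    missing = sorted(collect_secret_refs(watch) - set(available_keys))
    if missing:
        # Only key names are reported, never values.
        raise ValueError(f"watch references missing keystore entries: {missing}")
```

Only key names are needed for this check, so validation never has to read a secret value.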

---

### Environment details

- Elasticsearch version: **8.17.1**
- Deployment: Docker and GitOps style pipelines
- Watcher usage: outbound webhooks to third party systems

---

### Alternatives considered

| Approach | Pros | Cons |
| --- | --- | --- |
| Continue external templating | Simple to start | Tooling drift, risk of leakage, inconsistent across teams |
| Custom proxy that injects creds | Centralises secrets | Extra component to operate and secure |
| Dedicated Watcher secret store | Purpose built | New surface area to build and maintain |
| **Keystore interpolation in definitions** | Reuses trusted mechanism, consistent UX | Requires Watcher to resolve at runtime |

---

### Why this should be prioritised

This limitation is a **security and operational blocker** for teams that rely on Elasticsearch Watcher as part of production-grade monitoring and business alerting pipelines. Unlike Kibana alerting (which primarily handles visual or threshold-based alerts), **Watcher is far more expressive** and allows for **complex, query-driven automation** that directly operates on Elasticsearch indices.  

Without keystore variable support, the current design forces plaintext credential management, creating an unavoidable security trade-off in enterprise environments that follow GitOps or CI/CD-based watcher deployments.

#### Key reasons for prioritisation

1. **Security non-compliance risk**  
   - Credentials for Datadog, PagerDuty, and other integrations must be stored in plain JSON or templated pipelines.  
   - Violates internal security and compliance policies (e.g., ISO 27001, SOC 2) that prohibit storing secrets in source control.  
   - Even with `xpack.watcher.encrypt_sensitive_data=true`, the *definition payload* remains readable.

2. **GitOps and CI/CD adoption blocker**  
   - Modern enterprises deploy watchers through automated pipelines (Terraform, Ansible, GitLab CI).  
   - Secrets cannot be injected dynamically, as keystore values are not referenceable in the JSON definitions.  
   - This forces the use of insecure templating layers or manual editing, breaking GitOps integrity.

3. **Operational friction in collaboration**  
   - Teams cannot safely share or review watchers across environments.  
   - Pull requests or watcher audits expose API keys, creating review bottlenecks and the need for redaction policies.

4. **Feature gap with existing keystore usage**  
   - The Elasticsearch keystore already holds secure settings that would otherwise sit in plaintext in configuration files.  
   - Extending this pattern into watchers would ensure consistent secret handling across all parts of the stack.

5. **Watcher remains essential where Kibana alerts fall short**  
   - Kibana alerting supports simple threshold or condition-based alerts bound to UI constructs.  
   - **Watcher is required for**:
     - Multi-index and multi-query aggregation alerts.  
     - Correlation or anomaly detection logic expressed in Painless scripts.  
     - Chained conditions with conditional actions or external integrations.  
   - Teams building **business logic or revenue-impact alerts** (e.g., transaction drops, latency anomalies) depend on Watcher’s flexibility.  
   - Therefore, the inability to secure watcher definitions directly impacts production alerting reliability.

6. **Alignment with security-first ecosystem practices**  
   - Many widely used observability and alerting systems (e.g., Prometheus, Grafana, Datadog monitors) support some form of runtime secret injection.  
   - Elasticsearch Watcher lacks equivalent capability, creating inconsistency in multi-system integrations.

#### Summary

Adding keystore variable interpolation to Watcher definitions is **not merely an enhancement**, but a **critical enabler** for secure, automated, and compliant use of Elasticsearch in enterprise alerting workflows.  
It directly addresses a **security exposure**, **removes GitOps blockers**, and **preserves the advanced query capabilities** that make Watcher more powerful than Kibana alerts for data-driven operational intelligence.

---

### References

- Watcher docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/watcher-api.html  
- Secure settings and keystore: https://www.elastic.co/guide/en/elasticsearch/reference/current/secure-settings.html  
- Discuss thread with context: https://discuss.elastic.co/t/watcher-support-for-keystore-variables-in-watch-definitions/

---

### Labels

`enhancement`, `security`, `area:Watcher`

---

### Request

Please consider adding keystore-backed secret interpolation in Watcher definitions, covering headers and auth fields at minimum, with strong guarantees against secret leakage in stored objects and logs. Happy to test a design or prototype and provide feedback.

