Merged
Changes from 14 commits
26 commits
8e75f92
add anonymization
neptunian Jul 18, 2025
7e9db36
update
neptunian Jul 18, 2025
e6f35d3
update
neptunian Jul 18, 2025
d361613
update what is anonymized
neptunian Jul 18, 2025
ae526f4
add link
neptunian Jul 18, 2025
051abc3
update limitations
neptunian Jul 18, 2025
e5bb4e2
remove tool call
neptunian Jul 18, 2025
0cee54e
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 18, 2025
9963233
add json malformed limitation and performance guidance
neptunian Jul 22, 2025
23c2771
adjust limitation wording
neptunian Jul 22, 2025
ba16779
remove sha-1
neptunian Jul 22, 2025
7c31f8d
update order of execution
neptunian Jul 22, 2025
b35ed48
remove target
neptunian Jul 22, 2025
c9cdec8
add example screenshot
neptunian Jul 22, 2025
d7786c9
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
f6dccab
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
a2ef057
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
450ab6b
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
782ebc3
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
67bbde5
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
8f15826
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
6c9815b
Update solutions/observability/observability-ai-assistant.md
neptunian Jul 23, 2025
0e229be
fix email text
neptunian Jul 23, 2025
be80e8c
move example
neptunian Jul 23, 2025
dc760b3
Merge branch 'main' into ai-assistant-anonymization
neptunian Jul 23, 2025
8faf260
Merge branch 'main' into ai-assistant-anonymization
mdbirnstiehl Jul 24, 2025
80 changes: 79 additions & 1 deletion solutions/observability/observability-ai-assistant.md
@@ -61,7 +61,7 @@ It's important to understand how your data is handled when using the AI Assistan
: Elastic does not use customer data for model training, but all data is processed by third-party AI providers.

**Anonymization**
: Data sent to the AI Assistant is *not* anonymized, including alert data, configurations, queries, logs, and chat interactions.
: Data sent to the AI Assistant is *not* anonymized, including alert data, configurations, queries, logs, and chat interactions. If you need to anonymize data, use the [anonymization pipeline](#obs-ai-anonymization).

**Permission context**
: When the AI Assistant performs searches, it uses the same permissions as the current user.
@@ -418,6 +418,84 @@ Enable this feature from the **Settings** tab in AI Assistant Settings by using
For air-gapped environments, installing product documentation requires special configuration. See the [{{kib}} AI Assistants settings documentation](kibana://reference/configuration-reference/ai-assistant-settings.md) for detailed instructions.
::::

## Anonymization (technical preview) [obs-ai-anonymization]

Anonymization masks personally identifiable or otherwise sensitive information before chat messages leave Kibana for a third-party LLM.
Enabled rules substitute deterministic tokens (for example `EMAIL_ee4587…`) so the model can keep context without ever seeing the real value.
When all rules are disabled (the default), data is forwarded unchanged.

### How it works [obs-ai-anonymization-how]

When an anonymization rule is enabled in the [AI Assistant settings](#obs-ai-settings), every message in the request (system prompt, message content, function call arguments/responses) is run through an *anonymization pipeline* before it leaves Kibana:

1. Each enabled **rule** scans the text and replaces any match with a deterministic token such as
`EMAIL_ee4587b4ba681e38996a1b716facbf375786bff7`.
The prefix (`EMAIL`, `PER`, `LOC`, …) is the *entity class*; the suffix is a deterministic hash of the original value.
2. The fully masked conversation is sent to the LLM.
3. After the LLM responds, the original values are restored so the user sees deanonymized text and any persisted conversation history stores the original content. Deanonymization information is stored with the conversation messages to enable the UI to highlight anonymized content.
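
The three steps above can be sketched as a small round trip. This is a minimal illustration, not the Kibana implementation: the `RULES` list, the helper names, and the stand-in reply are hypothetical, and SHA-1 is assumed only because it produces a 40-character hex suffix like the sample token. The hash function actually used is not stated here.

```python
import hashlib
import re

# Hypothetical rule set; the pattern is the e-mail example from this page.
RULES = [("EMAIL", re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"))]


def token_for(entity_class: str, value: str) -> str:
    # Assumption: SHA-1 is used here only because it yields 40 hex characters,
    # like the sample token; the real hash function is not documented.
    return f"{entity_class}_{hashlib.sha1(value.encode('utf-8')).hexdigest()}"


def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Step 1: replace every match with a token and remember the mapping."""
    mapping: dict[str, str] = {}
    for entity_class, pattern in RULES:
        def mask(match: re.Match) -> str:
            token = token_for(entity_class, match.group(0))
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(mask, text)
    return text, mapping


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Step 3: put the original values back into the model's reply."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


masked, mapping = anonymize("Page jane@example.com, then jane@example.com again.")
reply = f"I will notify {next(iter(mapping))}."  # stand-in for the LLM response
restored = deanonymize(reply, mapping)
```

Because the token is derived from the value, both occurrences of the address map to the same token, which is what lets the model keep track of an entity across a conversation without seeing it.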

The following regex rule masks GKE hostnames. The screenshot after it shows the anonymized content highlighted in the chat window:

```jsonc
{
"entityClass": "GKE_HOST",
"type": "RegExp",
"pattern": "(gke-[a-zA-Z0-9-]+-[a-f0-9]{8}-[a-zA-Z0-9]+)",
"enabled": true
}
```

:::{image} /solutions/images/observability-obs-ai-assistant-anonymization.png
:alt: AI Assistant chat showing hostname anonymization in action
:screenshot:
:::
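
Before enabling a rule like this, it can help to try the pattern against sample text. The sketch below uses Python's `re` module, whose syntax is compatible with JavaScript for this particular pattern. The log line and node name are made up, and `GKE_HOST_<hash>` is a placeholder for the real token.

```python
import re

# Pattern copied from the GKE_HOST rule above.
GKE_HOST = re.compile(r"(gke-[a-zA-Z0-9-]+-[a-f0-9]{8}-[a-zA-Z0-9]+)")

# Hypothetical log line; the node name is made up for illustration.
line = "Pod evicted from gke-prod-cluster-default-pool-1a2b3c4d-x7k9 at 10:02"

match = GKE_HOST.search(line)
masked = GKE_HOST.sub("GKE_HOST_<hash>", line)

# A name without the 8-character hex segment is not treated as a GKE host.
no_match = GKE_HOST.search("connecting to gke-cluster")
```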

### Rule types [obs-ai-anonymization-rules]


**RegExp** — Runs a JavaScript‑style regular expression. Use for fixed patterns such as e‑mail addresses, host names, etc.

```jsonc
{
"type": "RegExp",
"pattern": "([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})",
"entityClass": "EMAIL",
"enabled": true
}
```
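
Note the doubled backslash in `"pattern"`: the rule is JSON, so `\\.` in the settings decodes to the regex `\.` (a literal dot). A quick way to see this, assuming the rule is parsed as ordinary JSON, is to load it and compile the decoded pattern. The sample sentence is hypothetical.

```python
import json
import re

# The RegExp rule from above, as a plain JSON string.
rule_json = r'''
{
  "type": "RegExp",
  "pattern": "([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})",
  "entityClass": "EMAIL",
  "enabled": true
}
'''

rule = json.loads(rule_json)

# After JSON decoding, the pattern contains a single backslash before the dot.
pattern = re.compile(rule["pattern"])

found = pattern.findall("Contact ops@example.org or admin@corp.example.com today")
```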

**NER** — Runs a named‑entity‑recognition model on free text.

```jsonc
{
"type": "NER",
"modelId": "elastic__distilbert-base-uncased-finetuned-conll03-english",
"allowedEntityClasses": ["PER", "ORG", "LOC"],
"enabled": true
}
```

Rules are evaluated top-to-bottom with regex rules processed first, then NER rules; the first rule that captures a given entity wins. Rules can be configured in the [AI Assistant Settings](#obs-ai-settings) page.
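
The ordering described above can be illustrated with a small sketch. Everything here is hypothetical: `fake_ner` stands in for a real model, and the overlap handling is an assumption about what "the first rule that captures a given entity wins" means, namely that a later rule skips text an earlier rule already claimed.

```python
import re
from typing import Callable

# A detector returns (start, end, entity_class) spans found in the text.
Span = tuple[int, int, str]
Detector = Callable[[str], list[Span]]


def regex_detector(entity_class: str, pattern: str) -> Detector:
    compiled = re.compile(pattern)
    return lambda text: [(m.start(), m.end(), entity_class) for m in compiled.finditer(text)]


def fake_ner(text: str) -> list[Span]:
    # Hypothetical stand-in for an NER model: tags "Jane" as PER, and
    # also tags the e-mail local part "jane" as PER.
    return [(m.start(), m.end(), "PER") for m in re.finditer(r"[Jj]ane", text)]


def detect(text: str, regex_rules: list[Detector], ner_rules: list[Detector]) -> list[Span]:
    claimed: list[Span] = []
    # Regex rules run first, then NER rules, each group in listed order.
    for detector in [*regex_rules, *ner_rules]:
        for start, end, cls in detector(text):
            overlaps = any(start < c_end and c_start < end for c_start, c_end, _ in claimed)
            # Skip text that an earlier rule already captured.
            if not overlaps:
                claimed.append((start, end, cls))
    return sorted(claimed)


text = "Jane wrote from jane@example.com"
spans = detect(
    text,
    regex_rules=[regex_detector("EMAIL", r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")],
    ner_rules=[fake_ner],
)
```

In this sketch the address is masked once as `EMAIL`, and the NER hit on its local part is dropped because the regex rule claimed that text first.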

### Requirements [obs-ai-anonymization-requirements]
Contributor


Do we need to add a step by step guide?
Similar to how we present steps when adding data to the KB - https://www.elastic.co/docs/solutions/observability/observability-ai-assistant#obs-ai-kb-ui?

Contributor Author


I think it might be a bit redundant. Perhaps we can do that as part of a blog which @pmoust mentioned. Let's see what others think...


* **Advanced Settings privilege** to edit the configuration and enable rules.
Once saved, the rules apply to *all* users in the same **Space** (the setting is [space-aware](../../deploy-manage/manage-spaces.md)).
* **ML privilege and resources** if you enable a rule of type NER. You must first [deploy and start a named-entity-recognition model](/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md#ex-ner-deploy) and have sufficient ML capacity.

::::{important}
The anonymization pipeline has only been validated with Elastic’s English model
[elastic/distilbert-base-uncased-finetuned-conll03-english](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english).
Results for other languages or models may vary.
::::
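
Starting the NER model goes through the Elasticsearch start trained model deployment API, which accepts `number_of_allocations` and `threads_per_allocation` as query parameters. The sketch below only builds the request line so it can be pasted into a console or HTTP client. The helper name and the values `2` and `4` are illustrative, not a sizing recommendation.

```python
from urllib.parse import quote, urlencode


def start_deployment_request(model_id: str, number_of_allocations: int = 1,
                             threads_per_allocation: int = 1) -> tuple[str, str]:
    """Return the HTTP method and path for starting a trained model deployment."""
    params = urlencode({
        "number_of_allocations": number_of_allocations,
        "threads_per_allocation": threads_per_allocation,
    })
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start"
    return "POST", f"{path}?{params}"


method, url = start_deployment_request(
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    number_of_allocations=2,   # illustrative values, not a recommendation
    threads_per_allocation=4,
)
```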

### Limitations [obs-ai-anonymization-limitations]
* **Performance (NER)** – Running a named entity recognition model can add latency depending on the request. To improve performance of the model, consider scaling up your ML nodes by adjusting deployment parameters: increase `number_of_allocations` for better throughput and `threads_per_allocation` for faster individual requests. For details, refer to the [start trained model deployment API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment).
* **Structured JSON** – The NER model we validated (`elastic/distilbert-base-uncased-finetuned-conll03-english`) is trained on natural English text and often misses entities inside JSON or other structured data. If thorough masking is required, prefer regex rules and craft them to account for JSON syntax.
* **False negatives / positives** – No model or pattern is perfect. Accuracy varies with the model and the input.
* **JSON malformation risk** – Both NER inference and regex rules can potentially create malformed JSON when anonymizing JSON data such as function responses. This can occur when text is replaced across character boundaries, which may break the JSON structure and cause the whole request to fail. If this occurs, you may need to adjust your regex pattern or disable the NER rule.
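
One way the JSON malformation risk can show up with regex rules is a pattern that is too greedy. The sketch below uses a made-up function response and compares an unbounded pattern with the bounded GKE pattern from earlier on this page. `GKE_HOST_<hash>` again stands in for the real token.

```python
import json
import re

# Hypothetical function response that is about to be anonymized.
payload = '{"host": "gke-prod-default-pool-1a2b3c4d-x7k9", "status": "ready"}'


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Too greedy: ".*" runs past the closing quote and swallows JSON syntax.
greedy = re.sub(r"gke-.*", "GKE_HOST_<hash>", payload)

# Bounded: the character classes cannot match quotes, commas, or braces.
bounded = re.sub(r"gke-[a-zA-Z0-9-]+-[a-f0-9]{8}-[a-zA-Z0-9]+", "GKE_HOST_<hash>", payload)
```

Restricting a pattern to characters that can appear in the value itself keeps the replacement inside the string literal, so the surrounding JSON still parses.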


## Known issues [obs-ai-known-issues]

### Token limits [obs-ai-token-limits]