Commit d335847

neptunian, viduni94, and mdbirnstiehl authored
add anonymization (#2179)
Closes #2001

Documentation for anonymization in the AI Assistant. AI contributions from o3 and Sonnet 4.

---------

Co-authored-by: Viduni Wickramarachchi <[email protected]>
Co-authored-by: Mike Birnstiehl <[email protected]>
1 parent 72ada92 commit d335847

2 files changed: 88 additions and 1 deletion

solutions/observability/observability-ai-assistant.md

Lines changed: 88 additions & 1 deletion
@@ -61,7 +61,7 @@ It's important to understand how your data is handled when using the AI Assistan
6161
: Elastic does not use customer data for model training, but all data is processed by third-party AI providers.
6262

6363
**Anonymization**
64-
: Data sent to the AI Assistant is *not* anonymized, including alert data, configurations, queries, logs, and chat interactions.
64+
: Data sent to the AI Assistant is *not* anonymized, including alert data, configurations, queries, logs, and chat interactions. If you need to anonymize data, use the [anonymization pipeline](#obs-ai-anonymization).
6565

6666
**Permission context**
6767
: When the AI Assistant performs searches, it uses the same permissions as the current user.
@@ -438,6 +438,93 @@ Enable this feature from the **Settings** tab in AI Assistant Settings by using
438438
For air-gapped environments, installing product documentation requires special configuration. See the [{{kib}} AI Assistants settings documentation](kibana://reference/configuration-reference/ai-assistant-settings.md) for detailed instructions.
439439
::::
440440

441+
## Anonymization [obs-ai-anonymization]
442+
```{applies_to}
443+
serverless: preview
444+
stack: preview 9.1
445+
```
446+
447+
Anonymization masks personally identifiable or otherwise sensitive information before chat messages leave Kibana for a third-party LLM.
448+
Enabled rules substitute deterministic tokens (for example `EMAIL_ee4587…`) so the model can keep context without ever seeing the real value.
449+
When all rules are disabled (the default), data is forwarded unchanged.
450+
451+
### How it works [obs-ai-anonymization-how]
452+
453+
When an anonymization rule is enabled in the [AI Assistant settings](#obs-ai-settings), every message in the request (system prompt, message content, function call arguments/responses) is run through an *anonymization pipeline* before it leaves Kibana:
454+
455+
1. Each enabled **rule** scans the text and replaces any match with a deterministic token such as
456+
`EMAIL_ee4587b4ba681e38996a1b716facbf375786bff7`.
457+
The prefix (`EMAIL`, `PER`, `LOC`, …) is the *entity class*; the suffix is a deterministic hash of the original value.
458+
2. The fully masked conversation is sent to the LLM.
459+
3. After the LLM responds, the original values are restored so the user sees deanonymized text and any persisted conversation history stores the original content. Deanonymization information is stored with the conversation messages to enable the UI to highlight anonymized content.
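
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Kibana's implementation: the function names are hypothetical, and SHA-1 is an assumption chosen only because it produces a 40-character hex suffix like the example token. The actual hash function is not stated here.

```python
import hashlib
import re

# Same email pattern as the RegExp rule example, written as a Python raw string.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def make_token(entity_class: str, value: str) -> str:
    """Build a deterministic token: the same value always yields the same token."""
    # Assumption: SHA-1 is used here only to mimic the 40-hex-character suffix.
    digest = hashlib.sha1(value.encode("utf-8")).hexdigest()
    return f"{entity_class}_{digest}"


def anonymize(text: str, pattern: re.Pattern, entity_class: str):
    """Step 1: replace every match with a token and remember token -> original."""
    mapping = {}

    def _replace(match: re.Match) -> str:
        token = make_token(entity_class, match.group(0))
        mapping[token] = match.group(0)
        return token

    return pattern.sub(_replace, text), mapping


def deanonymize(text: str, mapping: dict) -> str:
    """Step 3: restore the original values in the text returned by the LLM."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Because the token depends only on the entity class and the original value, a value that appears twice maps to the same token both times, which is what lets the model keep track of it across the conversation.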
460+
461+
### Rule types [obs-ai-anonymization-rules]
462+
463+
464+
**RegExp**: Runs a JavaScript‑style regular expression. Use for fixed patterns such as email addresses, host names, etc.
465+
466+
```jsonc
467+
{
468+
"type": "RegExp",
469+
"pattern": "([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})",
470+
"entityClass": "EMAIL",
471+
"enabled": true
472+
}
473+
```
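
One detail that is easy to miss in the rule above is the escaping: `\\.` in the JSON source becomes the regex `\.` (a literal dot) once the JSON is parsed. The sketch below checks this in Python. It assumes Python's `re` module behaves like a JavaScript regular expression for this particular pattern, which holds for the character classes and quantifiers used here; the sample sentence is made up.

```python
import json
import re

# The rule exactly as it appears in the JSON configuration.
# A raw string keeps the double backslash intact for the JSON parser.
rule_json = r'''
{
  "type": "RegExp",
  "pattern": "([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})",
  "entityClass": "EMAIL",
  "enabled": true
}
'''

rule = json.loads(rule_json)

# After JSON parsing, the pattern contains a single backslash before the dot.
pattern = re.compile(rule["pattern"])

# Hypothetical sample text, used only to see what the pattern captures.
matches = pattern.findall("mail bob@corp.example.org now")
```

Testing a pattern against a few representative strings before enabling the rule is a cheap way to catch escaping mistakes.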
474+
475+
**NER**: Runs a named entity recognition (NER) model on free text.
476+
477+
```jsonc
478+
{
479+
"type": "NER",
480+
"modelId": "elastic__distilbert-base-uncased-finetuned-conll03-english",
481+
"allowedEntityClasses": ["PER", "ORG", "LOC"],
482+
"enabled": true
483+
}
484+
```
485+
486+
Rules are evaluated top-to-bottom with `RegExp` rules processed first, then `NER` rules; the first rule that captures a given entity wins. Rules can be configured in the [AI Assistant Settings](#obs-ai-settings) page.
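
The ordering described above can be pictured with the following sketch. It is an interpretation, not the actual pipeline code: the `Span` type, the function names, and the way overlapping matches are resolved (an earlier rule's span blocks any later span that overlaps it) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int          # index of the first matched character
    end: int            # index one past the last matched character
    entity_class: str   # for example "EMAIL" or "PER"
    rule_index: int     # position of the producing rule in evaluation order


def order_rules(rules: list) -> list:
    """Drop disabled rules, then put RegExp rules before NER rules.

    The original top-to-bottom order is preserved within each type.
    """
    enabled = [r for r in rules if r.get("enabled")]
    regexp_rules = [r for r in enabled if r["type"] == "RegExp"]
    ner_rules = [r for r in enabled if r["type"] == "NER"]
    return regexp_rules + ner_rules


def resolve(spans: list) -> list:
    """Assumed 'first rule wins' policy: keep a span only if it does not
    overlap a span already accepted from an earlier rule."""
    accepted = []
    for span in sorted(spans, key=lambda s: s.rule_index):
        if all(span.end <= a.start or span.start >= a.end for a in accepted):
            accepted.append(span)
    return sorted(accepted, key=lambda s: s.start)
```

Under this reading, a `RegExp` rule that already captured an email address prevents a later `NER` rule from relabeling part of the same text as a person name.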
487+
488+
### Example
489+
490+
The following example shows the anonymized content highlighted in the chat window using a `RegExp` rule to mask GKE hostnames:
491+
492+
```jsonc
493+
{
494+
"entityClass": "GKE_HOST",
495+
"type": "RegExp",
496+
"pattern": "(gke-[a-zA-Z0-9-]+-[a-f0-9]{8}-[a-zA-Z0-9]+)",
497+
"enabled": true
498+
}
499+
```
500+
501+
:::{image} /solutions/images/observability-obs-ai-assistant-anonymization.png
502+
:alt: AI Assistant chat showing hostname anonymization in action
503+
:screenshot:
504+
:::
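
Before enabling a custom pattern like the one above, it can help to check it against strings it should and should not match. The helper below is a hypothetical sketch in Python, and the hostnames are invented. Note that `[a-f0-9]{8}` only accepts lowercase hex, so a hostname with an uppercase hash segment would pass through unmasked.

```python
import re

# The GKE hostname pattern from the example rule.
GKE_HOST = re.compile(r"(gke-[a-zA-Z0-9-]+-[a-f0-9]{8}-[a-zA-Z0-9]+)")


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of problems; an empty list means the pattern behaves as expected."""
    problems = []
    for sample in should_match:
        if not pattern.search(sample):
            problems.append(f"missed: {sample}")
    for sample in should_not_match:
        if pattern.search(sample):
            problems.append(f"unexpected match: {sample}")
    return problems


# Invented sample data for illustration only.
problems = check_pattern(
    GKE_HOST,
    should_match=["node gke-prod-cluster-default-pool-1a2b3c4d-x9z1 is NotReady"],
    should_not_match=[
        "gke-short-node",          # no 8-character hex segment
        "gke-pool-1A2B3C4D-abcd",  # uppercase hex is not covered by [a-f0-9]
    ],
)
```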
505+
506+
### Requirements [obs-ai-anonymization-requirements]
507+
Anonymization requires the following:
508+
509+
* **Advanced Settings privilege**: Necessary to edit the configuration and enable rules.
510+
Once saved, *all* users in the same **Space** benefit from the anonymization (the setting is [space-aware](../../deploy-manage/manage-spaces.md)).
511+
* **ML privilege and resources**: If you enable a rule of type NER, you must first [deploy and start a named-entity-recognition model](/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md#ex-ner-deploy) and have sufficient ML capacity.
512+
513+
::::{important}
514+
The anonymization pipeline has only been validated with Elastic’s English model
515+
[elastic/distilbert-base-uncased-finetuned-conll03-english](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english).
516+
Results for other languages or models may vary.
517+
::::
518+
519+
### Limitations [obs-ai-anonymization-limitations]
520+
Anonymization has the following limitations:
521+
522+
* **Performance (NER)**: Running an NER model can add latency depending on the request. To improve performance of the model, consider scaling up your ML nodes by adjusting deployment parameters: increase `number_of_allocations` for better throughput and `threads_per_allocation` for faster individual requests. For details, refer to [start trained model deployment API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment).
523+
* **Structured JSON**: The NER model we validated (`elastic/distilbert-base-uncased-finetuned-conll03-english`) is trained on natural English text and often misses entities inside JSON or other structured data. If thorough masking is required, prefer regex rules and craft them to account for JSON syntax.
524+
* **False negatives / positives**: No model or pattern is perfect. Accuracy varies with the model, the pattern, and the input.
525+
* **JSON malformation risk**: Both NER inference and regex rules can create malformed JSON when anonymizing JSON data such as function responses. This can happen when a replacement spans characters that belong to the JSON syntax, which breaks the structure and can cause the whole request to fail. If this occurs, you may need to adjust your regex pattern or disable the NER rule.
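
To make the JSON malformation risk concrete, the sketch below applies two regex patterns to a small JSON string in Python and checks whether the result still parses. The payload, the patterns, and the `HOST_abc123` token are all invented for illustration; this only demonstrates the failure mode and is not how Kibana validates requests.

```python
import json
import re


def is_valid_json(raw: str) -> bool:
    """Return True if the string still parses as JSON."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False


# Invented function-response payload.
raw = '{"host": "web-01.internal", "status": "ok"}'

# Too broad: \S+ also consumes the closing quote and the comma,
# so the replacement destroys part of the JSON syntax.
broken = re.sub(r"web-\S+", "HOST_abc123", raw)

# JSON-aware: stop at a double quote or whitespace, leaving the syntax intact.
safe = re.sub(r'web-[^"\s]+', "HOST_abc123", raw)
```

Here `broken` no longer parses, while `safe` does. Excluding the double quote from the character class is one simple way to keep a pattern inside a JSON string value.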
526+
527+
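
For the NER performance limitation, the two deployment parameters are passed as query parameters to the start trained model deployment API. The Python sketch below only builds the request path and sends nothing. The helper name and the values `2` and `4` are illustrative, and the power-of-two check reflects how the API reference describes `threads_per_allocation`; verify it against the reference for your version.

```python
from urllib.parse import quote, urlencode


def start_deployment_path(model_id: str,
                          number_of_allocations: int,
                          threads_per_allocation: int) -> str:
    """Build the path and query string for a POST to the start deployment API."""
    if number_of_allocations < 1:
        raise ValueError("number_of_allocations must be at least 1")
    # Assumption to verify: threads_per_allocation must be a power of 2.
    if threads_per_allocation < 1 or threads_per_allocation & (threads_per_allocation - 1):
        raise ValueError("threads_per_allocation must be a power of 2")
    query = urlencode({
        "number_of_allocations": number_of_allocations,    # more allocations: better throughput
        "threads_per_allocation": threads_per_allocation,  # more threads: faster single requests
    })
    return f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start?{query}"


# Illustrative values only; size them to your ML node capacity.
path = start_deployment_path(
    "elastic__distilbert-base-uncased-finetuned-conll03-english", 2, 4
)
```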
441528
## Known issues [obs-ai-known-issues]
442529

443530
### Token limits [obs-ai-token-limits]
