add anonymization #2179
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,7 +61,7 @@ It's important to understand how your data is handled when using the AI Assistan | |
: Elastic does not use customer data for model training, but all data is processed by third-party AI providers. | ||
|
||
**Anonymization** | ||
: Data sent to the AI Assistant is *not* anonymized, including alert data, configurations, queries, logs, and chat interactions. | ||
: Data sent to the AI Assistant is *not* anonymized, including alert data, configurations, queries, logs, and chat interactions. If you need to anonymize data, use the [anonymization pipeline](#obs-ai-anonymization). | ||
|
||
**Permission context** | ||
: When the AI Assistant performs searches, it uses the same permissions as the current user. | ||
|
@@ -418,6 +418,68 @@ Enable this feature from the **Settings** tab in AI Assistant Settings by using | |
For air-gapped environments, installing product documentation requires special configuration. See the [{{kib}} AI Assistants settings documentation](kibana://reference/configuration-reference/ai-assistant-settings.md) for detailed instructions. | ||
:::: | ||
|
||
## Anonymization (technical preview) [obs-ai-anonymization] | ||
|
||
Anonymization masks personally identifiable or otherwise sensitive information before chat messages leave Kibana for a third-party LLM. | ||
Enabled rules substitute deterministic tokens (for example EMAIL_ee4587…) so the model can keep context without ever seeing the real value. | ||
|
||
When all rules are disabled (the default), data is forwarded unchanged. | ||
|
||
### How it works [obs-ai-anonymization-how] | ||
|
||
When anonymization is enabled, every message in the request (system prompt, message content, and tool-call arguments and responses) is run through an *anonymization pipeline* before it leaves Kibana: | |
|
||
|
||
1. Each enabled **rule** scans its target text and replaces any match with a deterministic token such as | ||
`EMAIL_ee4587b4ba681e38996a1b716facbf375786bff7`. | ||
The prefix (`EMAIL`, `PER`, `LOC`, …) is the *entity class*; the suffix is a SHA-1 hash of the original value. | ||
2. The fully masked conversation is sent to the LLM. | ||
3. After the LLM responds, the original values are restored so the user sees de-anonymized text and any persisted conversation history stores the original content. | ||
|
||
Because the masking is deterministic, the model can still maintain logical consistency (“`EMAIL_x`” always refers to the same address) without ever seeing the real value. | ||
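
The three steps above can be sketched in Python as follows. This is a minimal illustration, not Kibana's implementation: the `mask` and `unmask` helpers are hypothetical names, and the assumption that the token suffix is the unsalted hex SHA-1 digest of the UTF-8 encoded match is inferred from the description above.

```python
import hashlib
import re

# Pattern taken from the RegExp rule example in this section.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def mask(text, pattern=EMAIL_PATTERN, entity_class="EMAIL"):
    """Replace each match with <entity_class>_<sha1 hex> and record the mapping."""
    replacements = {}

    def to_token(match):
        original = match.group(0)
        # Assumption: plain SHA-1 over the UTF-8 bytes of the matched value.
        digest = hashlib.sha1(original.encode("utf-8")).hexdigest()
        token = f"{entity_class}_{digest}"
        replacements[token] = original
        return token

    return pattern.sub(to_token, text), replacements


def unmask(text, replacements):
    """Restore the original values in text that comes back from the LLM."""
    for token, original in replacements.items():
        text = text.replace(token, original)
    return text


source = "Contact jane@example.com; cc jane@example.com"
masked, mapping = mask(source)   # step 1: both occurrences get the same token
restored = unmask(masked, mapping)  # step 3: original values are put back
```

Because the token depends only on the matched value, the repeated address maps to a single token, which is what lets the model treat both mentions as the same entity.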
|
||
### Rule types [obs-ai-anonymization-rules] | ||
|
||
|
||
**RegExp** — Runs a JavaScript‑style regular expression. Use for fixed patterns such as e‑mail addresses, host names, etc. | ||
|
||
|
||
```jsonc | ||
{ | ||
"type": "RegExp", | ||
"pattern": "([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})", | ||
"entityClass": "EMAIL", | ||
"enabled": true | ||
} | ||
``` | ||
|
||
**NER** — Runs a named‑entity‑recognition model on free text. | ||
|
||
|
||
```jsonc | ||
{ | ||
"type": "NER", | ||
"modelId": "elastic__distilbert-base-uncased-finetuned-conll03-english", | ||
"allowedEntityClasses": ["PER", "ORG", "LOC"], | ||
"enabled": true | ||
} | ||
``` | ||
|
||
Rules are evaluated **top-to-bottom**; the first rule that captures a given entity wins. Rules can be configured in the [AI Assistant Settings](#obs-ai-settings) page. | ||
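
One way to read "the first rule that captures a given entity wins" is sketched below. The source does not say how overlapping matches are resolved, so the span-claiming logic, the `detect` helper, and the `HOST` entity class with its pattern are all assumptions made for illustration; only RegExp rules are handled.

```python
import re

# Hypothetical rule list in the same shape as the RegExp example above.
RULES = [
    {"type": "RegExp",
     "pattern": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
     "entityClass": "EMAIL", "enabled": True},
    {"type": "RegExp",
     "pattern": r"[a-zA-Z0-9-]+\.example\.com",  # assumed host-name pattern
     "entityClass": "HOST", "enabled": True},
]


def detect(text, rules):
    """Return (start, end, entity_class) spans; earlier rules take precedence."""
    claimed = []
    for rule in rules:  # evaluated top-to-bottom
        if not rule["enabled"] or rule["type"] != "RegExp":
            continue
        for m in re.finditer(rule["pattern"], text):
            # Skip a match that overlaps a span already claimed by an earlier rule.
            overlaps = any(m.start() < end and start < m.end()
                           for start, end, _ in claimed)
            if not overlaps:
                claimed.append((m.start(), m.end(), rule["entityClass"]))
    return sorted(claimed)


text = "ops@db1.example.com paged about web1.example.com"
spans = detect(text, RULES)
found = [(text[start:end], cls) for start, end, cls in spans]
```

Here the host name inside the e-mail address is not reported separately, because the `EMAIL` rule sits higher in the list and has already claimed that text.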
|
||
### Requirements [obs-ai-anonymization-requirements] | ||
Review comment: Do we need to add a step-by-step guide?
Reply: I think it might be a bit redundant. Perhaps we can do that as part of a blog, which @pmoust mentioned. Let's see what others think...
||
|
||
|
||
* **Advanced Settings privilege** to edit the configuration and enable rules. | ||
Once saved, *all* users in the same **Space** benefit from the anonymization (the setting is [space-aware](../../deploy-manage/manage-spaces.md)). | ||
* **ML privilege and resources**: if you enable a rule of type NER, you must first [deploy and start a named-entity-recognition model](/explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md#ex-ner-deploy) and have sufficient ML capacity. | ||
|
||
|
||
::::{important} | ||
The anonymization pipeline has only been validated with Elastic’s English model | ||
[elastic/distilbert-base-uncased-finetuned-conll03-english](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english). | ||
Results for other languages or models may vary. | ||
:::: | ||
|
||
### Limitations [obs-ai-anonymization-limitations] | ||
* **Performance (NER)** – Running a named entity recognition model can add latency depending on the request. | ||
* **Structured JSON** – The NER model we validated (`elastic/distilbert-base-uncased-finetuned-conll03-english`) is trained on natural English text and often misses entities inside JSON or other structured data. If thorough masking is required, prefer regex rules and craft them to account for JSON syntax. | ||
* **False negatives / positives** – No model or pattern is perfect. Accuracy varies with the model and the input. | ||
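
As an illustration of the **Structured JSON** point, the sketch below shows a regular expression written against JSON syntax rather than free text. The `user.name` field and the pattern are hypothetical, and the source does not state whether a rule masks the whole match or only a capture group, so this only demonstrates detection.

```python
import json
import re

# Hypothetical pattern: capture the string value of a "user.name" key,
# allowing the optional whitespace that JSON permits around the colon.
# Values containing escaped characters (\" or \\) are deliberately not matched.
USER_NAME_IN_JSON = re.compile(r'"user\.name"\s*:\s*"([^"\\]+)"')

record = {"user.name": "jdoe", "message": "login ok"}

pretty = json.dumps(record)                          # '"user.name": "jdoe"'
compact = json.dumps(record, separators=(",", ":"))  # '"user.name":"jdoe"'

names_pretty = USER_NAME_IN_JSON.findall(pretty)
names_compact = USER_NAME_IN_JSON.findall(compact)
```

Testing a pattern against both spaced and compact serializations is a cheap way to catch rules that only work for one formatting of the same data.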
|
||
## Known issues [obs-ai-known-issues] | ||
|
||
### Token limits [obs-ai-token-limits] | ||
|