summary: "[ExtraHop & QualysGAV] Add `manage`, `create_index`, `read`, `index`, `write`, `delete`, permission for third party agent indices `kibana_system`"
docs/reference/elasticsearch/mapping-reference/semantic-text.md
Lines changed: 94 additions & 39 deletions
@@ -282,6 +282,34 @@ PUT test-index/_doc/1
 * Others (such as `elastic` and `elasticsearch`) will automatically truncate
 the input.
 
+## Retrieving indexed chunks
+```{applies_to}
+stack: ga 9.2
+serverless: ga
+```
+
+You can retrieve the individual chunks generated by your semantic field’s chunking
+strategy using the [fields parameter](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#search-fields-param):
+
+```console
+POST test-index/_search
+{
+  "query": {
+    "ids" : {
+      "values" : ["1"]
+    }
+  },
+  "fields": [
+    {
+      "field": "semantic_text_field",
+      "format": "chunks" <1>
+    }
+  ]
+}
+```
+
+1. Use `"format": "chunks"` to return the field’s text as the original text chunks that were indexed.
+
 ## Extracting relevant fragments from semantic text [semantic-text-highlighting]
 
 You can extract the most relevant fragments from a semantic text field by using
@@ -311,27 +339,6 @@ POST test-index/_search
 2. Sorts the most relevant highlighted fragments by score when set to `score`. By default,
 fragments will be output in the order they appear in the field (order: none).
 
-To use the `semantic` highlighter to view chunks in the order which they were indexed with no scoring,
-use the `match_all` query to retrieve them in the order they appear in the document:
-
-```console
-POST test-index/_search
-{
-  "query": {
-    "match_all": {}
-  },
-  "highlight": {
-    "fields": {
-      "my_semantic_field": {
-        "number_of_fragments": 5 <1>
-      }
-    }
-  }
-}
-```
-
-1. This will return the first 5 chunks, set this number higher to retrieve more chunks.
-
 Highlighting is supported on fields other than semantic_text. However, if you
 want to restrict highlighting to the semantic highlighter and return no
 fragments when the field is not of type semantic_text, you can explicitly
@@ -359,6 +366,49 @@ PUT test-index
 
 1. Ensures that highlighting is applied exclusively to semantic_text fields.
 
+To retrieve all fragments from the `semantic` highlighter in their original indexing order
+without scoring, use a `match_all` query as the `highlight_query`.
+This ensures fragments are returned in the order they appear in the document:
+
+```console
+POST test-index/_search
+{
+  "query": {
+    "ids": {
+      "values": ["1"]
+    }
+  },
+  "highlight": {
+    "fields": {
+      "my_semantic_field": {
+        "number_of_fragments": 5, <1>
+        "highlight_query": { "match_all": {} }
+      }
+    }
+  }
+}
+```
+
+1. Returns the first 5 fragments. Increase this value to retrieve additional fragments.
+
+## Updates and partial updates for `semantic_text` fields [semantic-text-updates]
+
+When updating documents that contain `semantic_text` fields, it’s important to understand how inference is triggered:
+
+* **Full document updates**
+  When you perform a full document update, **all `semantic_text` fields will re-run inference** even if their values did not change. This ensures that the embeddings are always consistent with the current document state but can increase ingestion costs.
+
+* **Partial updates using the Bulk API**
+  Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) will **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for fields that were not updated, which can significantly reduce processing time and cost.
+
+* **Partial updates using the Update API**
+  When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **will still run** on all `semantic_text` fields. This means that even if the field values are not changed, embeddings will be re-generated.
+
+If you want to avoid unnecessary inference and keep existing embeddings:
+
+* Use **partial updates through the Bulk API**.
+* Omit any `semantic_text` fields that did not change from the `doc` object in your request.
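
The two recommendations in the added section (use a Bulk API partial update, and leave unchanged `semantic_text` fields out of the `doc` object) can be illustrated with a minimal Python sketch that only builds the request body. The helper names `changed_fields` and `bulk_partial_update_body` and the `title` field are hypothetical and do not come from the diff; `test-index` and `semantic_text_field` are reused from its examples. Sending the body to the `_bulk` endpoint is out of scope here and would need an HTTP client of your choice.

```python
import json


def changed_fields(old_doc: dict, new_doc: dict) -> dict:
    """Return only the fields of new_doc whose values differ from old_doc.

    Unchanged fields (including unchanged semantic_text fields) are omitted,
    so they never appear in the partial `doc` object.
    """
    return {
        key: value
        for key, value in new_doc.items()
        if key not in old_doc or old_doc[key] != value
    }


def bulk_partial_update_body(index: str, doc_id: str, partial_doc: dict) -> str:
    """Build a newline-delimited JSON body for one Bulk API `update` action."""
    action = {"update": {"_index": index, "_id": doc_id}}
    lines = [json.dumps(action), json.dumps({"doc": partial_doc})]
    # The Bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents: only `title` changes, the semantic_text value does not.
old = {"title": "Old title", "semantic_text_field": "some long passage"}
new = {"title": "New title", "semantic_text_field": "some long passage"}

body = bulk_partial_update_body("test-index", "1", changed_fields(old, new))
print(body)
```

Because `semantic_text_field` is unchanged, it is left out of the `doc` object, which per the section above is the case where existing embeddings are reused when the request goes through the Bulk API.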