Commit b044184: Add docs

1 parent 2ef8b1d commit b044184

1 file changed (+45 −10 lines)

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 45 additions & 10 deletions
@@ -28,7 +28,7 @@ service.
 
 Using `semantic_text`, you won’t need to specify how to generate embeddings for
 your data, or how to index it. The {{infer}} endpoint automatically determines
-the embedding generation, indexing, and query to use.
+the embedding generation, indexing, and query to use.
 Newly created indices with `semantic_text` fields using dense embeddings will be
 [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
 to `bbq_hnsw` automatically.
@@ -111,6 +111,33 @@ the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/ope
 to create the endpoint. If not specified, the {{infer}} endpoint defined by
 `inference_id` will be used at both index and query time.
 
+`index_options`
+: (Optional, object) Specifies the index options to override default values
+for the field. Currently, `dense_vector` index options are supported.
+For text embeddings, `index_options` may match any allowed
+[dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+
+An example of how to set index_options for a `semantic_text` field:
+
+```console
+PUT my-index-000004
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text",
+        "inference_id": "my-text-embedding-endpoint",
+        "index_options": {
+          "dense_vector": {
+            "type": "int4_flat"
+          }
+        }
+      }
+    }
+  }
+}
+```
+
 `chunking_settings`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
@@ -138,8 +165,10 @@ To completely disable chunking, use the `none` chunking strategy.
 or `1`. Required for `sentence` type chunking settings
 
 ::::{warning}
-If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
-error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+If the input exceeds the maximum token limit of the underlying model, some
+services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will
+automatically truncate the input to fit within the
 model's limit.
 ::::
 
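The commit documents `chunking_settings` without adding a mapping example. A minimal sketch of what field-level chunking settings can look like, following the same mapping pattern as the `index_options` example above (the index name, endpoint name, and parameter values are illustrative assumptions, not part of this commit):

```console
PUT my-index-000005
{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-text-embedding-endpoint",
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  }
}
```

Here `sentence_overlap` takes the `0` or `1` value that the hunk above notes is required for `sentence` type chunking settings.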

@@ -173,7 +202,8 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
-You can pre-chunk the input by sending it to Elasticsearch as an array of strings.
+You can pre-chunk the input by sending it to Elasticsearch as an array of
+strings.
 Example:
 
 ```console
@@ -203,15 +233,20 @@ PUT test-index/_doc/1
 ```
 
 1. The text is pre-chunked and provided as an array of strings.
-Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+Each element in the array represents a single chunk that will be sent
+directly to the inference service without further chunking.
 
 **Important considerations**:
 
-* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
-* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
-* If a chunk exceeds the model's token limit, the behavior depends on the service:
-  * Some services (such as OpenAI) will return an error.
-  * Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+* When providing pre-chunked input, ensure that you set the chunking strategy to
+`none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the
+inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the
+service:
+  * Some services (such as OpenAI) will return an error.
+  * Others (such as `elastic` and `elasticsearch`) will automatically truncate
+the input.
 
 Refer
 to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
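A self-contained sketch of the pre-chunking pattern described in this hunk: a mapping that disables chunking with the `none` strategy, then a document indexed with an array of strings. The field name, endpoint name, and sample strings are illustrative assumptions; only `test-index` appears in the commit itself.

```console
PUT test-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "inference_id": "my-text-embedding-endpoint",
        "chunking_settings": {
          "strategy": "none"
        }
      }
    }
  }
}

PUT test-index/_doc/1
{
  "my_semantic_field": [
    "This is the first pre-chunked passage.",
    "This is the second pre-chunked passage."
  ]
}
```

Each array element is embedded as a single chunk; with the `none` strategy set, no further splitting occurs.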
