elastic · szabosteve · Oct 16, 2024 · Oct 16, 2024
diff --git a/docs/reference/inference/service-elser.asciidoc b/docs/reference/inference/service-elser.asciidoc
@@ -80,12 +80,13 @@ Must be a power of 2. Max allowed value is 32.
 [[inference-example-elser]]
 ==== ELSER service example
 
-The following example shows how to create an {infer} endpoint called
-`my-elser-model` to perform a `sparse_embedding` task type.
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type.
 Refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation] for more info.
 
-The request below will automatically download the ELSER model if it isn't
-already downloaded and then deploy the model.
+NOTE: To optimize your ELSER endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
+To optimize it for search, set the number of threads to a value greater than `1`.
+
+The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
 
 [source,console]
 ------------------------------------------------------------
@@ -100,7 +101,6 @@ PUT _inference/sparse_embedding/my-elser-model
 ------------------------------------------------------------
 // TEST[skip:TBD]
 
-
 Example response:
 
 [source,console-result]
@@ -130,12 +130,12 @@ If using the Python client, you can set the `timeout` parameter to a higher valu
 [[inference-example-elser-adaptive-allocation]]
 ==== Setting adaptive allocation for the ELSER service
 
-The following example shows how to create an {infer} endpoint called
-`my-elser-model` to perform a `sparse_embedding` task type and configure
-adaptive allocations.
+NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
+To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
+
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
 
-The request below will automatically download the ELSER model if it isn't
-already downloaded and then deploy the model.
+The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
 
 [source,console]
 ------------------------------------------------------------

diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
@@ -50,7 +50,7 @@ PUT _inference/sparse_embedding/my-elser-endpoint <1>
 be used and ELSER creates sparse vectors. The `inference_id` is
 `my-elser-endpoint`.
 <2> The `elser` service is used in this example.
-<3> This setting enables and configures adaptive allocations.
+<3> This setting enables and configures {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations].
 Adaptive allocations make it possible for ELSER to automatically scale up or down resources based on the current load on the process.
 
 [NOTE]
@@ -284,6 +284,8 @@ query from the `semantic-embedding` index:
 
 [discrete]
 [[semantic-text-further-examples]]
-==== Further examples
+==== Further examples and reading
 
-If you want to use `semantic_text` in hybrid search, refer to https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb[this notebook] for a step-by-step guide.
+* If you want to use `semantic_text` in hybrid search, refer to https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb[this notebook] for a step-by-step guide.
+* For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
+* To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
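
The ingest versus search guidance in the new NOTE can be illustrated with a small request-body builder. This is only a sketch, not part of the patch: the helper name `elser_endpoint_body` and its `search_threads` default are hypothetical, the `service`, `service_settings`, and `num_allocations` fields are assumed from the unchanged request example that this diff elides, and the power-of-2 and maximum-of-32 checks assume that the constraint quoted in the first hunk header applies to `num_threads`.

```python
import json


def elser_endpoint_body(optimize_for: str, search_threads: int = 4) -> dict:
    """Build a request body for PUT _inference/sparse_embedding/<endpoint-id>.

    Hypothetical helper: field names are assumed from the ELSER service
    example that the diff does not show in full.
    """
    if optimize_for == "ingest":
        # The NOTE recommends exactly one thread for ingest.
        num_threads = 1
    elif optimize_for == "search":
        # The NOTE recommends more than one thread for search.
        # Assumed constraint: value must be a power of 2, at most 32.
        is_power_of_two = search_threads > 0 and search_threads & (search_threads - 1) == 0
        if search_threads <= 1 or search_threads > 32 or not is_power_of_two:
            raise ValueError("search_threads must be a power of 2 between 2 and 32")
        num_threads = search_threads
    else:
        raise ValueError("optimize_for must be 'ingest' or 'search'")

    return {
        "service": "elser",
        "service_settings": {
            "num_allocations": 1,  # illustrative value only
            "num_threads": num_threads,
        },
    }


if __name__ == "__main__":
    # Print both variants so they can be pasted into a console request.
    print(json.dumps(elser_endpoint_body("ingest"), indent=2))
    print(json.dumps(elser_endpoint_body("search", search_threads=8), indent=2))
```

The printed JSON would serve as the body of a `PUT _inference/sparse_embedding/my-elser-model` request. Whether a single endpoint or two separate endpoints (one per workload) is preferable depends on the deployment and is not addressed by this patch.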