Adds note on reindexing existing data for semantic_text usage (#113590)

kosabogi · leemthompo · web-flow · commit 4af241b5d62c · 2024-10-08T09:58:18.000+02:00
* Adds note on reindexing existing data for semantic_text usage

* Adds note about full crawl and full sync

* Style guide related fix

* Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
@@ -89,6 +89,16 @@ PUT semantic-embeddings
 It will be used to generate the embeddings based on the input text.
 Every time you ingest data into the related `semantic_text` field, this endpoint will be used for creating the vector representation of the text.
 
+[NOTE]
+====
+If you're using web crawlers or connectors to generate indices, you have to
+<<indices-put-mapping,update the index mappings>> for these indices to
+include the `semantic_text` field. Once the mapping is updated, you'll need to run
+a full web crawl or a full connector sync. This ensures that all existing
+documents are reprocessed and updated with the new semantic embeddings,
+enabling semantic search on the updated data.
+====
+
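+For example, a request along the following lines adds a `semantic_text` field to an existing index.
+This is a minimal sketch rather than a complete procedure: `my-connector-index` is a placeholder for an index created by a web crawler or connector, and the field name and {infer} endpoint ID are assumptions.
+Adding a field through the update mapping API does not change documents that are already indexed, which is why the full crawl or sync is still needed.
+
+[source,console]
+------------------------------------------------------------
+PUT my-connector-index/_mapping
+{
+  "properties": {
+    "content": { <1>
+      "type": "semantic_text",
+      "inference_id": "my-elser-endpoint" <2>
+    }
+  }
+}
+------------------------------------------------------------
+<1> Placeholder name for the new `semantic_text` field.
+<2> Assumes the {infer} endpoint ID is `my-elser-endpoint`; replace it with the ID of your own endpoint.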
 
 [discrete]
 [[semantic-text-load-data]]
@@ -118,6 +128,13 @@ Create the embeddings from the text by reindexing the data from the `test-data`
 The data in the `content` field will be reindexed into the `content` semantic text field of the destination index.
 The reindexed data will be processed by the {infer} endpoint associated with the `content` semantic text field.
 
+[NOTE]
+====
+This step uses the reindex API to simulate data ingestion. If you are working with data that has already been indexed,
+rather than with the `test-data` set, you must reindex that data so that it is processed by the {infer} endpoint
+and the necessary embeddings are generated.
+====
+
 [source,console]
 ------------------------------------------------------------
 POST _reindex?wait_for_completion=false