[DOCS] Add full-text search overview (elastic#119462) (elastic#119607)

leemthompo · web-flow · commit 446c7565eb01 · 2025-01-07T05:27:44.000+11:00
diff --git a/docs/reference/analysis.asciidoc b/docs/reference/analysis.asciidoc
@@ -9,8 +9,7 @@
 --
 
 _Text analysis_ is the process of converting unstructured text, like
-the body of an email or a product description, into a structured format that's
-optimized for search.
+the body of an email or a product description, into a structured format that's <<full-text-search,optimized for search>>.
 
 [discrete]
 [[when-to-configure-analysis]]
diff --git a/docs/reference/analysis/tokenizers.asciidoc b/docs/reference/analysis/tokenizers.asciidoc
@@ -1,6 +1,14 @@
 [[analysis-tokenizers]]
 == Tokenizer reference
 
+.Difference between {es} tokenization and neural tokenization
+[NOTE]
+====
+{es}'s tokenization process produces linguistic tokens, optimized for search and retrieval.
+This differs from neural tokenization in the context of machine learning and natural language processing. Neural tokenizers translate strings into smaller, subword tokens, which are encoded into vectors for consumption by neural networks.
+{es} does not have built-in neural tokenizers.
+====
+
 A _tokenizer_ receives a stream of characters, breaks it up into individual
 _tokens_ (usually individual words), and outputs a stream of _tokens_. For
 instance, a <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer breaks
diff --git a/docs/reference/images/search/full-text-search-overview.svg b/docs/reference/images/search/full-text-search-overview.svg
@@ -0,0 +1,81 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 700 900">
+    <!-- Styles -->
+    <defs>
+        <marker id="arrowhead" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
+            <polygon points="0 0, 10 3.5, 0 7" fill="#666"/>
+        </marker>
+        <style type="text/css">
+            text {
+                font-family: Inter, Arial, sans-serif;
+            }
+            .title {
+                font-size: 24px;
+                font-weight: bold;
+                fill: #333;
+            }
+            .box-title {
+                font-size: 16px;
+                fill: #333;
+            }
+            .box-subtitle {
+                font-size: 12px;
+                fill: #666;
+            }
+            .search-text {
+                fill: #2d5a27;
+            }
+            .doc-text {
+                fill: #1a3d66;
+            }
+        </style>
+    </defs>
+    
+    <!-- Background -->
+    <rect width="700" height="900" fill="#ffffff"/>
+    
+    <!-- Title -->
+    <text x="350" y="50" text-anchor="middle" class="title">
+        Full-text search with Elasticsearch
+    </text>
+
+    <!-- Document Input Box -->
+    <rect x="50" y="100" width="240" height="80" rx="10" fill="#e8f0f9" stroke="#1a3d66" stroke-width="2"/>
+    <text x="170" y="145" text-anchor="middle" class="box-title doc-text">Source documents</text>
+
+    <!-- Analysis Pipeline -->
+    <rect x="230" y="250" width="240" height="80" rx="10" fill="#f4f4f4" stroke="#666" stroke-width="2"/>
+    <text x="350" y="285" text-anchor="middle" class="box-title">Analysis pipeline</text>
+    <text x="350" y="305" text-anchor="middle" class="box-subtitle">Transforms text to normalized terms</text>
+
+    <!-- Inverted Index -->
+    <rect x="230" y="420" width="240" height="80" rx="10" fill="#f4f4f4" stroke="#666" stroke-width="2"/>
+    <text x="350" y="455" text-anchor="middle" class="box-title">Inverted index</text>
+    <text x="350" y="475" text-anchor="middle" class="box-subtitle">Search-optimized data structure</text>
+
+    <!-- Search Query Box -->
+    <rect x="410" y="100" width="240" height="80" rx="10" fill="#edf7ec" stroke="#2d5a27" stroke-width="2"/>
+    <text x="530" y="145" text-anchor="middle" class="box-title search-text">Search query</text>
+
+    <!-- Scoring Engine -->
+    <rect x="230" y="590" width="240" height="80" rx="10" fill="#f4f4f4" stroke="#666" stroke-width="2"/>
+    <text x="350" y="625" text-anchor="middle" class="box-title">Relevance scoring</text>
+    <text x="350" y="645" text-anchor="middle" class="box-subtitle">Similarity algorithm scores documents</text>
+
+    <!-- Results Box (using circle shape) -->
+    <circle cx="350" cy="800" r="70" fill="#edf7ec" stroke="#2d5a27" stroke-width="2"/>
+    <text x="350" y="790" text-anchor="middle" class="box-title search-text">Search results</text>
+    <text x="350" y="810" text-anchor="middle" class="box-subtitle">Most relevant first</text>
+
+    <!-- Arrows -->
+    <line x1="170" y1="180" x2="170" y2="300" stroke="#666" stroke-width="2"/>
+    <line x1="170" y1="300" x2="230" y2="300" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="530" y1="180" x2="530" y2="300" stroke="#666" stroke-width="2"/>
+    <line x1="530" y1="300" x2="470" y2="300" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="350" y1="330" x2="350" y2="420" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="350" y1="500" x2="350" y2="590" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="350" y1="670" x2="350" y2="730" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+</svg>
diff --git a/docs/reference/intro.asciidoc b/docs/reference/intro.asciidoc
@@ -260,7 +260,7 @@ Refer to <<getting-started,first steps with Elasticsearch>> for a hands-on examp
 
 *{esql}* is a new piped query language and compute engine which was first added in version *8.11*.
 
-{esql} does not yet support all the features of Query DSL, like full-text search and semantic search.
+{esql} does not yet support all the features of Query DSL.
 Look forward to new {esql} features and functionalities in each release.
 
 Refer to <<search-analyze-query-languages>> for a full overview of the query languages available in {es}.
@@ -280,7 +280,7 @@ The <<search-your-data, `_search` endpoint>> accepts queries written in Query DS
 
 Query DSL support a wide range of search techniques, including the following:
 
-* <<full-text-queries,*Full-text search*>>: Search text that has been analyzed and indexed to support phrase or proximity queries, fuzzy matches, and more.
+* <<full-text-search,*Full-text search*>>: Search text that has been analyzed and indexed to support phrase or proximity queries, fuzzy matches, and more.
 * <<keyword,*Keyword search*>>: Search for exact matches using `keyword` fields.
 * <<semantic-search-semantic-text,*Semantic search*>>: Search `semantic_text` fields using dense or sparse vector search on embeddings generated in your {es} cluster.
 * <<knn-search,*Vector search*>>: Search for similar dense vectors using the kNN algorithm for embeddings generated outside of {es}.
@@ -328,8 +328,7 @@ directly executed within {es} itself.
 
 The <<esql-rest,`_query` endpoint>> accepts queries written in {esql} syntax.
 
-Today, it supports a subset of the features available in Query DSL, like aggregations, filters, and transformations.
-It does not yet support full-text search or semantic search.
+Today, it supports a subset of the features available in Query DSL, but it is rapidly evolving.
 
 It comes with a comprehensive set of <<esql-functions-operators,functions and operators>> for working with data and has robust integration with {kib}'s Discover, dashboards and visualizations.
 
diff --git a/docs/reference/quickstart/full-text-filtering-tutorial.asciidoc b/docs/reference/quickstart/full-text-filtering-tutorial.asciidoc
@@ -4,7 +4,7 @@
 <titleabbrev>Basics: Full-text search and filtering</titleabbrev>
 ++++
 
-This is a hands-on introduction to the basics of full-text search with {es}, also known as _lexical search_, using the <<search-search,`_search` API>> and <<query-dsl,Query DSL>>.
+This is a hands-on introduction to the basics of <<full-text-search,full-text search>> with {es}, also known as _lexical search_, using the <<search-search,`_search` API>> and <<query-dsl,Query DSL>>.
 You'll also learn how to filter data, to narrow down search results based on exact criteria.
 
 In this scenario, we're implementing a search function for a cooking blog.
@@ -632,6 +632,7 @@ This tutorial introduced the basics of full-text search and filtering in {es}.
 Building a real-world search experience requires understanding many more advanced concepts and techniques.
 Here are some resources once you're ready to dive deeper:
 
+* <<full-text-search, Full-text search>>: Learn about the core components of full-text search in {es}.
 * <<search-analyze, Elasticsearch basics — Search and analyze data>>: Understand all your options for searching and analyzing data in {es}.
 * <<analysis,Text analysis>>: Understand how text is processed for full-text search.
 * <<search-with-elasticsearch>>: Learn about more advanced search techniques using the `_search` API, including semantic search.
diff --git a/docs/reference/search/search-your-data/full-text-search.asciidoc b/docs/reference/search/search-your-data/full-text-search.asciidoc
@@ -0,0 +1,82 @@
+[[full-text-search]]
+== Full-text search
+
+.Hands-on introduction to full-text search
+[TIP]
+====
+Would you prefer to jump straight into a hands-on tutorial?
+Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
+====
+
+Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
+Documents and search queries are transformed so that searches return https://www.elastic.co/what-is/search-relevance[relevant] results rather than only exact term matches.
+Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.
+
+Built on decades of information retrieval research, full-text search delivers reliable results that scale predictably as your data grows. Because it runs efficiently on CPUs, {es}'s full-text search requires far fewer computational resources than GPU-intensive vector operations.
+
+You can combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications. While vector search may require additional GPU resources, the full-text component remains cost-effective by leveraging existing CPU infrastructure.
+
+[discrete]
+[[full-text-search-how-it-works]]
+=== How full-text search works
+
+The following diagram illustrates the components of full-text search.
+
+image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]
+
+At a high level, full-text search involves the following:
+
+* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word elimination. {es} includes a number of built-in <<analysis-analyzers,analyzers>> and tokenizers, including options for analyzing text in specific languages. You can also create custom analyzers.
++
+[TIP]
+====
+Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
+====
+* *Inverted index creation*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
+An inverted index is a data structure that maps each token to the documents that contain it.
+It's made up of two key components:
+** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
+** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position.
+* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document.
++
+The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, inverse document frequency, and document length.
+Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25.
+* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
++
+Query DSL supports a number of <<full-text-queries,full-text queries>>.
++
+As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions.
+
+[discrete]
+[[full-text-search-getting-started]]
+=== Getting started
+
+For a hands-on introduction to full-text search, refer to the <<full-text-filter-tutorial,full-text search tutorial>>.
+
+[discrete]
+[[full-text-search-learn-more]]
+=== Learn more
+
+Here are some resources to help you learn more about full-text search with {es}.
+
+*Core concepts*
+
+Learn about the core components of full-text search:
+
+* <<text,Text fields>>
+* <<analysis,Text analysis>>
+** <<analysis-tokenizers,Tokenizers>>
+** <<analysis-analyzers,Analyzers>>
+
+*{es} query languages*
+
+Learn how to build full-text search queries using {es}'s query languages:
+
+* <<full-text-queries,Full-text queries using Query DSL>>
+* <<esql-search-functions,Full-text search functions in {esql}>>
+
+*Advanced topics*
+
+For a technical deep dive into {es}'s BM25 implementation, read this blog post: https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[The BM25 Algorithm and its Variables].
+
+To learn how to optimize the relevance of your search results, refer to <<recipes,Search relevance optimizations>>.
diff --git a/docs/reference/search/search-your-data/search-your-data.asciidoc b/docs/reference/search/search-your-data/search-your-data.asciidoc
@@ -18,7 +18,7 @@ Search for exact values::
 Search for <<term-level-queries,exact values or ranges>> of numbers, dates, IPs,
 or strings.
 
-Full-text search::
+<<full-text-search,Full-text search>>::
 Use <<full-text-queries,full text queries>> to query <<analysis,unstructured
 textual data>> and find documents that best match query terms.
 
@@ -43,6 +43,7 @@ DSL, with a simplified user experience. Create search applications based on your
 results directly in the Kibana Search UI.
 
 include::search-api.asciidoc[]
+include::full-text-search.asciidoc[]
 include::../../how-to/recipes.asciidoc[]
 // ☝️ search relevance recipes
 include::retrievers-overview.asciidoc[]
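
Illustrative note (not part of the patch): the analysis, inverted index, and relevance scoring steps described in `full-text-search.asciidoc` can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how {es} or Lucene implements them. The stop word list, the function names, and the sample documents are all hypothetical. The scoring uses the textbook BM25 formula with `k1=1.2` and `b=0.75`, and the exact arithmetic in Lucene may differ.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop word list, chosen only for this example.
STOP_WORDS = {"a", "an", "and", "the", "of", "with"}


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Build a sorted term dictionary whose values are posting lists.

    Each posting list maps a document ID to the term's frequency in it.
    """
    postings = defaultdict(dict)
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = analyze(text)
        doc_lengths[doc_id] = len(tokens)
        for term, freq in Counter(tokens).items():
            postings[term][doc_id] = freq
    return dict(sorted(postings.items())), doc_lengths


def bm25_search(query, postings, doc_lengths, k1=1.2, b=0.75):
    """Score documents with textbook BM25 and return them best first."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = defaultdict(float)
    # The query goes through the same analyzer as the indexed text.
    for term in analyze(query):
        plist = postings.get(term, {})
        if not plist:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(plist) + 0.5) / (len(plist) + 0.5))
        for doc_id, tf in plist.items():
            # Longer-than-average documents are penalized via b.
            norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    1: "The quick brown fox",
    2: "Quick pancakes with maple syrup",
    3: "A brown bear and a brown fox",
}
postings, doc_lengths = build_index(docs)
results = bm25_search("brown fox", postings, doc_lengths)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this sketch, document 3 ranks above document 1 for the query `brown fox` because `brown` appears in it twice, which outweighs the small penalty for its extra length. Document 2 contains neither query term, so it never appears in the results.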