[DOCS] Add full-text search overview

leemthompo · leemthompo · commit 0ea93d2506bf · 2025-01-02T15:25:37.000+01:00
diff --git a/docs/reference/analysis/tokenizers.asciidoc b/docs/reference/analysis/tokenizers.asciidoc
@@ -1,6 +1,13 @@
 [[analysis-tokenizers]]
 == Tokenizer reference
 
+[NOTE]
+====
+{es}'s text analysis produces meaningful _linguistic_ tokens (like words and phrases) optimized for search relevance scoring.
+This differs from neural tokenizers, which break text into smaller subword units that are converted into numerical representations for machine learning models.
+For example, with a stemming token filter in the analysis chain, "searching" can be indexed as the word token "search" in {es}, while a neural tokenizer might split it into ["sea", "##rch", "##ing"] for model consumption.
+====
+
 A _tokenizer_ receives a stream of characters, breaks it up into individual
 _tokens_ (usually individual words), and outputs a stream of _tokens_. For
 instance, a <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer breaks
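The note above contrasts lexical analysis with neural subword tokenization. The Python sketch below is a hypothetical, simplified analysis chain, not Elasticsearch code. It shows that a whitespace tokenizer emits tokens unchanged and that lowercasing and stemming happen in downstream token filters. The function names and the `naive_stem_filter` suffix stripper are assumptions made for illustration and are not one of Elasticsearch's stemmers.

```python
"""Toy lexical analysis chain: one tokenizer followed by token filters.

Hypothetical illustration only; not an Elasticsearch analyzer.
"""


def whitespace_tokenize(text: str) -> list[str]:
    # Tokenizer: split the character stream on whitespace. No stemming here.
    return text.split()


def lowercase_filter(tokens: list[str]) -> list[str]:
    # Token filter: normalize case so "Searching" and "searching" match.
    return [token.lower() for token in tokens]


def naive_stem_filter(tokens: list[str]) -> list[str]:
    # Token filter: crude suffix stripping, assumed for illustration only.
    # Real stemmers (for example Porter-style ones) use far richer rules.
    stemmed = []
    for token in tokens:
        for suffix in ("ing", "es", "s"):
            # Keep at least three characters so short words survive.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        stemmed.append(token)
    return stemmed


def analyze(text: str) -> list[str]:
    # Run the tokenizer first, then each token filter in order.
    return naive_stem_filter(lowercase_filter(whitespace_tokenize(text)))


print(whitespace_tokenize("Searching quick searches"))  # ['Searching', 'quick', 'searches']
print(analyze("Searching quick searches"))  # ['search', 'quick', 'search']
```

Keeping the tokenizer and the filters as separate steps mirrors the split between tokenizers and token filters described in this reference: swapping the stemmer changes the indexed terms without touching how text is broken into tokens.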
diff --git a/docs/reference/images/search/full-text-search-overview.svg b/docs/reference/images/search/full-text-search-overview.svg
@@ -0,0 +1,62 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 600 900">
+    <!-- Styles -->
+    <defs>
+        <marker id="arrowhead" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
+            <polygon points="0 0, 10 3.5, 0 7" fill="#666"/>
+        </marker>
+        <style type="text/css">
+            text {
+                font-family: Inter, Arial, sans-serif;
+            }
+        </style>
+    </defs>
+    
+    <!-- Background -->
+    <rect width="600" height="900" fill="#f8f9fa"/>
+    
+    <!-- Title -->
+    <text x="300" y="50" text-anchor="middle" font-size="24" font-weight="bold" fill="#333">
+        Full-text search with Elasticsearch
+    </text>
+
+    <!-- Document Input Box -->
+    <rect x="50" y="100" width="200" height="80" rx="10" fill="#e3f2fd" stroke="#1976d2" stroke-width="2"/>
+    <text x="150" y="145" text-anchor="middle" font-size="16" fill="#1976d2">Source documents</text>
+
+    <!-- Analysis Pipeline -->
+    <rect x="200" y="250" width="200" height="80" rx="10" fill="#e8f5e9" stroke="#2e7d32" stroke-width="2"/>
+    <text x="300" y="280" text-anchor="middle" font-size="16" fill="#2e7d32">Analysis pipeline</text>
+    <text x="300" y="300" text-anchor="middle" font-size="12" fill="#666">Transforms text to normalized terms</text>
+
+    <!-- Inverted Index -->
+    <rect x="200" y="420" width="200" height="80" rx="10" fill="#fff3e0" stroke="#ef6c00" stroke-width="2"/>
+    <text x="300" y="455" text-anchor="middle" font-size="16" fill="#ef6c00">Inverted index</text>
+    <text x="300" y="475" text-anchor="middle" font-size="12" fill="#666">Search-optimized data structure</text>
+
+    <!-- Search Query Box -->
+    <rect x="350" y="100" width="200" height="80" rx="10" fill="#f3e5f5" stroke="#7b1fa2" stroke-width="2"/>
+    <text x="450" y="145" text-anchor="middle" font-size="16" fill="#7b1fa2">Search query</text>
+
+    <!-- Scoring Engine -->
+    <rect x="200" y="590" width="200" height="80" rx="10" fill="#ffebee" stroke="#c62828" stroke-width="2"/>
+    <text x="300" y="620" text-anchor="middle" font-size="16" fill="#c62828">Relevance scoring</text>
+    <text x="300" y="640" text-anchor="middle" font-size="12" fill="#666">Similarity algorithm scores documents</text>
+
+    <!-- Results Box -->
+    <rect x="200" y="760" width="200" height="80" rx="10" fill="#e0f7fa" stroke="#006064" stroke-width="2"/>
+    <text x="300" y="805" text-anchor="middle" font-size="16" fill="#006064">Search results</text>
+    <text x="300" y="825" text-anchor="middle" font-size="12" fill="#666">Most relevant results returned first</text>
+
+    <!-- Arrows -->
+    <line x1="150" y1="180" x2="150" y2="300" stroke="#666" stroke-width="2"/>
+    <line x1="150" y1="300" x2="200" y2="300" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="450" y1="180" x2="450" y2="300" stroke="#666" stroke-width="2"/>
+    <line x1="450" y1="300" x2="400" y2="300" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="300" y1="330" x2="300" y2="420" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="300" y1="500" x2="300" y2="590" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+    
+    <line x1="300" y1="670" x2="300" y2="760" stroke="#666" stroke-width="2" marker-end="url(#arrowhead)"/>
+</svg>
diff --git a/docs/reference/search/search-your-data/full-text-search.asciidoc b/docs/reference/search/search-your-data/full-text-search.asciidoc
@@ -0,0 +1,68 @@
+[[full-text-search]]
+== Full-text search
+
+.Hands-on introduction to full-text search
+[TIP]
+====
+Would you prefer to jump straight into a hands-on tutorial?
+Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
+====
+
+Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
+Documents and search queries are transformed to enable returning https://www.elastic.co/what-is/search-relevance[relevant] results instead of simply exact term matches.
+Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.
+
+Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.
+Full-text search is the cornerstone of production-grade search solutions.
+Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
+
+[discrete]
+[[full-text-search-how-it-works]]
+=== How full-text search works
+
+The following diagram illustrates the components of full-text search. The query text undergoes the same text analysis as the indexed text, so both are transformed into comparable terms.
+
+image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]
+
+At a high level, full-text search involves the following:
+
+* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word removal. {es} includes a number of built-in <<analysis-analyzers,analyzers>> (including language-specific analyzers) and tokenizers, and you can also create custom analyzers.
++
+[TIP]
+====
+Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
+====
+* *Inverted index*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
+An inverted index is a data structure that maps each token to the documents that contain it.
+It's made up of two key components:
+** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
+** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position.
+* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document.
++
+The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, document frequency, and document length.
+Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25.
+* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
++
+Query DSL supports a number of <<full-text-queries,full-text queries>>.
++
+As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions.
+
+[discrete]
+[[full-text-search-learn-more]]
+=== Learn more
+
+.Getting started
+* <<full-text-filter-tutorial,Hands-on full-text search tutorial>>
+
+.Core concepts
+* <<text,Text fields>>
+* <<analysis,Text analysis>>
+* <<analysis-tokenizers,Tokenizers>>
+* <<analysis-analyzers,Analyzers>>
+
+.Search APIs
+* <<full-text-queries,Full-text queries using Query DSL>>
+* <<esql-search-functions,Full-text search functions in {esql}>>
+
+.Advanced topics
+* https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[Practical BM25: Part 2 - The BM25 Algorithm and its Variables]
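The overview above describes three stages: analysis, an inverted index made of a dictionary and posting lists, and BM25 relevance scoring. The Python sketch below ties them together under stated assumptions: a hypothetical analyzer (lowercase, split on non-alphanumeric characters, no stemming), an in-memory `InvertedIndex` class invented for this example, and the textbook BM25 formula with `k1 = 1.2` and `b = 0.75`, the defaults documented for the {es} BM25 similarity. It is a minimal illustration, not how Elasticsearch or Lucene actually implement these structures.

```python
"""Toy full-text search: analyze, build an inverted index, score with BM25.

Hypothetical sketch for illustration only; not Elasticsearch or Lucene code.
"""
import math
import re
from collections import Counter, defaultdict


def analyze(text: str) -> list[str]:
    # Assumed trivial analyzer: lowercase, split on non-alphanumerics, no stemming.
    # The same function runs at index time and at query time.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # Posting lists: term -> {doc_id: term frequency in that document}.
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        # Document lengths (in tokens), needed for BM25 length normalization.
        self.doc_lengths: dict[int, int] = {}

    def add(self, doc_id: int, text: str) -> None:
        tokens = analyze(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, freq in Counter(tokens).items():
            self.postings[term][doc_id] = freq

    def dictionary(self) -> list[str]:
        # Dictionary: the sorted list of unique terms across all documents.
        return sorted(self.postings)

    def search(self, query: str, k1: float = 1.2, b: float = 0.75) -> list[tuple[int, float]]:
        if not self.doc_lengths:
            return []
        n_docs = len(self.doc_lengths)
        avg_len = sum(self.doc_lengths.values()) / n_docs
        scores: dict[int, float] = defaultdict(float)
        for term in analyze(query):
            postings = self.postings.get(term, {})
            df = len(postings)  # document frequency of the term
            if df == 0:
                continue
            # BM25 inverse document frequency: rarer terms weigh more.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for doc_id, tf in postings.items():
                # Length normalization: b controls how much longer documents are penalized.
                norm = k1 * (1 - b + b * self.doc_lengths[doc_id] / avg_len)
                # k1 controls how quickly repeated terms stop adding score.
                scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
        # Highest score first, like results ordered by _score.
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = InvertedIndex()
index.add(1, "The quick brown fox")
index.add(2, "Quick search with Elasticsearch")
index.add(3, "Searching is quick, searching is fun")
results = index.search("quick search")
print(results)
```

Because this sketch applies no stemming, the query term `search` does not match `searching` in document 3; that document is returned only for `quick`, and it ranks below document 1 because it is longer. Adding a stemming token filter to `analyze` would change both the indexed terms and the ranking.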
diff --git a/docs/reference/search/search-your-data/search-your-data.asciidoc b/docs/reference/search/search-your-data/search-your-data.asciidoc
@@ -42,7 +42,8 @@ DSL, with a simplified user experience. Create search applications based on your
 {es} indices, build queries using search templates, and easily preview your
 results directly in the Kibana Search UI.
 
 include::search-api.asciidoc[]
+include::full-text-search.asciidoc[]
 include::../../how-to/recipes.asciidoc[]
 // ☝️ search relevance recipes
 include::retrievers-overview.asciidoc[]