Commit 0ea93d2

[DOCS] Add full-text search overview
1 parent 7f37edf commit 0ea93d2

4 files changed: 139 additions and 0 deletions

docs/reference/analysis/tokenizers.asciidoc

Lines changed: 7 additions & 0 deletions
@@ -1,6 +1,13 @@
11
[[analysis-tokenizers]]
22
== Tokenizer reference
33

4+
[NOTE]
5+
====
6+
{es}'s text analysis produces meaningful _linguistic_ tokens (like words and phrases) optimized for search relevance scoring.
7+
This differs from neural tokenizers, which break text into smaller subword units and map them to numerical IDs for machine learning models.
8+
For example, "searching" becomes the searchable word token "search" in {es}, while a neural tokenizer might split it into ["sea", "##rch", "##ing"] for model consumption.
9+
====
10+
411
A _tokenizer_ receives a stream of characters, breaks it up into individual
512
_tokens_ (usually individual words), and outputs a stream of _tokens_. For
613
instance, a <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer breaks
Lines changed: 62 additions & 0 deletions
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
[[full-text-search]]
2+
== Full-text search
3+
4+
.Hands-on introduction to full-text search
5+
[TIP]
6+
====
7+
Would you prefer to jump straight into a hands-on tutorial?
8+
Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
9+
====
10+
11+
Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
12+
Both documents and search queries are analyzed so that {es} can return https://www.elastic.co/what-is/search-relevance[relevant] results rather than only exact term matches.
13+
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.
14+
15+
Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.
16+
Full-text search is the cornerstone of production-grade search solutions.
17+
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
18+
19+
[discrete]
20+
[[full-text-search-how-it-works]]
21+
=== How full-text search works
22+
23+
The following diagram illustrates the components of full-text search. Note that the query text also undergoes text analysis, so that it's transformed in the same way as the indexed text.
24+
25+
image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]
26+
27+
At a high level, full-text search involves the following:
28+
29+
* <<analysis-overview,*Text analysis*>>: Analysis is a pipeline of sequential transformations, such as lowercasing, stemming, and stop word removal, that converts text into a format optimized for searching. {es} includes a number of built-in <<analysis-analyzers,analyzers>> (including language-specific analyzers) and tokenizers, and you can also create custom analyzers.
30+
+
31+
[TIP]
32+
====
33+
Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
34+
====
35+
* *Inverted index*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
36+
An inverted index is a data structure that maps each token to the documents that contain it.
37+
It's made up of two key components:
38+
** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
39+
** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position.
40+
* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document.
41+
+
42+
The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, document frequency, and document length.
43+
Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25.
44+
* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
45+
+
46+
Query DSL supports a number of <<full-text-queries,full-text queries>>.
47+
+
48+
As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions.
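
The analysis, inverted index, and query steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how {es} or Lucene implement them: the `analyze`, `build_inverted_index`, and `search` helpers, the tiny stop word list, and the crude suffix stripping are all hypothetical stand-ins for real analyzers and index structures.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop word list; real analyzers ship much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "for", "of", "and"}


def analyze(text):
    """Toy analysis chain: lowercase, split on non-alphanumerics,
    drop stop words, then strip a few suffixes as a crude stand-in
    for stemming."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        for suffix in ("ing", "es", "s"):
            # Only strip when at least three characters remain.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        result.append(token)
    return result


def build_inverted_index(docs):
    """Map each term to a posting list: {doc_id: [positions]}.
    Positions are offsets in the analyzed token stream."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Analyze the query like the documents, then return the sorted IDs
    of documents that contain at least one query term."""
    hits = set()
    for term in analyze(query):
        hits.update(index.get(term, {}))
    return sorted(hits)


docs = {
    1: "Searching for the best search engine",
    2: "An engine is a machine",
    3: "Full-text search indexes documents",
}
index = build_inverted_index(docs)
dictionary = sorted(index)  # the sorted list of unique terms
```

Because the query passes through the same `analyze` function as the documents, `search(index, "searches")` matches documents 1 and 3 even though neither contains the literal string "searches".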
49+
50+
[discrete]
51+
[[full-text-search-learn-more]]
52+
=== Learn more
53+
54+
.Getting Started
55+
* <<full-text-filter-tutorial,Hands-on full-text search tutorial>>
56+
57+
.Core Concepts
58+
* <<text,Text fields>>
59+
* <<analysis,Text analysis>>
60+
* <<analysis-tokenizers,Tokenizers>>
61+
* <<analysis-analyzers,Analyzers>>
62+
63+
.Search APIs
64+
* <<full-text-queries,Full-text queries using Query DSL>>
65+
* <<esql-search-functions,Full-text search functions in {esql}>>
66+
67+
.Advanced Topics
68+
* https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[Practical BM25: Part 2 - The BM25 Algorithm and its Variables]
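
As a rough companion to the BM25 material linked above, the following Python sketch computes scores with the textbook BM25 formula from term frequency, document frequency, and document length. It is a simplified illustration: the `bm25_scores` helper and the sample documents are hypothetical, the documents are assumed to be already analyzed into tokens, and the scores {es} actually returns can differ because Lucene's implementation differs in detail. The defaults `k1=1.2` and `b=0.75` match the default BM25 parameters in {es}.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against query_terms
    using the textbook BM25 formula. Returns {doc_id: score} for
    documents that match at least one query term."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        if score > 0:
            scores[doc_id] = score
    return scores


# Hypothetical, already-analyzed documents.
docs = {
    1: ["search", "best", "search", "engine"],
    2: ["engine", "machine"],
    3: ["full", "text", "search", "index", "document"],
}
scores = bm25_scores(["search"], docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here document 1 ranks above document 3 because it contains the query term twice and is shorter, while document 2 does not match and receives no score.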

docs/reference/search/search-your-data/search-your-data.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -42,7 +42,9 @@ DSL, with a simplified user experience. Create search applications based on your
4242
{es} indices, build queries using search templates, and easily preview your
4343
results directly in the Kibana Search UI.
4444

45+
4546
include::search-api.asciidoc[]
47+
include::full-text-search.asciidoc[]
4648
include::../../how-to/recipes.asciidoc[]
4749
// ☝️ search relevance recipes
4850
include::retrievers-overview.asciidoc[]
