-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[DOCS] Add full-text search overview #119462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
0ea93d2
f38e6bc
12c9de0
1971c5e
3725cb0
3184e2f
d2adc5d
bd4d131
2f5152c
c334a3b
9eb6984
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
[[full-text-search]] | ||
== Full-text search | ||
|
||
.Hands-on introduction to full-text search | ||
[TIP] | ||
==== | ||
Would you prefer to jump straight into a hands-on tutorial? | ||
Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>. | ||
==== | ||
|
||
Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents. | ||
Documents and search queries are transformed to enable returning https://www.elastic.co/what-is/search-relevance[relevant] results instead of simply exact term matches. | ||
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search. | ||
|
||
Built on decades of information retrieval research, full-text search delivers reliable results that scale predictably as your data grows. Because it runs efficiently on CPUs, {es}'s full-text search requires minimal computational resources compared to GPU-intensive vector operations. | ||
|
||
You can combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications. While vector search may require additional GPU resources, the full-text component remains cost-effective by leveraging existing CPU infrastructure. | ||
|
||
[discrete] | ||
[[full-text-search-how-it-works]] | ||
=== How full-text search works | ||
|
||
The following diagram illustrates the components of full-text search. | ||
|
||
image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500] | ||
|
||
At a high level, full-text search involves the following: | ||
|
||
* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word elimination. {es} contains a number of built-in <<analysis-analyzers,analyzers>> and tokenizers, including options to analyze specific language text. You can also create custom analyzers. | ||
+ | ||
[TIP] | ||
==== | ||
Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates. | ||
==== | ||
* *Inverted index creation*: After analysis is complete, {es} builds an inverted index from the resulting tokens. | ||
An inverted index is a data structure that maps each token to the documents that contain it. | ||
It's made up of two key components: | ||
** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index. | ||
** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position. | ||
* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document. | ||
+ | ||
The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, document frequency, and document length. | ||
Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25. | ||
* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index. | ||
+ | ||
Query DSL supports a number of <<full-text-queries,full-text queries>>. | ||
+ | ||
As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions. | ||
|
||
[discrete] | ||
[[full-text-search-getting-started]] | ||
=== Getting started | ||
|
||
For a hands-on introduction to full-text search, refer to the <<full-text-filter-tutorial,full-text search tutorial>>. | ||
|
||
[discrete] | ||
[[full-text-search-learn-more]] | ||
=== Learn more | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this section works as a v1 but it might be nice to guide people through what resources we want them to check out next, or help them to understand the context of a topic (e.g. "To learn how to optimize the relevance of your search results, refer to would also consider pulling out the "get started" into its own CTA - it's the most important thing people should be looking at next. I'm also curious to know if there's a resource we can provide to move this into a prod world (guess that would be explained in our references to API clients) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding some more context. I hinted at prod world in the intro paragraph revision— to concretize the compute efficiency wording, with link to moving to prod section. |
||
|
||
Here are some resources to help you learn more about full-text search with {es}. | ||
|
||
*Core concepts* | ||
|
||
Learn about the core components of full-text search: | ||
|
||
* <<text,Text fields>> | ||
* <<analysis,Text analysis>> | ||
** <<analysis-tokenizers,Tokenizers>> | ||
** <<analysis-analyzers,Analyzers>> | ||
|
||
*{es} query languages* | ||
|
||
Learn how to build full-text search queries using {es}'s query languages: | ||
|
||
* <<full-text-queries,Full-text queries using Query DSL>> | ||
* <<esql-search-functions,Full-text search functions in {esql}>> | ||
|
||
*Advanced topics* | ||
|
||
For a technical deep dive into {es}'s BM25 implementation read this blog post: https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[The BM25 Algorithm and its Variables]. | ||
|
||
To learn how to optimize the relevance of your search results, refer to <<recipes,Search relevance optimizations>>. |
Uh oh!
There was an error while loading. Please reload this page.