|
| 1 | +[[full-text-search]] |
| 2 | +== Full-text search |
| 3 | + |
| 4 | +.Hands-on introduction to full-text search |
| 5 | +[TIP] |
| 6 | +==== |
| 7 | +Would you prefer to jump straight into a hands-on tutorial? |
| 8 | +Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>. |
| 9 | +==== |
| 10 | + |
| 11 | +Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents. |
| 12 | +Documents and search queries are transformed to enable returning https://www.elastic.co/what-is/search-relevance[relevant] results instead of simply exact term matches. |
| 13 | +Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search. |
| 14 | + |
| 15 | +Built on decades of information retrieval research, full-text search delivers reliable results that scale predictably as your data grows. Because it runs efficiently on CPUs, {es}'s full-text search requires fewer computational resources than GPU-intensive vector operations.
| 16 | + |
| 17 | +You can combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications. While vector search may require additional GPU resources, the full-text component remains cost-effective by leveraging existing CPU infrastructure. |
| 18 | + |
| 19 | +[discrete] |
| 20 | +[[full-text-search-how-it-works]] |
| 21 | +=== How full-text search works |
| 22 | + |
| 23 | +The following diagram illustrates the components of full-text search. |
| 24 | + |
| 25 | +image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500] |
| 26 | + |
| 27 | +At a high level, full-text search involves the following: |
| 28 | + |
| 29 | +* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word removal. {es} contains a number of built-in <<analysis-analyzers,analyzers>> and tokenizers, including language-specific analyzers. You can also create custom analyzers.
| 30 | ++ |
| 31 | +[TIP] |
| 32 | +==== |
| 33 | +Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates. |
| 34 | +==== |
| 35 | +* *Inverted index creation*: After analysis is complete, {es} builds an inverted index from the resulting tokens. |
| 36 | +An inverted index is a data structure that maps each token to the documents that contain it. |
| 37 | +It's made up of two key components: |
| 38 | +** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index. |
| 39 | +** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position. |
| 40 | +* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document. |
| 41 | ++ |
| 42 | +The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, inverse document frequency, and document length.
| 43 | +Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25. |
| 44 | +* *Full-text search query*: By default, query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
| 45 | ++ |
| 46 | +Query DSL supports a number of <<full-text-queries,full-text queries>>. |
| 47 | ++ |
| 48 | +As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions. |
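
The analysis, inverted index, and query steps above can be illustrated with a minimal Python sketch. This is a toy model, not how {es} is implemented: the stop word list, the plural-stripping "stemmer", and the function names `analyze`, `build_inverted_index`, and `search` are all hypothetical and exist only for illustration. Positions are counted after stop word removal, which is a simplification.

```python
import re
from collections import defaultdict

# Hypothetical, simplified stop word list; real analyzers are configurable.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def analyze(text):
    """Lowercase, tokenize, drop stop words, and strip a trailing plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        # Toy stemmer: a stand-in for a real stemming algorithm.
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        result.append(token)
    return result


def build_inverted_index(docs):
    """Map each term to its posting list: {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Analyze the query like the documents, then look up each term."""
    matches = set()
    for term in analyze(query):
        matches.update(index.get(term, {}))
    return sorted(matches)


docs = {
    1: "The quick brown dogs",
    2: "Quick thinking is a skill",
    3: "Brown bears and dogs",
}
index = build_inverted_index(docs)

print(sorted(index))           # the dictionary: sorted unique terms
print(index["dog"])            # the posting list for one term
print(search(index, "DOGS"))   # query text goes through the same analysis
```

Because the query `DOGS` passes through the same `analyze` function as the documents, it matches the indexed term `dog` even though neither document contains the exact string.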
| 49 | + |
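The relevance scoring step can be sketched in the same spirit. The following Python example uses the commonly cited textbook form of Okapi BM25 to show how term frequency, inverse document frequency, and document length interact. It is an assumption-laden illustration: the function name `bm25_scores` is hypothetical, the documents are already tokenized, the values `k1=1.2` and `b=0.75` are commonly used settings, and the exact computation in {es} may differ in detail.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Longer-than-average documents are penalized via b.
            norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * norm)
        scores[doc_id] = score
    return scores


docs = {
    "a": ["quick", "brown", "dog"],
    "b": ["quick", "quick", "dog", "sleeps", "all", "day", "long"],
    "c": ["brown", "bear"],
}
scores = bm25_scores(["brown", "bear"], docs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this sketch, document `c` ranks first because it contains both query terms and is shorter than average, document `a` matches only `brown`, and document `b` matches nothing and scores zero.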
| 50 | +[discrete] |
| 51 | +[[full-text-search-getting-started]] |
| 52 | +=== Getting started |
| 53 | + |
| 54 | +For a hands-on introduction to full-text search, refer to the <<full-text-filter-tutorial,full-text search tutorial>>. |
| 55 | + |
| 56 | +[discrete] |
| 57 | +[[full-text-search-learn-more]] |
| 58 | +=== Learn more |
| 59 | + |
| 60 | +Here are some resources to help you learn more about full-text search with {es}. |
| 61 | + |
| 62 | +*Core concepts* |
| 63 | + |
| 64 | +Learn about the core components of full-text search: |
| 65 | + |
| 66 | +* <<text,Text fields>> |
| 67 | +* <<analysis,Text analysis>> |
| 68 | +** <<analysis-tokenizers,Tokenizers>> |
| 69 | +** <<analysis-analyzers,Analyzers>> |
| 70 | + |
| 71 | +*{es} query languages* |
| 72 | + |
| 73 | +Learn how to build full-text search queries using {es}'s query languages: |
| 74 | + |
| 75 | +* <<full-text-queries,Full-text queries using Query DSL>> |
| 76 | +* <<esql-search-functions,Full-text search functions in {esql}>> |
| 77 | + |
| 78 | +*Advanced topics* |
| 79 | + |
| 80 | +For a technical deep dive into {es}'s BM25 implementation, read this blog post: https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[The BM25 Algorithm and its Variables].
| 81 | + |
| 82 | +To learn how to optimize the relevance of your search results, refer to <<recipes,Search relevance optimizations>>. |