[[full-text-search]]
== Full-text search

.Hands-on introduction to full-text search
[TIP]
====
Would you prefer to jump straight into a hands-on tutorial?
Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
====

Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
Both documents and search queries are transformed so that searches return https://www.elastic.co/what-is/search-relevance[relevant] results, not just exact term matches.
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.
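
As a rough illustration of why this transformation matters, the following Python sketch compares two kinds of matching. This is not {es} code. The first kind matches on raw whitespace-separated tokens. The second matches after a toy analysis step that lowercases the text and drops punctuation. The function names `exact_tokens` and `analyzed_tokens` are hypothetical.

```python
import re


def exact_tokens(text: str) -> set[str]:
    # No analysis: split on whitespace only, keeping case and punctuation.
    return set(text.split())


def analyzed_tokens(text: str) -> set[str]:
    # Toy analysis (hypothetical): lowercase, then keep runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


doc = "Quick brown foxes, jumping over the lazy dog."
query = "quick dog"

# A document "matches" here if it contains every query token.
exact_hit = set(query.split()) <= exact_tokens(doc)
analyzed_hit = analyzed_tokens(query) <= analyzed_tokens(doc)
print(exact_hit, analyzed_hit)  # prints: False True
```

With raw tokens, `quick` does not match `Quick` and `dog` does not match `dog.`, so the query fails. After the toy analysis, both terms match. Neither version matches a query for `fox` against `foxes`. That is the kind of gap stemming closes.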

Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.
Full-text search is the cornerstone of production-grade search solutions.
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.

[discrete]
[[full-text-search-how-it-works]]
=== How full-text search works

The following diagram illustrates the components of full-text search.
The query text also undergoes text analysis, so it is transformed in the same way as the indexed text.

image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]

At a high level, full-text search involves the following:

* <<analysis-overview,*Text analysis*>>: Analysis is a pipeline of sequential transformations that convert text into a format optimized for searching, using techniques such as lowercasing, stop word removal, and stemming. {es} includes a number of built-in <<analysis-analyzers,analyzers>> (including language-specific analyzers) and tokenizers, and you can also create custom analyzers.
+
[TIP]
====
Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
====
* *Inverted index*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
An inverted index is a data structure that maps each token to the documents that contain it.
It's made up of two key components:
** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
** *Posting list*: For each term, a list of the IDs of the documents that contain it, along with optional metadata such as term frequency and positions.
* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document.
+
By default, {es} calculates relevance scores with the https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25] <<index-modules-similarity,similarity algorithm>>, a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, document frequency, and document length.
Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25.
* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
+
Query DSL supports a number of <<full-text-queries,full-text queries>>.
+
Starting with version 8.17, {esql} also supports <<esql-search-functions,full-text search functions>>.
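
To make the four components concrete, here is a minimal, self-contained Python sketch that walks through them in order. It builds a toy analyzer, then an inverted index with a sorted dictionary and posting lists of document IDs with term positions. It then applies BM25 scoring to a query that reuses the same analyzer. It is an illustration under simplifying assumptions, not how {es} or Lucene implement these steps. The stop-word list, the suffix-stripping `stem()` function, and the names `analyze`, `build_index`, and `bm25_search` are all hypothetical. Scoring uses the textbook BM25 formula with `k1 = 1.2` and `b = 0.75`, which are the {es} defaults, plus a non-negative IDF variant. The resulting numbers will not match real `_score` values.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy stop-word list; real analyzers use larger, language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "over", "to"}


def stem(token: str) -> str:
    # Crude suffix stripping, standing in for a real stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    # Analysis pipeline: tokenize -> lowercase -> remove stop words -> stem.
    tokens = re.findall(r"\w+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs: dict[int, str]):
    # Inverted index: term -> {doc_id: [positions]}; also record document lengths.
    postings: dict[str, dict[int, list[int]]] = defaultdict(dict)
    doc_len: dict[int, int] = {}
    for doc_id, text in docs.items():
        tokens = analyze(text)
        doc_len[doc_id] = len(tokens)
        for pos, term in enumerate(tokens):
            postings[term].setdefault(doc_id, []).append(pos)
    dictionary = sorted(postings)  # sorted list of unique terms
    return dictionary, postings, doc_len


def bm25_search(query: str, postings, doc_len, k1: float = 1.2, b: float = 0.75):
    # Textbook BM25 with a non-negative IDF; real implementations differ in details.
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores: Counter = Counter()
    # The query goes through the same analyze() pipeline as the documents.
    for term in analyze(query):
        docs_with_term = postings.get(term, {})
        df = len(docs_with_term)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, positions in docs_with_term.items():
            tf = len(positions)  # term frequency in this document
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)  # length normalization
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return scores.most_common()  # [(doc_id, score), ...], highest score first


docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Foxes and dogs: a field guide to foxes",
    3: "A guide to sleeping dogs",
}
dictionary, postings, doc_len = build_index(docs)
# "foxes" is analyzed to "fox", so doc 2 (two occurrences, shorter) ranks
# above doc 1, and doc 3 (no fox) is not returned at all.
for doc_id, score in bm25_search("foxes", postings, doc_len):
    print(doc_id, round(score, 3))
```

Because the query passes through `analyze()`, searching for `foxes` and `fox` produces identical rankings. This mirrors why {es} analyzes query text the same way as indexed text.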

[discrete]
[[full-text-search-learn-more]]
=== Learn more

.Getting started
* <<full-text-filter-tutorial,Hands-on full-text search tutorial>>

.Core concepts
* <<text,Text fields>>
* <<analysis,Text analysis>>
* <<analysis-tokenizers,Tokenizers>>
* <<analysis-analyzers,Analyzers>>

.Search APIs
* <<full-text-queries,Full-text queries using Query DSL>>
* <<esql-search-functions,Full-text search functions in {esql}>>

.Advanced topics
* https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[Practical BM25: Part 2 - The BM25 Algorithm and its Variables]