Merged
Changes from 6 commits

3 changes: 1 addition & 2 deletions docs/reference/analysis.asciidoc
@@ -9,8 +9,7 @@
--

_Text analysis_ is the process of converting unstructured text, like
-the body of an email or a product description, into a structured format that's
-optimized for search.
+the body of an email or a product description, into a structured format that's <<full-text-search,optimized for search>>.

[discrete]
[[when-to-configure-analysis]]
7 changes: 7 additions & 0 deletions docs/reference/analysis/tokenizers.asciidoc
@@ -1,6 +1,13 @@
[[analysis-tokenizers]]
== Tokenizer reference

[NOTE]
====
{es}'s text analysis produces meaningful _linguistic_ tokens (like words and phrases) optimized for search relevance scoring.
This differs from neural tokenizers, which break text into smaller subword units and numerical vectors for machine learning models.
For example, "searching" becomes the searchable word token "search" in {es}, while a neural tokenizer might split it into ["sea", "##rch", "##ing"] for model consumption.
Contributor

These `##` rendered as a highlight. Not sure what your intent was here, but you might have to escape the characters.

Contributor Author

Need to use backticks.

Contributor Author

Removing the example as unnecessary detail

====

A _tokenizer_ receives a stream of characters, breaks it up into individual
_tokens_ (usually individual words), and outputs a stream of _tokens_. For
instance, a <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer breaks
62 changes: 62 additions & 0 deletions docs/reference/images/search/full-text-search-overview.svg
Contributor

I think this diagram is very helpful, but it needs to be polished up so the text placement is more consistent and there's consistent padding in the cells. We could probably leverage the Figma auto-layout tools for this.

We could also consider paring back colors that don't add a lot of meaning. I'd suggest doing greyscale for most of these and then maybe using a different shape for search results.

Contributor Author

💯

Not working in Figma because I'm visually illiterate, but will try to fix those color/layout issues.

7 changes: 3 additions & 4 deletions docs/reference/intro.asciidoc
@@ -260,7 +260,7 @@ Refer to <<getting-started,first steps with Elasticsearch>> for a hands-on examp

*{esql}* is a new piped query language and compute engine which was first added in version *8.11*.

-{esql} does not yet support all the features of Query DSL, like full-text search and semantic search.
+{esql} does not yet support all the features of Query DSL.
Look forward to new {esql} features and functionalities in each release.

Refer to <<search-analyze-query-languages>> for a full overview of the query languages available in {es}.
@@ -280,7 +280,7 @@ The <<search-your-data, `_search` endpoint>> accepts queries written in Query DS

Query DSL supports a wide range of search techniques, including the following:

-* <<full-text-queries,*Full-text search*>>: Search text that has been analyzed and indexed to support phrase or proximity queries, fuzzy matches, and more.
+* <<full-text-search,*Full-text search*>>: Search text that has been analyzed and indexed to support phrase or proximity queries, fuzzy matches, and more.
* <<keyword,*Keyword search*>>: Search for exact matches using `keyword` fields.
* <<semantic-search-semantic-text,*Semantic search*>>: Search `semantic_text` fields using dense or sparse vector search on embeddings generated in your {es} cluster.
* <<knn-search,*Vector search*>>: Search for similar dense vectors using the kNN algorithm for embeddings generated outside of {es}.
@@ -328,8 +328,7 @@ directly executed within {es} itself.

The <<esql-rest,`_query` endpoint>> accepts queries written in {esql} syntax.

-Today, it supports a subset of the features available in Query DSL, like aggregations, filters, and transformations.
-It does not yet support full-text search or semantic search.
+Today, it supports a subset of the features available in Query DSL, but it is rapidly evolving.

It comes with a comprehensive set of <<esql-functions-operators,functions and operators>> for working with data and has robust integration with {kib}'s Discover, dashboards and visualizations.

69 changes: 69 additions & 0 deletions docs/reference/search/search-your-data/full-text-search.asciidoc
@@ -0,0 +1,69 @@
[[full-text-search]]
== Full-text search

.Hands-on introduction to full-text search
[TIP]
====
Would you prefer to jump straight into a hands-on tutorial?
Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
====

Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
Documents and search queries are transformed to enable returning https://www.elastic.co/what-is/search-relevance[relevant] results instead of simply exact term matches.
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.

Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.
Contributor

This sentence is pretty dense - deterministic is doing a lot of heavy lifting here. Can we be more explicit about the benefits, or alternatively, weigh the value of this sentence to the reader?

Contributor Author

Agree, rewording/reshaping.

Full-text search is the cornerstone of production-grade search solutions.
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
Contributor

Suggested change
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
You can combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.

Contributor Author

This was an ORDER!


[discrete]
[[full-text-search-how-it-works]]
=== How full-text search works

The following diagram illustrates the components of full-text search. Note that the query text also undergoes text analysis, so that it's transformed in the same way as the indexed text.

image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]

At a high level, full-text search involves the following:

* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching by stemming, lowercasing, stop word elimination, etc. {es} contains a number of built-in <<analysis-analyzers,analyzers>> (including language-specific analyzers) and tokenizers, and you can also create custom analyzers.
+
[TIP]
====
Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
====
* *Inverted index*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
An inverted index is a data structure that maps each token to the documents that contain it.
It's made up of two key components:
** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position.
* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document.
+
The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, document frequency, and document length.
Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25.
* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
+
Query DSL supports a number of <<full-text-queries,full-text queries>>.
+
As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions.
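
The analysis, inverted index, and query steps listed above can be illustrated with a minimal, hypothetical Python sketch. It is not how {es} or Lucene implement them: the `analyze`, `build_inverted_index`, and `search` functions and the tiny `STOP_WORDS` list are assumptions made for illustration, there is no stemming, and posting lists are plain in-memory dictionaries rather than compressed on-disk structures. It does show the key idea that the query is analyzed with the same function as the indexed documents.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def analyze(text: str) -> list[tuple[int, str]]:
    """Toy analyzer: lowercase, split on non-alphanumeric characters, drop stop words.

    Returns (position, term) pairs. Positions still count the removed stop words,
    so gaps remain where they were dropped. There is no stemming.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]


def build_inverted_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each term to its posting list: {doc_id: [positions of the term in that doc]}.

    The number of positions stored for a document is the term frequency.
    """
    postings: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in analyze(text):
            postings[term].setdefault(doc_id, []).append(pos)
    # Sort the terms to mimic a sorted term dictionary.
    return dict(sorted(postings.items()))


def search(index: dict[str, dict[int, list[int]]], query: str) -> set[int]:
    """Analyze the query with the same analyzer as the documents, then
    return the IDs of documents that contain any of the query terms."""
    matches: set[int] = set()
    for _, term in analyze(query):
        matches.update(index.get(term, {}))
    return matches


docs = {
    1: "The quick brown fox",
    2: "Quick thinking and a quick reply",
    3: "Brown bread",
}
index = build_inverted_index(docs)
print(list(index))    # ['bread', 'brown', 'fox', 'quick', 'reply', 'thinking']
print(index["quick"])  # {1: [1], 2: [0, 4]}
print(search(index, "QUICK Brown"))  # {1, 2, 3}
```

Because the query `QUICK Brown` is lowercased exactly like the indexed text, it matches all three documents. Returning documents that contain any query term loosely resembles the default `or` operator of a `match` query, but this sketch does no scoring.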

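To make the relevance scoring step in the list above more concrete, here is a small Python sketch of the textbook Okapi BM25 formula with `k1 = 1.2` and `b = 0.75`, the default parameter values of the {es} `BM25` similarity. The `bm25_score` function and the sample corpus are hypothetical. Lucene's actual implementation differs in details (for example, recent versions omit the constant `(k1 + 1)` factor and store document lengths with lossy compression), so these numbers will not match `_score` exactly.

```python
import math


def bm25_score(query_terms: list[str], doc_terms: list[str],
               corpus: list[list[str]], k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook Okapi BM25 score of one document for already-analyzed query terms."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        # Document frequency: number of documents containing the term at least once.
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        # Inverse document frequency: rarer terms get a larger weight.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Term frequency of the term in this document.
        tf = doc_terms.count(term)
        # k1 controls how quickly repeated occurrences saturate;
        # b controls how strongly documents longer than average are penalized.
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / denom
    return score


# Hypothetical, already-analyzed documents (lists of tokens).
corpus = [
    ["quick", "brown", "fox"],
    ["quick", "thinking", "quick", "reply"],
    ["brown", "bread"],
]
scores = [bm25_score(["quick"], doc, corpus) for doc in corpus]
print([round(s, 3) for s in scores])  # [0.47, 0.591, 0.0]
```

In this sketch the second document outranks the first because it contains `quick` twice, even though it is longer than average. The third document scores 0 because it doesn't contain the term at all.
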
[discrete]
[[full-text-search-learn-more]]
=== Learn more
Contributor

This section works as a v1, but it might be nice to guide people through what resources we want them to check out next, or help them understand the context of a topic (e.g. "To learn how to optimize the relevance of your search results, refer to <<Search relevance optimizations>>").

Would also consider pulling out the "get started" into its own CTA, since it's the most important thing people should be looking at next. I'm also curious to know if there's a resource we can provide to move this into a prod world (guess that would be explained in our references to API clients).

Contributor Author

@leemthompo, Jan 6, 2025

Adding some more context.

I hinted at the prod world in the intro paragraph revision, to concretize the compute efficiency wording, with a link to the moving-to-prod section.


.Getting Started
* <<full-text-filter-tutorial,Hands-on full-text search tutorial>>

.Core Concepts
* <<text,Text fields>>
* <<analysis,Text analysis>>
* <<analysis-tokenizers,Tokenizers>>
* <<analysis-analyzers,Analyzers>>

.Search APIs
* <<full-text-queries,Full-text queries using Query DSL>>
* <<esql-search-functions,Full-text search functions in {esql}>>

.Advanced Topics
* https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[Practical BM25: Part 2 - The BM25 Algorithm and its Variables]
* <<recipes,Search relevance optimization recipes>>
@@ -18,7 +18,7 @@ Search for exact values::
Search for <<term-level-queries,exact values or ranges>> of numbers, dates, IPs,
or strings.

-Full-text search::
+<<full-text-search,Full-text search>>::
Use <<full-text-queries,full text queries>> to query <<analysis,unstructured
textual data>> and find documents that best match query terms.

@@ -43,6 +43,7 @@ DSL, with a simplified user experience. Create search applications based on your
results directly in the Kibana Search UI.

include::search-api.asciidoc[]
+include::full-text-search.asciidoc[]
include::../../how-to/recipes.asciidoc[]
// ☝️ search relevance recipes
include::retrievers-overview.asciidoc[]