Merged
Changes from 4 commits
7 changes: 4 additions & 3 deletions docs/reference/analysis/tokenizers.asciidoc
@@ -1,11 +1,12 @@
[[analysis-tokenizers]]
== Tokenizer reference

.Difference between {es} tokenization and neural tokenization
[NOTE]
====
{es}'s text analysis produces meaningful _linguistic_ tokens (like words and phrases) optimized for search relevance scoring.
This differs from neural tokenizers, which break text into smaller subword units and numerical vectors for machine learning models.
For example, "searching" becomes the searchable word token "search" in {es}, while a neural tokenizer might split it into ["sea", "##rch", "##ing"] for model consumption.
{es}'s tokenization process produces linguistic tokens, optimized for search and retrieval.
This differs from neural tokenization in the context of machine learning and natural language processing. Neural tokenizers translate strings into smaller, subword tokens, which are encoded into vectors for consumption by neural networks.
{es} does not have built-in neural tokenizers.
====
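
To make the distinction in this note concrete, here is a minimal Python sketch. It is not how {es} or any specific neural tokenizer is implemented: the suffix list, the `VOCAB` set, and both function names are hypothetical and chosen only to reproduce the "searching" example.

```python
import re

# Hypothetical suffix list for a toy stemmer; real analyzers use full stemming algorithms.
SUFFIXES = ("ing", "ed", "s")


def linguistic_tokens(text):
    """Lowercase, split on non-letters, and strip at most one common suffix per word."""
    tokens = []
    for word in re.findall(r"[a-z]+", text.lower()):
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


# Hypothetical subword vocabulary; "##" marks a piece that continues a word.
VOCAB = {"sea", "##rch", "##ing"}


def subword_tokens(word, vocab=VOCAB):
    """Greedy longest-match-first split, in the style of WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            # No vocabulary entry matches at this position.
            return ["[UNK]"]
        start = end
    return pieces


print(linguistic_tokens("Searching"))  # one searchable word token
print(subword_tokens("searching"))     # several subword pieces
```

The first function yields a single token that can be matched against an index, while the second yields pieces that only make sense once a model maps them to vectors.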

A _tokenizer_ receives a stream of characters, breaks it up into individual
75 changes: 47 additions & 28 deletions docs/reference/images/search/full-text-search-overview.svg
Contributor

I think this diagram is very helpful, but it needs to be polished up so the text placement is more consistent / there's consistent padding in the cells. we could prob leverage the figma auto-layout tools for this.

We could also consider paring back colors that don't add a lot of meaning - I'd suggest doing greyscale for most of these and then maybe using a different shape for search results

Contributor Author

💯

Not working in Figma because I'm visually illiterate but will try to fix those color/layout issues

@@ -4,7 +4,7 @@
<titleabbrev>Basics: Full-text search and filtering</titleabbrev>
++++

This is a hands-on introduction to the basics of full-text search with {es}, also known as _lexical search_, using the <<search-search,`_search` API>> and <<query-dsl,Query DSL>>.
This is a hands-on introduction to the basics of <<full-text-search,full-text search>> with {es}, also known as _lexical search_, using the <<search-search,`_search` API>> and <<query-dsl,Query DSL>>.
You'll also learn how to filter data, to narrow down search results based on exact criteria.

In this scenario, we're implementing a search function for a cooking blog.
@@ -632,6 +632,7 @@ This tutorial introduced the basics of full-text search and filtering in {es}.
Building a real-world search experience requires understanding many more advanced concepts and techniques.
Here are some resources once you're ready to dive deeper:

* <<full-text-search, Full-text search>>: Learn about the core components of full-text search in {es}.
* <<search-analyze, Elasticsearch basics — Search and analyze data>>: Understand all your options for searching and analyzing data in {es}.
* <<analysis,Text analysis>>: Understand how text is processed for full-text search.
* <<search-with-elasticsearch>>: Learn about more advanced search techniques using the `_search` API, including semantic search.
45 changes: 30 additions & 15 deletions docs/reference/search/search-your-data/full-text-search.asciidoc
@@ -12,27 +12,29 @@ Full-text search, also known as lexical search, is a technique for fast, efficie
Documents and search queries are transformed to enable returning https://www.elastic.co/what-is/search-relevance[relevant] results instead of simply exact term matches.
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.

Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.
Full-text search is the cornerstone of production-grade search solutions.
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
Built on decades of information retrieval research, full-text search delivers reliable results that scale predictably as your data grows. Because it runs efficiently on CPUs, {es}'s full-text search requires minimal computational resources compared to GPU-intensive vector operations.

This translates to lower infrastructure costs and predictable scaling requirements. You can scale horizontally by adding more nodes with standard CPU cores and RAM - no specialized hardware needed. A typical deployment will start with 2-3 nodes and grow incrementally as search volume increases. Learn more about <<scalability, moving to production>>.
Contributor

I think this is a bit of a red herring in this doc. I'd just remove the whole paragraph. it also sends the wrong signals to people on serverless who use ft search (the paragraph immediately before it also has references to hardware but I'm less concerned about it because it mostly just sells that this is a performant design)

when I mentioned prod in this context, I mostly meant the idea of making these calls from an app or site (this comment likely also a red herring)

Contributor Author

hmmm yes good point about serverless and the basic message is clear in preceding paragraph anyways


You can combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications. While vector search may require additional GPU resources, the full-text component remains cost-effective by leveraging existing CPU infrastructure.

[discrete]
[[full-text-search-how-it-works]]
=== How full-text search works

The following diagram illustrates the components of full-text search. Note that the query text also undergoes text analysis, so that it's transformed in the same way as the indexed text.
The following diagram illustrates the components of full-text search.

image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]

At a high level, full-text search involves the following:

* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching by stemming, lowercasing, stop word elimination, etc. {es} contains a number of built-in <<analysis-analyzers,analyzers>> (including language-specific analyzers) and tokenizers, and you can also create custom analyzers.
* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching using techniques such as stemming, lowercasing, and stop word elimination. {es} contains a number of built-in <<analysis-analyzers,analyzers>> and tokenizers, including options to analyze specific language text. You can also create custom analyzers.
+
[TIP]
====
Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
====
* *Inverted index*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
* *Inverted index creation*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
An inverted index is a data structure that maps each token to the documents that contain it.
It's made up of two key components:
** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
@@ -47,23 +49,36 @@ Query DSL supports a number of <<full-text-queries,full-text queries>>.
+
As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions.
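
The steps above (analysis, then an inverted index, then a query analyzed the same way as the indexed text) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not {es} or Lucene code: the stop word set, the plural-stripping rule, the sample documents, and the function names `analyze`, `build_inverted_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "with"}


def analyze(text):
    """Lowercase, tokenize on letters, drop stop words, strip a trailing plural 's'."""
    tokens = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]  # crude stand-in for stemming
        tokens.append(token)
    return tokens


def build_inverted_index(docs):
    """Map each token to the sorted IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            postings[token].add(doc_id)
    # Sorting the keys mirrors the sorted dictionary of unique terms.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Analyze the query like the documents, then return IDs matching any term."""
    hits = set()
    for token in analyze(query):
        hits.update(index.get(token, []))
    return sorted(hits)


# Hypothetical sample documents.
docs = {
    1: "The Best Tomato Soups",
    2: "Quick soup with rice",
    3: "Baking bread",
}
index = build_inverted_index(docs)
print(list(index))             # sorted unique terms
print(search(index, "Soups"))  # documents containing the analyzed term "soup"
```

Because the query passes through the same `analyze` function as the documents, "Soups" matches both "Soups" and "soup" even though neither is an exact string match. Relevance scoring is left out of this sketch.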

[discrete]
[[full-text-search-getting-started]]
=== Getting started

For a hands-on introduction to full-text search, refer to the <<full-text-filter-tutorial,full-text search tutorial>>.

[discrete]
[[full-text-search-learn-more]]
=== Learn more
Contributor

this section works as a v1 but it might be nice to guide people through what resources we want them to check out next, or help them to understand the context of a topic (e.g. "To learn how to optimize the relevance of your search results, refer to <<Search relevance optimizations>>")

would also consider pulling out the "get started" into its own CTA - it's the most important thing people should be looking at next. I'm also curious to know if there's a resource we can provide to move this into a prod world (guess that would be explained in our references to API clients)

Contributor Author

@leemthompo leemthompo Jan 6, 2025

Adding some more context.

I hinted at prod world in the intro paragraph revision— to concretize the compute efficiency wording, with link to moving to prod section.


.Getting Started
* <<full-text-filter-tutorial,Hands-on full-text search tutorial>>
Here are some resources to help you learn more about full-text search with {es}.

*Core concepts*

Learn about the core components of full-text search:

.Core Concepts
* <<text,Text fields>>
* <<analysis,Text analysis>>
* <<analysis-tokenizers,Tokenizers>>
* <<analysis-analyzers,Analyzers>>
** <<analysis-tokenizers,Tokenizers>>
** <<analysis-analyzers,Analyzers>>

*{es} query languages*

Learn how to build full-text search queries using {es}'s query languages:

.Search APIs
* <<full-text-queries,Full-text queries using Query DSL>>
* <<esql-search-functions,Full-text search functions in {esql}>>

.Advanced Topics
* https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[Practical BM25: Part 2 - The BM25 Algorithm and its Variables]
* <<recipes,Search relevance optimization recipes>>
*Advanced topics*

For a technical deep dive into {es}'s BM25 implementation, read this blog post: https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[The BM25 Algorithm and its Variables].
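
As a rough orientation before reading that post, the following Python sketch computes the textbook BM25 score for a single query term in a single document. It is an illustration of the general formula, not a copy of the Lucene implementation, which may arrange or scale the terms differently. The function name and parameter names are hypothetical; `k1=1.2` and `b=0.75` are the commonly used defaults.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score.

    freq        -- occurrences of the term in the document
    doc_len     -- length of the document field, in tokens
    avg_doc_len -- average field length across the index
    n_docs      -- number of documents in the index
    doc_freq    -- number of documents containing the term
    """
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized when b > 0.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the previous one.
    return idf * freq * (k1 + 1) / (freq + norm)


# Hypothetical numbers: same term frequency, different document lengths.
short_doc = bm25_term_score(freq=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(freq=2, doc_len=400, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

A full query score is the sum of this value over the query's terms.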

To learn how to optimize the relevance of your search results, refer to <<recipes,Search relevance optimizations>>.