[DOCS] Add full-text search overview #119462
@@ -1,6 +1,13 @@
[[analysis-tokenizers]]
== Tokenizer reference

[NOTE]
====
{es}'s text analysis produces meaningful _linguistic_ tokens (like words and phrases) optimized for search relevance scoring.

This differs from neural tokenizers, which break text into smaller subword units and map them to numerical vectors for machine learning models.

For example, "searching" becomes the searchable word token "search" in {es}, while a neural tokenizer might split it into ["sea", "##rch", "##ing"] for model consumption.

====
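To make the note's example concrete, here is a small illustration (not part of this diff) using the `_analyze` API: with the built-in `english` analyzer, the word "searching" is stemmed down to the single searchable token `search`.

[source,console]
----
POST _analyze
{
  "analyzer": "english",
  "text": "searching"
}
----

The response contains one token, `search`, which is what gets indexed and matched at query time.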

A _tokenizer_ receives a stream of characters, breaks it up into individual
_tokens_ (usually individual words), and outputs a stream of _tokens_. For
instance, a <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer breaks
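As an aside that is not part of the diff, the `_analyze` API can also show the kind of raw tokens the `whitespace` tokenizer mentioned above emits; the sample text here is arbitrary.

[source,console]
----
POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The QUICK brown fox!"
}
----

Because only a tokenizer is specified, the text is split on whitespace alone, returning the tokens `The`, `QUICK`, `brown`, and `fox!` with no lowercasing or stemming applied.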
Review comment: I think this diagram is very helpful, but it needs to be polished up so the text placement is more consistent and there's consistent padding in the cells. We could probably leverage the Figma auto-layout tools for this. We could also consider paring back colors that don't add a lot of meaning. I'd suggest doing greyscale for most of these and then maybe using a different shape for search results.

Reply: 💯 Not working in Figma because I'm visually illiterate, but will try to fix those color/layout issues.
@@ -0,0 +1,69 @@
[[full-text-search]]
== Full-text search

.Hands-on introduction to full-text search
[TIP]
====
Would you prefer to jump straight into a hands-on tutorial?
Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
====

Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
Documents and search queries are transformed to enable returning https://www.elastic.co/what-is/search-relevance[relevant] results instead of simply exact term matches.
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.
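To ground the paragraph above, here is a minimal sketch (not part of this diff; the index and field names are made up) of mapping a `text` field and running a full-text `match` query against it.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "content": { "type": "text" }
    }
  }
}

GET my-index/_search
{
  "query": {
    "match": {
      "content": "fast efficient searching"
    }
  }
}
----

Because `content` is a `text` field, both the indexed documents and the query string pass through the same analysis, so matching happens on analyzed terms and results are ranked by relevance rather than requiring exact string equality.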

Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.

Full-text search is the cornerstone of production-grade search solutions.
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
Suggested change:
- Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
+ You can combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.
This was an ORDER!
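For readers wondering what that hybrid combination might look like in practice, here is a hedged sketch (not part of the diff; the index name, `content_embedding` field, and model ID are hypothetical, and the exact request shape depends on your {es} version) that pairs a lexical `match` query with a `knn` vector clause in one search request.

[source,console]
----
GET my-index/_search
{
  "query": {
    "match": {
      "content": "how to deploy a cluster"
    }
  },
  "knn": {
    "field": "content_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "my-embedding-model",
        "model_text": "how to deploy a cluster"
      }
    },
    "k": 10,
    "num_candidates": 50
  }
}
----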
This section works as a v1, but it might be nice to guide people through what resources we want them to check out next, or to help them understand the context of a topic (e.g. "To learn how to optimize the relevance of your search results, refer to <<Search relevance optimizations>>").

I would also consider pulling the "get started" link out into its own CTA - it's the most important thing people should be looking at next. I'm also curious to know whether there's a resource we can provide to move this into a prod world (I guess that would be explained in our references to API clients).
Adding some more context.

I hinted at the prod world in the intro paragraph revision, to concretize the compute-efficiency wording, with a link to the moving-to-production section.