---
navigation_title: "Search and filter with ES|QL"
---

# Tutorial: Full-text search and filtering with {{esql}}

:::{tip}
This tutorial presents examples in {{esql}} syntax. Refer to [the Query DSL version](query-dsl-full-text-filter-tutorial.md) for the equivalent examples in Query DSL syntax.
:::

This hands-on tutorial introduces the basics of [full-text search](full-text.md) in Elasticsearch, also known as *lexical search*, and shows how to filter search results on exact criteria. In this scenario, we're implementing a search function for a cooking blog. The blog contains recipes with various attributes, including textual content, categorical data, and numerical ratings.

## Requirements

You'll need a running {{es}} cluster, together with {{kib}} to use the Dev Tools API Console. Refer to [choose your deployment type](/deploy-manage/deploy#choosing-your-deployment-type) for deployment options.

Want to get started quickly? Run the following command in your terminal to set up a [single-node local cluster in Docker](get-started.md):

```sh
curl -fsSL https://elastic.co/start-local | sh
```

## Step 1: Create an index

Create the `cooking_blog` index to get started:

```console
PUT /cooking_blog
```

Now define the mappings for the index:

```console
PUT /cooking_blog/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "standard", <1>
      "fields": { <2>
        "keyword": {
          "type": "keyword",
          "ignore_above": 256 <3>
        }
      }
    },
    "description": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "author": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "date": {
      "type": "date",
      "format": "yyyy-MM-dd"
    },
    "category": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "tags": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "rating": {
      "type": "float"
    }
  }
}
```

1. The `standard` analyzer is used by default for `text` fields if an `analyzer` isn't specified. It's included here for demonstration purposes.
2. [Multi-fields](elasticsearch://reference/elasticsearch/mapping-reference/multi-fields.md) are used here to index `text` fields as both `text` and `keyword` [data types](elasticsearch://reference/elasticsearch/mapping-reference/field-data-types.md). This enables both full-text search and exact matching/filtering on the same field. If you used [dynamic mapping](../../manage-data/data-store/mapping/dynamic-field-mapping.md), these multi-fields would be created automatically.
3. The [`ignore_above` parameter](elasticsearch://reference/elasticsearch/mapping-reference/ignore-above.md) prevents values longer than 256 characters from being indexed in the `keyword` field. 256 is the value dynamic mapping sets for `keyword` multi-fields, and it's included here for demonstration purposes. It helps save disk space and avoids potential issues with Lucene's term byte-length limit.
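
To make the multi-field idea concrete, here is a toy Python model of what the `title` mapping above indexes for a single value. It's a sketch under simplifying assumptions, not how Lucene stores data: the regex tokenizer is a rough stand-in for the `standard` analyzer, and `index_title` is a hypothetical helper. A value longer than `ignore_above` is simply left out of the `keyword` field (it still stays in `_source`).

```python
import re

IGNORE_ABOVE = 256  # mirrors "ignore_above": 256 in the mapping above


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def index_title(value: str) -> dict[str, list[str]]:
    """Hypothetical helper: the terms each part of the title multi-field would hold."""
    return {
        # text field: analyzed tokens, used for full-text search
        "title": analyze(value),
        # keyword multi-field: the whole value verbatim, skipped if it exceeds ignore_above
        "title.keyword": [value] if len(value) <= IGNORE_ABOVE else [],
    }


print(index_title("Spicy Chickpea Curry"))
# {'title': ['spicy', 'chickpea', 'curry'], 'title.keyword': ['Spicy Chickpea Curry']}
print(index_title("x" * 300)["title.keyword"])  # []
```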

::::{tip}
Full-text search is powered by [text analysis](full-text/text-analysis-during-search.md). Text analysis normalizes and standardizes text data so it can be efficiently stored in an inverted index and searched in near real-time. Analysis happens at both [index and search time](../../manage-data/data-store/text-analysis/index-search-analysis.md). This tutorial won't cover analysis in detail, but it's important to understand how text is processed to create effective search queries.
::::
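
A toy illustration of why analysis happens at both index and search time: when the same analyzer processes the stored text and the query, input that differs only in case or punctuation still produces matching terms. The plain-dict inverted index and the simplified `analyze` function are assumptions for illustration, not the real `standard` analyzer.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Simplified analyzer: lowercase, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Index-time analysis: map each term to the IDs of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


docs = {1: "Fluffy Pancakes!", 2: "Spicy Chickpea Curry"}
index = build_inverted_index(docs)

# Search-time analysis turns "PANCAKES" into the same term that was indexed.
query_terms = analyze("PANCAKES")
print(query_terms, index.get(query_terms[0]))  # ['pancakes'] {1}
```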

## Step 2: Perform basic full-text searches

Full-text search involves executing text-based queries across one or more document fields. These queries calculate a relevance score for each matching document, based on how closely the document's content aligns with the search terms. Elasticsearch offers various query types, each with its own method for matching text and scoring relevance.

:::{tip}
ES|QL provides two ways to perform full-text searches:

1. Function syntax: `match(field, "search terms")`
1. Compact syntax using the colon operator: `field:"search terms"`

Without options, the two forms behave the same. The compact syntax is more concise, while the function syntax accepts configuration options such as `operator` and `boost`. We'll use the compact syntax in most examples for brevity.
:::

### Basic full-text query

Here's how to search the `description` field for "fluffy pancakes":

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE description:"fluffy pancakes"
    | LIMIT 1000
  """
}
```

By default, like the Query DSL `match` query, ES|QL uses `OR` logic between terms. This means it matches documents that contain "fluffy", "pancakes", or both in the `description` field.
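
A toy Python model of this default behavior, assuming a simplified analyzer and made-up descriptions rather than the tutorial's sample data: a document matches when its analyzed terms share at least one term with the analyzed query. Real matching also computes a BM25 relevance score, which this sketch ignores.

```python
import re


def analyze(text: str) -> set[str]:
    """Simplified stand-in for the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def match_or(field_value: str, query: str) -> bool:
    """A document matches if it contains at least one query term (default OR logic)."""
    return bool(analyze(field_value) & analyze(query))


descriptions = [
    "Light and fluffy buttermilk batter",  # contains "fluffy"
    "Classic pancakes with maple syrup",   # contains "pancakes"
    "A hearty vegetable soup",             # contains neither term
]
hits = [d for d in descriptions if match_or(d, "fluffy pancakes")]
print(len(hits))  # 2
```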

:::{tip}
You can control which fields to include in the response using the `KEEP` command:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE description:"fluffy pancakes"
    | KEEP title, description, rating
    | LIMIT 1000
  """
}
```
:::
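
Every request in this tutorial has the same shape: `POST /_query` with a JSON body whose `query` string holds the ES|QL pipeline, plus `format=txt` to get a plain-text table. The triple-quoted `"""` string is a Kibana Console convenience, so other HTTP clients need a regular, escaped JSON string instead. A minimal Python sketch, assuming you assemble the pipeline from a list of commands (the helper name `esql_request_body` is hypothetical):

```python
import json


def esql_request_body(source: str, *commands: str) -> str:
    """Join ES|QL commands with pipes and wrap the result in a JSON body for POST /_query."""
    pipeline = "\n| ".join([f"FROM {source}", *commands])
    # json.dumps escapes the newlines and the double quotes inside the query string
    return json.dumps({"query": pipeline})


body = esql_request_body(
    "cooking_blog",
    'WHERE description:"fluffy pancakes"',
    "KEEP title, description, rating",
    "LIMIT 1000",
)
print(body)
```

You could send `body` with any HTTP client and a `Content-Type: application/json` header; the host and credentials depend on your deployment.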

### Require all terms in a match query

Sometimes you need to require that all search terms appear in the matching documents. Here's how to do that using the function syntax with the `operator` parameter:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE match(description, "fluffy pancakes", {"operator": "AND"})
    | LIMIT 1000
  """
}
```

This stricter search returns *zero hits* on our sample data, as no document contains both "fluffy" and "pancakes" in the description.
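
In the same kind of toy model (simplified analyzer, made-up text), `"operator": "AND"` changes the test from "any query term" to "every query term", which amounts to a subset check on the analyzed terms:

```python
import re


def analyze(text: str) -> set[str]:
    """Simplified stand-in for the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def match(field_value: str, query: str, operator: str = "OR") -> bool:
    """Toy match: OR needs any query term, AND needs every query term."""
    doc_terms, query_terms = analyze(field_value), analyze(query)
    if operator == "AND":
        return query_terms <= doc_terms  # subset check: all query terms present
    return bool(query_terms & doc_terms)


text = "Classic pancakes with maple syrup"
print(match(text, "fluffy pancakes"))                  # True
print(match(text, "fluffy pancakes", operator="AND"))  # False
```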

### Specify a minimum number of terms to match

Sometimes requiring all terms is too strict, but the default OR behavior is too lenient. You can specify a minimum number of terms that must match:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE match(title, "fluffy pancakes breakfast", {"minimum_should_match": 2})
    | LIMIT 1000
  """
}
```

This query searches the `title` field to match at least 2 of the 3 terms: "fluffy", "pancakes", or "breakfast".
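
Continuing the toy model, `minimum_should_match` counts how many distinct query terms a field contains and compares that count with the threshold. This sketch accepts only an integer threshold and uses made-up titles:

```python
import re


def analyze(text: str) -> set[str]:
    """Simplified stand-in for the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def match_minimum(field_value: str, query: str, minimum_should_match: int) -> bool:
    """Toy match: at least `minimum_should_match` distinct query terms must appear."""
    matched = analyze(field_value) & analyze(query)
    return len(matched) >= minimum_should_match


query = "fluffy pancakes breakfast"
titles = [
    "Fluffy Pancakes",            # 2 of 3 terms: match
    "Pancakes",                   # 1 of 3 terms: no match
    "Fluffy Breakfast Pancakes",  # 3 of 3 terms: match
]
print([t for t in titles if match_minimum(t, query, 2)])
# ['Fluffy Pancakes', 'Fluffy Breakfast Pancakes']
```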

## Step 3: Search across multiple fields at once

When users enter a search query, they often don't know (or care) whether their search terms appear in a specific field. In ES|QL, you can search several fields at once by combining full-text conditions with `OR`:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE title:"vegetarian curry" OR description:"vegetarian curry" OR tags:"vegetarian curry"
    | LIMIT 1000
  """
}
```

This query searches for "vegetarian curry" across the title, description, and tags fields. Each field is treated with equal importance.

However, in many cases, matches in certain fields (like the title) might be more relevant than others. We can adjust the importance of each field with the `boost` option:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog METADATA _score
    | WHERE match(title, "vegetarian curry", {"boost": 2.0})
      OR match(description, "vegetarian curry")
      OR match(tags, "vegetarian curry")
    | KEEP title, description, tags, _score
    | SORT _score DESC
    | LIMIT 1000
  """
}
```

In this example, we're using the `boost` parameter to make matches in the title field twice as important as matches in other fields. We also request the `_score` metadata field to sort results by relevance.
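
The effect of `boost` can be sketched with a deliberately crude scoring model. Assumptions: each field's score is just the number of query terms it contains (real scoring uses BM25), the scores of matching fields are added together, roughly the way scores of `OR`ed clauses combine, and the two recipes are made up. Without the boost both recipes tie; with a title boost of 2.0 the title match ranks first.

```python
import re

FIELDS = ("title", "description", "tags")


def analyze(text: str) -> set[str]:
    """Simplified stand-in for the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def field_score(value: str, query: str) -> float:
    """Crude relevance: the number of query terms present (BM25 is far more nuanced)."""
    return float(len(analyze(value) & analyze(query)))


def score(doc: dict[str, str], query: str, boosts: dict[str, float]) -> float:
    """Sum per-field scores, each multiplied by that field's boost (default 1.0)."""
    return sum(field_score(doc[f], query) * boosts.get(f, 1.0) for f in FIELDS)


recipes = [
    {"title": "Vegetarian Curry", "description": "Chickpeas and spinach", "tags": "dinner"},
    {"title": "Weeknight Stew", "description": "A vegetarian curry base", "tags": "dinner"},
]
q = "vegetarian curry"
ranked = sorted(recipes, key=lambda d: score(d, q, {"title": 2.0}), reverse=True)
print([r["title"] for r in ranked])  # ['Vegetarian Curry', 'Weeknight Stew']
```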

## Step 4: Filter and find exact matches

Filtering allows you to narrow down your search results based on exact criteria. Unlike full-text searches, filters are binary (yes/no) and do not affect the relevance score. Filters are generally faster than scored queries because no relevance score needs to be calculated for the documents they match.

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE category.keyword == "Breakfast"
    | KEEP title, author, rating, tags
    | SORT rating DESC
    | LIMIT 1000
  """
}
```

Note the use of `category.keyword` here. This refers to the [`keyword`](elasticsearch://reference/elasticsearch/mapping-reference/keyword.md) multi-field of the `category` field, ensuring an exact, case-sensitive match.
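
A toy comparison of the two ways the same `category` value can be queried: the `keyword` multi-field compares the whole stored string exactly, while a full-text condition compares analyzed terms. The lowercase-and-split `analyze` function is a simplification of the `standard` analyzer:

```python
import re


def analyze(text: str) -> set[str]:
    """Simplified stand-in for the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


stored_category = "Breakfast"

# category.keyword == "...": exact, case-sensitive comparison of the whole value
print(stored_category == "Breakfast")  # True
print(stored_category == "breakfast")  # False

# category:"..." (full text): both sides are analyzed, so case doesn't matter
print(bool(analyze(stored_category) & analyze("breakfast")))  # True
```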

### Search for posts within a date range

Often users want to find content published within a specific time frame:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE date >= "2023-05-01" AND date <= "2023-05-31"
    | KEEP title, author, date, rating
    | LIMIT 1000
  """
}
```
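
The range above is inclusive at both ends. Assuming `"2023-05-31"` is interpreted as midnight at the start of that day, this upper bound only covers the whole last day when stored values carry no time component, which holds for this mapping's `yyyy-MM-dd` format. If your dates also store times, a half-open range (`date >= "2023-05-01" AND date < "2023-06-01"`) is the safer pattern. The Python sketch below models the comparison with naive `datetime` values standing in for UTC timestamps; it doesn't reproduce {{es}} internals.

```python
from datetime import datetime

start, end = datetime(2023, 5, 1), datetime(2023, 5, 31)  # "2023-05-31" as midnight
next_month = datetime(2023, 6, 1)

date_only = datetime(2023, 5, 31)          # yyyy-MM-dd values sit at midnight
with_time = datetime(2023, 5, 31, 18, 30)  # hypothetical value that includes a time


def inclusive(d: datetime) -> bool:
    """date >= start AND date <= end"""
    return start <= d <= end


def half_open(d: datetime) -> bool:
    """date >= start AND date < next_month"""
    return start <= d < next_month


print(inclusive(date_only), inclusive(with_time))  # True False
print(half_open(date_only), half_open(with_time))  # True True
```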

### Find exact matches

Sometimes users want to search for exact terms to eliminate ambiguity in their search results:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE tags.keyword == "vegetarian"
    | KEEP title, author, rating, tags
    | LIMIT 1000
  """
}
```

Like the `term` query in Query DSL, this has zero flexibility and is case-sensitive.
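
One caveat worth checking against your data: if a document stores `tags` as an array, ES|QL's documented rule for multivalued fields is that most functions and comparisons return `null` (with a warning) when they meet more than one value, and `WHERE` drops rows whose condition is `null`. The toy Python model below, where `None` plays the role of `null` and the helper names are hypothetical, shows why such a document might be missing from the results of `tags.keyword == "vegetarian"`. Verify the behavior on your {{es}} version before relying on it.

```python
from typing import Callable, Optional, Union

Value = Union[str, list[str]]


def esql_equals(field_value: Value, literal: str) -> Optional[bool]:
    """Toy model of ES|QL's rule: comparing a multivalued field yields null (None)."""
    if isinstance(field_value, list):
        if len(field_value) != 1:
            return None  # multivalued: result is null (ES|QL also emits a warning)
        field_value = field_value[0]
    return field_value == literal


def where(rows: list[dict], predicate: Callable[[dict], Optional[bool]]) -> list[dict]:
    """WHERE keeps only rows whose condition is true; false and null both drop the row."""
    return [row for row in rows if predicate(row) is True]


rows = [
    {"title": "Chickpea Curry", "tags": ["vegetarian", "curry"]},  # multivalued
    {"title": "Garden Salad", "tags": "vegetarian"},               # single value
]
kept = where(rows, lambda r: esql_equals(r["tags"], "vegetarian"))
print([r["title"] for r in kept])  # ['Garden Salad']
```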

## Step 5: Combine multiple search criteria

Complex searches often require combining multiple search criteria:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog METADATA _score
    | WHERE rating >= 4.5
      AND NOT category.keyword == "Dessert"
      AND (title:"curry spicy" OR description:"curry spicy")
    | SORT _score DESC
    | KEEP title, author, rating, tags, description
    | LIMIT 1000
  """
}
```

For more complex relevance scoring with combined criteria, you can use the `EVAL` command to calculate custom scores:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog METADATA _score
    | WHERE tags.keyword == "vegetarian" AND rating >= 4.5
    | EVAL title_score = SCORE(match(title, "curry spicy")) * 2
    | EVAL desc_score = SCORE(match(description, "curry spicy"))
    | EVAL combined_score = title_score + desc_score
    | EVAL category_boost = CASE(category.keyword == "Main Course", 1.0, 0.0)
    | EVAL date_boost = CASE(date >= NOW() - 1 month, 0.5, 0.0)
    | EVAL final_score = combined_score + category_boost + date_boost
    | WHERE NOT category.keyword == "Dessert"
    | WHERE final_score > 0
    | SORT final_score DESC
    | LIMIT 1000
  """
}
```

This ES|QL query uses an explicit scoring mechanism:

1. Requires the "vegetarian" tag and a rating of at least 4.5
2. Computes separate scores for `title` and `description` matches
3. Adds boosts for the Main Course category and for posts from the last month
4. Excludes desserts
5. Sorts by the final combined score
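
The arithmetic in the `EVAL` steps can be traced with plain numbers. The sketch below uses hypothetical match scores (real values come from BM25 and depend on your data) with the same weights as the query: title score times 2, plus 1.0 for Main Course, plus 0.5 for recent posts.

```python
def final_score(title_score: float, desc_score: float, category: str, is_recent: bool) -> float:
    """Mirror the EVAL chain: weighted text scores plus category and recency boosts."""
    combined_score = title_score * 2 + desc_score
    category_boost = 1.0 if category == "Main Course" else 0.0
    date_boost = 0.5 if is_recent else 0.0
    return combined_score + category_boost + date_boost


# Hypothetical scores for two recipes that passed the tag and rating filters.
a = final_score(title_score=1.2, desc_score=0.0, category="Side Dish", is_recent=False)
b = final_score(title_score=0.0, desc_score=1.5, category="Main Course", is_recent=True)
print(a, b)  # 2.4 3.0
```

A description-only match can outrank a title match once the category and recency boosts are added, which is why the weights deserve tuning against real results.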

:::{warning}
TODO

This section shouldn't live in a tutorial; it's left here for comments and suggestions on whether it might be useful.
:::

## Optimizing your ES|QL queries

ES|QL queries can be optimized for better performance and more relevant results. Here are some key optimization strategies:

### Field filtering with KEEP

Using `KEEP` early in your query pipeline limits the fields that need to be fetched, which can improve performance:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | KEEP title, description, rating
    | WHERE title:"curry"
    | LIMIT 1000
  """
}
```

However, there's an important caveat: commands after `KEEP` can only reference the columns it retained. If you need to filter on fields not included in `KEEP`, you must place your `WHERE` clauses before `KEEP`:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE category.keyword == "Main Course" AND rating >= 4.0
    | KEEP title, description, rating
    | LIMIT 1000
  """
}
```

With `WHERE` first, the filter can use any field in the index, and `KEEP` then trims the output to the fields you want to display.

### Optimal query order

For best performance, structure your ES|QL queries in this general order:

1. `FROM` to select your index
2. `WHERE` clauses for filtering
3. `KEEP` to select only needed fields
4. Processing operations (`EVAL`, aggregations, etc.)
5. `SORT` to order results
6. `LIMIT` to restrict result count

This order allows Elasticsearch to apply filters early, reducing the dataset before performing more expensive operations.
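
ES|QL processes commands as a pipeline in which each command receives the previous command's output table. A toy Python model of that idea, with rows as dicts and hypothetical helper functions standing in for `WHERE`, `KEEP`, `SORT`, and `LIMIT`, shows why filtering on a column fails once `KEEP` has removed it. The {{es}} query planner also reorders and optimizes commands, which this sketch does not model, and the rows are made up.

```python
from typing import Callable

Row = dict


def where(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """WHERE: keep rows for which the condition holds."""
    return [r for r in rows if predicate(r)]


def keep(rows: list[Row], *columns: str) -> list[Row]:
    """KEEP: project each row onto the listed columns; all other columns are gone."""
    return [{c: r[c] for c in columns} for r in rows]


def sort_desc(rows: list[Row], column: str) -> list[Row]:
    """SORT column DESC."""
    return sorted(rows, key=lambda r: r[column], reverse=True)


def limit(rows: list[Row], n: int) -> list[Row]:
    """LIMIT n."""
    return rows[:n]


rows = [
    {"title": "Thai Green Curry", "category": "Main Course", "rating": 4.6},
    {"title": "Lemon Tart", "category": "Dessert", "rating": 4.9},
    {"title": "Dal Tadka", "category": "Main Course", "rating": 4.2},
]

# FROM -> WHERE -> KEEP -> SORT -> LIMIT, each step consuming the previous output
filtered = where(rows, lambda r: r["category"] == "Main Course" and r["rating"] >= 4.0)
result = limit(sort_desc(keep(filtered, "title", "rating"), "rating"), 1000)
print(result)
# [{'title': 'Thai Green Curry', 'rating': 4.6}, {'title': 'Dal Tadka', 'rating': 4.2}]

# Reversing the order fails: after KEEP, there is no "category" column to filter on.
try:
    where(keep(rows, "title", "rating"), lambda r: r["category"] == "Main Course")
except KeyError as err:
    print("unknown column:", err)
```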

### Use keyword fields for exact matching

Always use the `.keyword` suffix for exact matching on text fields. This improves performance and ensures case-sensitive, exact matches:

```console
POST /_query?format=txt
{
  "query": """
    FROM cooking_blog
    | WHERE tags.keyword == "vegetarian"
    | LIMIT 1000
  """
}
```

## Learn more

This tutorial introduced the basics of full-text search and filtering in ES|QL. Building a real-world search experience requires understanding many more advanced concepts and techniques. Here are some resources once you're ready to dive deeper:

- [Full-text search](full-text.md): Learn about the core components of full-text search in Elasticsearch.
- [Text analysis](full-text/text-analysis-during-search.md): Understand how text is processed for full-text search.
- [Query and filter data](/explore-analyze/query-filter.md): Understand all your options for searching and analyzing data in {{es}} in the Explore & Analyze section.
- [Search your data](../search.md): Learn about more advanced search techniques including semantic search.
