Commit ed556db

[Backport 8.19] ES|QL query builder integration with the DSL module (Fixes elastic#3056) (elastic#3058)
* [Backport 8.19] ES|QL query builder integration with the DSL module (elastic#3056)
* ES|QL query builder integration with the DSL module (elastic#3048)
* ES|QL query builder integration with the DSL module
* esql DSL tests
* more esql DSL tests
* documentation
* add esql+dsl example
* review feedback
(cherry picked from commit 228e66c)
* python 3.8 typing fix
---------
Co-authored-by: Miguel Grinberg <[email protected]>
* added missing asciidoc file
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 25f3273 commit ed556db

17 files changed: +1051 -49 lines changed

docs/guide/dsl/esql.asciidoc

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
==== ES|QL Queries
2+
3+
When working with `Document` classes, you can use the ES|QL query language to retrieve documents. For this you can use the `esql_from()` and `esql_execute()` methods available to all sub-classes of `Document`.
4+
5+
Consider the following `Employee` document definition:
6+
7+
[source,python]
8+
----
9+
from elasticsearch.dsl import Document, InnerDoc, M
10+
11+
class Address(InnerDoc):
12+
address: M[str]
13+
city: M[str]
14+
zip_code: M[str]
15+
16+
class Employee(Document):
17+
emp_no: M[int]
18+
first_name: M[str]
19+
last_name: M[str]
20+
height: M[float]
21+
still_hired: M[bool]
22+
address: M[Address]
23+
24+
class Index:
25+
name = 'employees'
26+
----
27+
28+
The `esql_from()` method creates a base ES|QL query for the index associated with the document class. The following example creates a base query for the `Employee` class:
29+
30+
[source,python]
31+
----
32+
query = Employee.esql_from()
33+
----
34+
35+
This query includes a `FROM` command with the index name, and a `KEEP` command that retrieves all the document attributes.
36+
37+
To execute this query and receive the results, you can pass the query to the `esql_execute()` method:
38+
39+
[source,python]
40+
----
41+
for emp in Employee.esql_execute(query):
42+
print(f"{emp.first_name} {emp.last_name} from {emp.address.city} is {emp.height:.2f}m tall")
43+
----
44+
45+
In this example, the `esql_execute()` class method runs the query and returns all the documents in the index, up to the default limit of 1000 rows that ES|QL applies when a query has no `LIMIT` command. Here is a possible output from this example:
46+
47+
[source,text]
48+
----
49+
Kevin Macias from North Robert is 1.60m tall
50+
Drew Harris from Boltonshire is 1.68m tall
51+
Julie Williams from Maddoxshire is 1.99m tall
52+
Christopher Jones from Stevenbury is 1.98m tall
53+
Anthony Lopez from Port Sarahtown is 2.42m tall
54+
Tricia Stone from North Sueshire is 2.39m tall
55+
Katherine Ramirez from Kimberlyton is 1.83m tall
56+
...
57+
----
58+
59+
To search for specific documents, you can extend the base query with additional ES|QL commands that narrow the search criteria. The next example retrieves only employees who are taller than 2 meters, sorted by last name, and limits the results to 4 people:
60+
61+
[source,python]
62+
----
63+
query = (
64+
Employee.esql_from()
65+
.where(Employee.height > 2)
66+
.sort(Employee.last_name)
67+
.limit(4)
68+
)
69+
----
70+
71+
When running this query with the same for-loop shown above, possible results would be:
72+
73+
[source,text]
74+
----
75+
Michael Adkins from North Stacey is 2.48m tall
76+
Kimberly Allen from Toddside is 2.24m tall
77+
Crystal Austin from East Michaelchester is 2.30m tall
78+
Rebecca Berger from Lake Adrianside is 2.40m tall
79+
----
80+
81+
===== Additional fields
82+
83+
ES|QL provides a few ways to add new fields to a query, for example through the `EVAL` command. The following example shows a query that adds an evaluated field:
84+
85+
[source,python]
86+
----
87+
from elasticsearch.esql import E, functions
88+
89+
query = (
90+
Employee.esql_from()
91+
.eval(height_cm=functions.round(Employee.height * 100))
92+
.where(E("height_cm") >= 200)
93+
.sort(Employee.last_name)
94+
.limit(10)
95+
)
96+
----
97+
98+
In this example we are adding the height in centimeters to the query, calculated from the `height` document field, which is in meters. The `height_cm` calculated field is available to use in other query clauses, and in particular is referenced in `where()` in this example. Note how the new field is given as `E("height_cm")` in this clause. The `E()` wrapper tells the query builder that the argument is an ES|QL field name and not a string literal. This is done automatically for document fields that are given as class attributes, such as `Employee.height` in the `eval()`. The `E()` wrapper is only needed for fields that are not in the document.
99+
100+
By default, the `esql_execute()` method returns only document instances. To also receive fields that are not part of the document, pass the `return_additional=True` argument. The results are then returned as tuples, with the document as the first element and a dictionary of the additional fields as the second:
101+
102+
[source,python]
103+
----
104+
for emp, additional in Employee.esql_execute(query, return_additional=True):
105+
print(emp.first_name, emp.last_name, additional)
106+
----
107+
108+
Example output from the query given above:
109+
110+
[source,text]
111+
----
112+
Michael Adkins {'height_cm': 248.0}
113+
Kimberly Allen {'height_cm': 224.0}
114+
Crystal Austin {'height_cm': 230.0}
115+
Rebecca Berger {'height_cm': 240.0}
116+
Katherine Blake {'height_cm': 214.0}
117+
Edward Butler {'height_cm': 246.0}
118+
Steven Carlson {'height_cm': 242.0}
119+
Mark Carter {'height_cm': 240.0}
120+
Joseph Castillo {'height_cm': 229.0}
121+
Alexander Cohen {'height_cm': 245.0}
122+
----
123+
124+
===== Missing fields
125+
126+
The base query returned by the `esql_from()` method includes a `KEEP` command with the complete list of fields that are part of the document. If any subsequent clauses added to the query remove fields that are part of the document, then the `esql_execute()` method will raise an exception, because it will not be able to construct complete document instances to return as results.
127+
128+
To prevent errors, it is recommended that the `keep()` and `drop()` clauses are not used when working with `Document` instances.
129+
130+
If a query has missing fields, it can be forced to execute without errors by passing the `ignore_missing_fields=True` argument to `esql_execute()`. When this option is used, returned documents will have any missing fields set to `None`.
131+
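As a rough illustration of the "Missing fields" behavior documented above, the following sketch checks one result row against the document's field list. It runs on plain Python data with no Elasticsearch client; the `build_row()` helper and the sample field list are hypothetical and are not part of the library.

```python
from typing import Any, Dict, Iterable, List


def build_row(
    doc_fields: Iterable[str],
    columns: List[str],
    values: List[Any],
    ignore_missing_fields: bool = False,
) -> Dict[str, Any]:
    """Map one result row onto the document fields (hypothetical helper)."""
    doc_fields = list(doc_fields)
    row = dict(zip(columns, values))
    missing = [field for field in doc_fields if field not in row]
    if missing and not ignore_missing_fields:
        # Mirrors the documented default: incomplete results are an error.
        raise ValueError(f"Fields missing from the query results: {missing}")
    # Fields that the query dropped are filled in with None.
    return {field: row.get(field) for field in doc_fields}


doc_fields = ["first_name", "last_name", "height"]
columns = ["first_name", "last_name"]  # the query dropped "height"
values = ["Michael", "Adkins"]

partial = build_row(doc_fields, columns, values, ignore_missing_fields=True)
print(partial)
```

Calling `build_row()` without `ignore_missing_fields=True` raises `ValueError` for the same input, which is the behavior the documentation recommends avoiding by not using `keep()` and `drop()`.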

docs/guide/dsl/howto.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -4,4 +4,5 @@ include::search_dsl.asciidoc[]
44
include::persistence.asciidoc[]
55
include::faceted_search.asciidoc[]
66
include::update_by_query.asciidoc[]
7+
include::esql.asciidoc[]
78
include::asyncio.asciidoc[]

docs/guide/dsl/tutorials.asciidoc

Lines changed: 24 additions & 6 deletions
@@ -83,17 +83,17 @@ system:
8383
[source,python]
8484
----
8585
from datetime import datetime
86-
from elasticsearch.dsl import Document, Date, Integer, Keyword, Text, connections
86+
from elasticsearch.dsl import Document, Date, Integer, Keyword, Text, connections, mapped_field
8787
8888
# Define a default Elasticsearch client
8989
connections.create_connection(hosts="https://localhost:9200")
9090
9191
class Article(Document):
92-
title = Text(analyzer='snowball', fields={'raw': Keyword()})
93-
body = Text(analyzer='snowball')
94-
tags = Keyword()
95-
published_from = Date()
96-
lines = Integer()
92+
title: str = mapped_field(Text(analyzer='snowball', fields={'raw': Keyword()}))
93+
body: str = mapped_field(Text(analyzer='snowball'))
94+
tags: list[str] = mapped_field(Keyword())
95+
published_from: datetime
96+
lines: int
9797
9898
class Index:
9999
name = 'blog'
@@ -229,13 +229,31 @@ savings offered by the `Search` object, and additionally allows one to
229229
update the results of the search based on a script assigned in the same
230230
manner.
231231

232+
==== ES|QL Queries
233+
234+
The DSL module features an integration with the ES|QL query builder, consisting of two methods available in all `Document` sub-classes: `esql_from()` and `esql_execute()`. Using the `Article` document from above, we can search for up to ten articles that include `"world"` in their titles with the following ES|QL query:
235+
236+
[source,python]
237+
----
238+
from elasticsearch.esql import functions
239+
240+
query = Article.esql_from().where(functions.match(Article.title, 'world')).limit(10)
241+
for a in Article.esql_execute(query):
242+
print(a.title)
243+
----
244+
245+
Review the ES|QL Query Builder section to learn more about building ES|QL queries in Python.
246+
232247
==== Migration from the standard client
233248

249+
<<<<<<< HEAD:docs/guide/dsl/tutorials.asciidoc
234250
You don't have to port your entire application to get the benefits of
235251
the DSL module, you can start gradually by creating a `Search` object
236252
from your existing `dict`, modifying it using the API and serializing it
237253
back to a `dict`:
238254

255+
==== Migration from the standard client
256+
239257
[source,python]
240258
----
241259
body = {...} # insert complicated query here

docs/guide/esql-query-builder.asciidoc

Lines changed: 4 additions & 4 deletions
@@ -21,21 +21,21 @@ You can then see the assembled ES|QL query by printing the resulting query objec
2121

2222
[source, python]
2323
----------------------------
24-
>>> query
24+
>>> print(query)
2525
FROM employees
2626
| SORT emp_no
2727
| KEEP first_name, last_name, height
2828
| EVAL height_feet = height * 3.281, height_cm = height * 100
2929
| LIMIT 3
3030
----------------------------
3131

32-
To execute this query, you can cast it to a string and pass the string to the `client.esql.query()` endpoint:
32+
To execute this query, you can pass it to the `client.esql.query()` endpoint:
3333

3434
[source, python]
3535
----------------------------
3636
>>> from elasticsearch import Elasticsearch
3737
>>> client = Elasticsearch(hosts=[os.environ['ELASTICSEARCH_URL']])
38-
>>> response = client.esql.query(query=str(query))
38+
>>> response = client.esql.query(query=query)
3939
----------------------------
4040

4141
The response body contains a `columns` attribute with the list of columns included in the results, and a `values` attribute with the list of results for the query, each given as a list of column values. Here is a possible response body returned by the example query given above:
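The `columns` and `values` layout described above can be turned into one dictionary per row with a few lines of Python. This is a minimal sketch on a hand-written response body; the sample data and the `rows_as_dicts()` helper are assumptions made for illustration and are not part of the client.

```python
from typing import Any, Dict, List

# Hypothetical response body, shaped like the description above.
body = {
    "columns": [
        {"name": "first_name", "type": "keyword"},
        {"name": "height", "type": "double"},
    ],
    "values": [
        ["Michael", 2.48],
        ["Kimberly", 2.24],
    ],
}


def rows_as_dicts(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each row of values with the column names (hypothetical helper)."""
    names = [col["name"] for col in body.get("columns", [])]
    return [dict(zip(names, row)) for row in body.get("values", [])]


rows = rows_as_dicts(body)
for row in rows:
    print(row)
```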
@@ -228,7 +228,7 @@ def find_employee_by_name(name):
228228
.keep("first_name", "last_name", "height")
229229
.where(E("first_name") == E("?"))
230230
)
231-
return client.esql.query(query=str(query), params=[name])
231+
return client.esql.query(query=query, params=[name])
232232
----------------------------
233233

234234
Here the part of the query in which the untrusted data needs to be inserted is replaced with a parameter, which in ES|QL is defined by the question mark. When using Python expressions, the parameter must be given as `E("?")` so that it is treated as an expression and not as a literal string.

elasticsearch/dsl/_async/document.py

Lines changed: 84 additions & 0 deletions
@@ -20,6 +20,7 @@
2020
TYPE_CHECKING,
2121
Any,
2222
AsyncIterable,
23+
AsyncIterator,
2324
Dict,
2425
List,
2526
Optional,
@@ -42,6 +43,7 @@
4243

4344
if TYPE_CHECKING:
4445
from elasticsearch import AsyncElasticsearch
46+
from elasticsearch.esql.esql import ESQLBase
4547

4648

4749
class AsyncIndexMeta(DocumentMeta):
@@ -520,3 +522,85 @@ async def __anext__(self) -> Dict[str, Any]:
520522
return action
521523

522524
return await async_bulk(es, Generate(actions), **kwargs)
525+
526+
@classmethod
527+
async def esql_execute(
528+
cls,
529+
query: "ESQLBase",
530+
return_additional: bool = False,
531+
ignore_missing_fields: bool = False,
532+
using: Optional[AsyncUsingType] = None,
533+
**kwargs: Any,
534+
) -> AsyncIterator[Union[Self, Tuple[Self, Dict[str, Any]]]]:
535+
"""
536+
Execute the given ES|QL query and return an async iterator of instances of
537+
this ``Document`` class or, when ``return_additional=True``, of 2-element
538+
tuples with a document and a dictionary of any remaining columns in the query.
539+
540+
:arg query: an ES|QL query object created with the ``esql_from()`` method.
541+
:arg return_additional: if ``False`` (the default), this method returns
542+
document objects. If set to ``True``, the method returns tuples with
543+
a document in the first element and a dictionary with any additional
544+
columns returned by the query in the second element.
545+
:arg ignore_missing_fields: if ``False`` (the default), all the fields of
546+
the document must be present in the query, or else an exception is
547+
raised. Set to ``True`` to allow missing fields, which will result in
548+
partially initialized document objects.
549+
:arg using: connection alias to use, defaults to ``'default'``
550+
:arg kwargs: additional options for the ``client.esql.query()`` function.
551+
"""
552+
es = cls._get_connection(using)
553+
response = await es.esql.query(query=str(query), **kwargs)
554+
query_columns = [col["name"] for col in response.body.get("columns", [])]
555+
556+
# Here we get the list of columns defined in the document, which are the
557+
# columns that we will take from each result to assemble the document
558+
# object.
559+
# When `for_esql=False` is passed below by default, the list will include
560+
# nested fields, which ES|QL does not return, causing an error. When passing
561+
# `ignore_missing_fields=True` the list will be generated with
562+
# `for_esql=True`, so the error will not occur, but the documents will
563+
# not have any Nested objects in them.
564+
doc_fields = set(cls._get_field_names(for_esql=ignore_missing_fields))
565+
if not ignore_missing_fields and not doc_fields.issubset(set(query_columns)):
566+
raise ValueError(
567+
f"Not all fields of {cls.__name__} were returned by the query. "
568+
"Make sure your document does not use Nested fields, which are "
569+
"currently not supported in ES|QL. To force the query to be "
570+
"evaluated in spite of the missing fields, set the "
571+
"ignore_missing_fields=True option in the esql_execute() call."
572+
)
573+
non_doc_fields: set[str] = set(query_columns) - doc_fields - {"_id"}
574+
index_id = query_columns.index("_id")
575+
576+
results = response.body.get("values", [])
577+
for column_values in results:
578+
# create a dictionary with all the document fields, expanding the
579+
# dot notation returned by ES|QL into the recursive dictionaries
580+
# used by Document.from_dict()
581+
doc_dict: Dict[str, Any] = {}
582+
for col, val in zip(query_columns, column_values):
583+
if col in doc_fields:
584+
cols = col.split(".")
585+
d = doc_dict
586+
for c in cols[:-1]:
587+
if c not in d:
588+
d[c] = {}
589+
d = d[c]
590+
d[cols[-1]] = val
591+
592+
# create the document instance
593+
obj = cls(meta={"_id": column_values[index_id]})
594+
obj._from_dict(doc_dict)
595+
596+
if return_additional:
597+
# build a dict with any other values included in the response
598+
other = {
599+
col: val
600+
for col, val in zip(query_columns, column_values)
601+
if col in non_doc_fields
602+
}
603+
604+
yield obj, other
605+
else:
606+
yield obj
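The row-processing loop in `esql_execute()` above does two things: it expands the dotted column names returned by ES|QL into nested dictionaries, and it separates out the columns that do not belong to the document. The standalone sketch below reproduces that logic on plain data so it can be run without a cluster; the `expand_row()` name and the sample row are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def expand_row(
    columns: List[str], values: List[Any], doc_fields: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one result row into a nested document dict and the extra columns."""
    doc: Dict[str, Any] = {}
    other: Dict[str, Any] = {}
    for col, val in zip(columns, values):
        if col in doc_fields:
            # "address.city" becomes {"address": {"city": ...}}
            *parents, leaf = col.split(".")
            d = doc
            for parent in parents:
                d = d.setdefault(parent, {})
            d[leaf] = val
        elif col != "_id":
            # "_id" goes into the document metadata, not the extra columns
            other[col] = val
    return doc, other


columns = ["_id", "first_name", "address.city", "height_cm"]
values = ["1", "Michael", "North Stacey", 248.0]
doc, other = expand_row(columns, values, {"first_name", "address.city"})
print(doc)
print(other)
```

The sketch uses `dict.setdefault()` in place of the explicit `if c not in d` check in the diff; the two forms are equivalent here.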
