Commit 2aa0459

ES|QL query builder integration with the DSL module (#3048) (#3057)
* ES|QL query builder integration with the DSL module
* esql DSL tests
* more esql DSL tests
* documentation
* add esql+dsl example
* review feedback

(cherry picked from commit 228e66c)

Co-authored-by: Miguel Grinberg <[email protected]>
1 parent a562194 commit 2aa0459

16 files changed: +1031 -45 lines changed

docs/reference/dsl_how_to_guides.md

Lines changed: 121 additions & 0 deletions
@@ -1425,6 +1425,127 @@ print(response.took)
 If you want to inspect the contents of the `response` objects, just use its `to_dict` method to get access to the raw data for pretty printing.


+## ES|QL Queries
+
+When working with `Document` classes, you can use the ES|QL query language to retrieve documents. For this you can use the `esql_from()` and `esql_execute()` methods available to all sub-classes of `Document`.
+
+Consider the following `Employee` document definition:
+
+```python
+from elasticsearch.dsl import Document, InnerDoc, M
+
+class Address(InnerDoc):
+    address: M[str]
+    city: M[str]
+    zip_code: M[str]
+
+class Employee(Document):
+    emp_no: M[int]
+    first_name: M[str]
+    last_name: M[str]
+    height: M[float]
+    still_hired: M[bool]
+    address: M[Address]
+
+    class Index:
+        name = 'employees'
+```
+
+The `esql_from()` method creates a base ES|QL query for the index associated with the document class. The following example creates a base query for the `Employee` class:
+
+```python
+query = Employee.esql_from()
+```
+
+This query includes a `FROM` command with the index name, and a `KEEP` command that retrieves all the document attributes.
+
+To execute this query and receive the results, you can pass the query to the `esql_execute()` method:
+
+```python
+for emp in Employee.esql_execute(query):
+    print(f"{emp.first_name} {emp.last_name} from {emp.address.city} is {emp.height:.2f}m tall")
+```
+
+In this example, the `esql_execute()` class method runs the query and returns all the documents in the index, up to the default limit of 1000 rows that ES|QL applies when a query has no `LIMIT` command. Here is a possible output from this example:
+
+```
+Kevin Macias from North Robert is 1.60m tall
+Drew Harris from Boltonshire is 1.68m tall
+Julie Williams from Maddoxshire is 1.99m tall
+Christopher Jones from Stevenbury is 1.98m tall
+Anthony Lopez from Port Sarahtown is 2.42m tall
+Tricia Stone from North Sueshire is 2.39m tall
+Katherine Ramirez from Kimberlyton is 1.83m tall
+...
+```
+
+To search for specific documents, you can extend the base query with additional ES|QL commands that narrow the search criteria. The next example searches only for employees taller than 2 meters, sorts them by last name, and limits the results to 4 people:
+
+```python
+query = (
+    Employee.esql_from()
+    .where(Employee.height > 2)
+    .sort(Employee.last_name)
+    .limit(4)
+)
+```
+
+When running this query with the same for-loop shown above, possible results would be:
+
+```
+Michael Adkins from North Stacey is 2.48m tall
+Kimberly Allen from Toddside is 2.24m tall
+Crystal Austin from East Michaelchester is 2.30m tall
+Rebecca Berger from Lake Adrianside is 2.40m tall
+```
+
+### Additional fields
+
+ES|QL provides a few ways to add new fields to a query, for example through the `EVAL` command. The following example shows a query that adds an evaluated field:
+
+```python
+from elasticsearch.esql import E, functions
+
+query = (
+    Employee.esql_from()
+    .eval(height_cm=functions.round(Employee.height * 100))
+    .where(E("height_cm") >= 200)
+    .sort(Employee.last_name)
+    .limit(10)
+)
+```
+
+This example adds the height in centimeters to the query, calculated from the `height` document field, which is in meters. The `height_cm` calculated field is available to use in other query clauses, and in particular is referenced in `where()` in this example. Note how the new field is given as `E("height_cm")` in this clause. The `E()` wrapper tells the query builder that the argument is an ES|QL field name and not a string literal. This is done automatically for document fields that are given as class attributes, such as `Employee.height` in the `eval()` call. The `E()` wrapper is only needed for fields that are not in the document.
+
+By default, the `esql_execute()` method returns only document instances. To receive additional fields that are not part of the document, pass the `return_additional=True` argument. The results are then returned as tuples with the document as the first element and a dictionary with the additional fields as the second element:
+
+```python
+for emp, additional in Employee.esql_execute(query, return_additional=True):
+    print(emp.first_name, emp.last_name, additional)
+```
+
+Example output from the query given above:
+
+```
+Michael Adkins {'height_cm': 248.0}
+Kimberly Allen {'height_cm': 224.0}
+Crystal Austin {'height_cm': 230.0}
+Rebecca Berger {'height_cm': 240.0}
+Katherine Blake {'height_cm': 214.0}
+Edward Butler {'height_cm': 246.0}
+Steven Carlson {'height_cm': 242.0}
+Mark Carter {'height_cm': 240.0}
+Joseph Castillo {'height_cm': 229.0}
+Alexander Cohen {'height_cm': 245.0}
+```
+
+### Missing fields
+
+The base query returned by the `esql_from()` method includes a `KEEP` command with the complete list of fields that are part of the document. If any subsequent clauses added to the query remove fields that are part of the document, then the `esql_execute()` method will raise an exception, because it will not be able to construct complete document instances to return as results.
+
+To prevent errors, it is recommended that the `keep()` and `drop()` clauses are not used when working with `Document` instances.
+
+If a query has missing fields, it can be forced to execute without errors by passing the `ignore_missing_fields=True` argument to `esql_execute()`. When this option is used, returned documents will have any missing fields set to `None`.

 ## Using asyncio with Elasticsearch Python DSL [asyncio]
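The "Missing fields" behaviour documented in this hunk can be pictured with a small stand-alone sketch that needs no Elasticsearch connection. The helper name `check_fields` and the sample field lists are hypothetical and not part of the commit. The sketch only mirrors the documented rule: raise when document fields are missing from the query, unless `ignore_missing_fields=True` is given.

```python
from typing import Iterable, Set


def check_fields(
    doc_fields: Set[str],
    query_columns: Iterable[str],
    ignore_missing_fields: bool = False,
) -> Set[str]:
    """Return the missing document fields, or raise if they are not allowed."""
    missing = doc_fields - set(query_columns)
    if missing and not ignore_missing_fields:
        raise ValueError(f"fields missing from the query: {sorted(missing)}")
    return missing


# hypothetical field lists: the query dropped `height`
doc_fields = {"emp_no", "first_name", "last_name", "height"}
columns_after_drop = ["_id", "emp_no", "first_name", "last_name"]

try:
    check_fields(doc_fields, columns_after_drop)
    raised = False
except ValueError:
    raised = True

# with ignore_missing_fields=True the check passes, and the missing
# fields would be left as None on the returned documents
missing = check_fields(doc_fields, columns_after_drop, ignore_missing_fields=True)
partial = {name: None for name in missing}
print(raised, partial)
```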

docs/reference/dsl_tutorials.md

Lines changed: 16 additions & 2 deletions
@@ -83,15 +83,15 @@ Let’s have a simple Python class representing an article in a blogging system:

 ```python
 from datetime import datetime
-from elasticsearch.dsl import Document, Date, Integer, Keyword, Text, connections
+from elasticsearch.dsl import Document, Date, Integer, Keyword, Text, connections, mapped_field

 # Define a default Elasticsearch client
 connections.create_connection(hosts="https://localhost:9200")

 class Article(Document):
     title: str = mapped_field(Text(analyzer='snowball', fields={'raw': Keyword()}))
     body: str = mapped_field(Text(analyzer='snowball'))
-    tags: str = mapped_field(Keyword())
+    tags: list[str] = mapped_field(Keyword())
     published_from: datetime
     lines: int

@@ -216,6 +216,20 @@ response = ubq.execute()
 As you can see, the `Update By Query` object provides many of the savings offered by the `Search` object, and additionally allows one to update the results of the search based on a script assigned in the same manner.


+## ES|QL Queries
+
+The DSL module features an integration with the ES|QL query builder, consisting of two methods available in all `Document` sub-classes: `esql_from()` and `esql_execute()`. Using the `Article` document from above, we can search for up to ten articles that include `"world"` in their titles with the following ES|QL query:
+
+```python
+from elasticsearch.esql import functions
+
+query = Article.esql_from().where(functions.match(Article.title, 'world')).limit(10)
+for a in Article.esql_execute(query):
+    print(a.title)
+```
+
+Review the [ES|QL Query Builder section](esql-query-builder.md) to learn more about building ES|QL queries in Python.
+
 ## Migration from the standard client [_migration_from_the_standard_client]

 You don’t have to port your entire application to get the benefits of the DSL module, you can start gradually by creating a `Search` object from your existing `dict`, modifying it using the API and serializing it back to a `dict`:

docs/reference/esql-query-builder.md

Lines changed: 4 additions & 4 deletions
@@ -20,20 +20,20 @@ The ES|QL Query Builder allows you to construct ES|QL queries using Python synta
 You can then see the assembled ES|QL query by printing the resulting query object:

 ```python
->>> query
+>>> print(query)
 FROM employees
 | SORT emp_no
 | KEEP first_name, last_name, height
 | EVAL height_feet = height * 3.281, height_cm = height * 100
 | LIMIT 3
 ```

-To execute this query, you can cast it to a string and pass the string to the `client.esql.query()` endpoint:
+To execute this query, you can pass it to the `client.esql.query()` endpoint:

 ```python
 >>> from elasticsearch import Elasticsearch
 >>> client = Elasticsearch(hosts=[os.environ['ELASTICSEARCH_URL']])
->>> response = client.esql.query(query=str(query))
+>>> response = client.esql.query(query=query)
 ```

 The response body contains a `columns` attribute with the list of columns included in the results, and a `values` attribute with the list of results for the query, each given as a list of column values. Here is a possible response body returned by the example query given above:
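As a rough illustration of the `columns` and `values` layout described in the context line above, the following sketch pairs each row with its column names using only the standard library. The helper `rows_as_dicts` and the sample body are hypothetical. The body imitates the documented shape and was not returned by a real cluster.

```python
from typing import Any, Dict, List


def rows_as_dicts(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each row in `values` with the column names in `columns`."""
    names = [col["name"] for col in body.get("columns", [])]
    return [dict(zip(names, row)) for row in body.get("values", [])]


# hypothetical response body, shaped like the description above
body = {
    "columns": [
        {"name": "first_name", "type": "text"},
        {"name": "height", "type": "double"},
    ],
    "values": [["Adrian", 1.95], ["Aaron", 1.8]],
}

rows = rows_as_dicts(body)
print(rows)
```

The `esql_execute()` method added in this commit performs a similar pairing internally before building `Document` instances.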
@@ -216,7 +216,7 @@ def find_employee_by_name(name):
         .keep("first_name", "last_name", "height")
         .where(E("first_name") == E("?"))
     )
-    return client.esql.query(query=str(query), params=[name])
+    return client.esql.query(query=query, params=[name])
 ```

 Here the part of the query in which the untrusted data needs to be inserted is replaced with a parameter, which in ES|QL is defined by the question mark. When using Python expressions, the parameter must be given as `E("?")` so that it is treated as an expression and not as a literal string.

elasticsearch/dsl/_async/document.py

Lines changed: 84 additions & 0 deletions
@@ -20,6 +20,7 @@
     TYPE_CHECKING,
     Any,
     AsyncIterable,
+    AsyncIterator,
     Dict,
     List,
     Optional,
@@ -42,6 +43,7 @@

 if TYPE_CHECKING:
     from elasticsearch import AsyncElasticsearch
+    from elasticsearch.esql.esql import ESQLBase


 class AsyncIndexMeta(DocumentMeta):
@@ -520,3 +522,85 @@ async def __anext__(self) -> Dict[str, Any]:
                 return action

         return await async_bulk(es, Generate(actions), **kwargs)
+
+    @classmethod
+    async def esql_execute(
+        cls,
+        query: "ESQLBase",
+        return_additional: bool = False,
+        ignore_missing_fields: bool = False,
+        using: Optional[AsyncUsingType] = None,
+        **kwargs: Any,
+    ) -> AsyncIterator[Union[Self, Tuple[Self, Dict[str, Any]]]]:
+        """
+        Execute the given ES|QL query and return an iterator of 2-element tuples,
+        where the first element is an instance of this ``Document`` and the
+        second a dictionary with any remaining columns requested in the query.
+
+        :arg query: an ES|QL query object created with the ``esql_from()`` method.
+        :arg return_additional: if ``False`` (the default), this method returns
+            document objects. If set to ``True``, the method returns tuples with
+            a document in the first element and a dictionary with any additional
+            columns returned by the query in the second element.
+        :arg ignore_missing_fields: if ``False`` (the default), all the fields of
+            the document must be present in the query, or else an exception is
+            raised. Set to ``True`` to allow missing fields, which will result in
+            partially initialized document objects.
+        :arg using: connection alias to use, defaults to ``'default'``
+        :arg kwargs: additional options for the ``client.esql.query()`` function.
+        """
+        es = cls._get_connection(using)
+        response = await es.esql.query(query=str(query), **kwargs)
+        query_columns = [col["name"] for col in response.body.get("columns", [])]
+
+        # Here we get the list of columns defined in the document, which are the
+        # columns that we will take from each result to assemble the document
+        # object.
+        # When `for_esql=False` is passed below by default, the list will include
+        # nested fields, which ES|QL does not return, causing an error. When passing
+        # `ignore_missing_fields=True` the list will be generated with
+        # `for_esql=True`, so the error will not occur, but the documents will
+        # not have any Nested objects in them.
+        doc_fields = set(cls._get_field_names(for_esql=ignore_missing_fields))
+        if not ignore_missing_fields and not doc_fields.issubset(set(query_columns)):
+            raise ValueError(
+                f"Not all fields of {cls.__name__} were returned by the query. "
+                "Make sure your document does not use Nested fields, which are "
+                "currently not supported in ES|QL. To force the query to be "
+                "evaluated in spite of the missing fields, set the "
+                "ignore_missing_fields=True option in the esql_execute() call."
+            )
+        non_doc_fields: set[str] = set(query_columns) - doc_fields - {"_id"}
+        index_id = query_columns.index("_id")
+
+        results = response.body.get("values", [])
+        for column_values in results:
+            # create a dictionary with all the document fields, expanding the
+            # dot notation returned by ES|QL into the recursive dictionaries
+            # used by Document.from_dict()
+            doc_dict: Dict[str, Any] = {}
+            for col, val in zip(query_columns, column_values):
+                if col in doc_fields:
+                    cols = col.split(".")
+                    d = doc_dict
+                    for c in cols[:-1]:
+                        if c not in d:
+                            d[c] = {}
+                        d = d[c]
+                    d[cols[-1]] = val
+
+            # create the document instance
+            obj = cls(meta={"_id": column_values[index_id]})
+            obj._from_dict(doc_dict)
+
+            if return_additional:
+                # build a dict with any other values included in the response
+                other = {
+                    col: val
+                    for col, val in zip(query_columns, column_values)
+                    if col in non_doc_fields
+                }
+
+                yield obj, other
+            else:
+                yield obj
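The row-assembly loop in this hunk can be tried in isolation. The sketch below reproduces the same dot-notation expansion and the split between document fields and additional columns on plain Python data. The function name `split_row` and the sample row are hypothetical and not part of the commit, and no `Document` class or Elasticsearch client is involved.

```python
from typing import Any, Dict, List, Set, Tuple


def split_row(
    columns: List[str], values: List[Any], doc_fields: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one ES|QL result row into a nested document dict and extras."""
    doc: Dict[str, Any] = {}
    extra: Dict[str, Any] = {}
    for col, val in zip(columns, values):
        if col in doc_fields:
            # expand "address.city" into {"address": {"city": ...}}
            *parents, leaf = col.split(".")
            d = doc
            for p in parents:
                d = d.setdefault(p, {})
            d[leaf] = val
        elif col != "_id":
            # neither a document field nor the id: an additional column
            extra[col] = val
    return doc, extra


# hypothetical row, using names from the documentation examples
columns = ["_id", "first_name", "address.city", "height_cm"]
values = ["1", "Michael", "North Stacey", 248.0]

doc, extra = split_row(columns, values, {"first_name", "address.city"})
print(doc)
print(extra)
```

Here `setdefault()` stands in for the explicit `if c not in d` check used in the commit. The resulting nested dictionary has the same shape as the one passed to `_from_dict()`.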
