Commit 472536c

lookup join docs (#124531)
* lookup join docs --------- Co-authored-by: Alexander Spies <[email protected]>
1 parent bf53f97 commit 472536c

7 files changed

+223
-1
lines changed

docs/images/esql-lookup-join.png

12.7 KB

docs/reference/elasticsearch/index-settings/index-modules.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,10 @@ Index mode supports the following values:
7272
`standard`
7373
: Standard indexing with default settings.
7474

75+
`lookup`
76+
: Index that can be used for lookup joins in ES|QL. Limited to 1 shard.
77+
78+
7579
`time_series`
7680
: *(data streams only)* Index mode optimized for storage of metrics. For more information, see [Time series index settings](time-series.md).
7781

docs/reference/query-languages/esql/esql-commands.md

Lines changed: 81 additions & 1 deletion
@@ -6,7 +6,6 @@ mapped_pages:
66

77
# {{esql}} commands [esql-commands]
88

9-
109
## Source commands [esql-source-commands]
1110

1211
An {{esql}} source command produces a table, typically with data from {{es}}. An {{esql}} query must start with a source command.
@@ -39,6 +38,7 @@ An {{esql}} source command produces a table, typically with data from {{es}}. An
3938
* [`GROK`](#esql-grok)
4039
* [`KEEP`](#esql-keep)
4140
* [`LIMIT`](#esql-limit)
41+
* [preview] [`LOOKUP JOIN`](#esql-lookup-join)
4242
* [preview] [`MV_EXPAND`](#esql-mv_expand)
4343
* [`RENAME`](#esql-rename)
4444
* [`SORT`](#esql-sort)
@@ -663,6 +663,86 @@ FROM employees
663663
| LIMIT 5
664664
```
665665

666+
## `LOOKUP JOIN` [esql-lookup-join]
667+
668+
::::{warning}
669+
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
670+
::::
671+
672+
`LOOKUP JOIN` enables you to add data from another index, also known as a lookup index, to your {{esql}} query results, simplifying data enrichment and analysis workflows.
673+
674+
**Syntax**
675+
676+
```
677+
FROM <source_index>
678+
| LOOKUP JOIN <lookup_index> ON <field_name>
679+
```
680+
681+
```esql
682+
FROM firewall_logs
683+
| LOOKUP JOIN threat_list ON source.IP
684+
| WHERE threat_level IS NOT NULL
685+
```
686+
687+
**Parameters**
688+
689+
`<lookup_index>`
690+
: The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster references are not supported.
691+
692+
`<field_name>`
693+
: The field to join on. This field must exist in both your current query results and in the lookup index. If the field contains multi-valued entries, those entries will not match anything (the added fields will contain `null` for those rows).
694+
695+
**Description**
696+
697+
The `LOOKUP JOIN` command adds new columns to your {{esql}} query results table by finding documents in a lookup index that share the same join field value as your result rows.
698+
699+
For each row in your results table that matches a document in the lookup index based on the join field, all fields from the matching document are added as new columns to that row.
700+
701+
If multiple documents in the lookup index match a single row in your results, the output will contain one row for each matching combination.
702+
703+
**Examples**
704+
705+
::::{tip}
706+
In case of name collisions, the newly created columns will override existing columns.
707+
::::
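
As a hedged illustration of the tip above, the sketch below renames an existing column before the join so that it is not overridden. It assumes, hypothetically, that both `app_logs` and `service_owners` contain a field named `team`; that field name and the `original_team` alias are invented for this example.

```esql
FROM app_logs
// Keep the existing value under a different name before the join
| RENAME team AS original_team
| LOOKUP JOIN service_owners ON service_id
| KEEP service_id, original_team, team
```

Under these assumptions, `team` would hold the value from the lookup index, while `original_team` keeps the value that was already in the query results.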
708+
709+
**IP Threat correlation**: This query would allow you to see if any source IPs match known malicious addresses.
710+
711+
```esql
712+
FROM firewall_logs
713+
| LOOKUP JOIN threat_list ON source.IP
714+
```
715+
716+
**Host metadata correlation**: This query pulls in environment or ownership details for each host to correlate with your metrics data.
717+
718+
```esql
719+
FROM system_metrics
720+
| LOOKUP JOIN host_inventory ON host.name
721+
| LOOKUP JOIN employees ON host.name
722+
```
723+
724+
**Service ownership mapping**: This query would show logs with the owning team or escalation information for faster triage and incident response.
725+
726+
```esql
727+
FROM app_logs
728+
| LOOKUP JOIN service_owners ON service_id
729+
```
730+
731+
`LOOKUP JOIN` is generally faster when there are fewer rows to join with. {{esql}} tries to apply any `WHERE` clause before the `LOOKUP JOIN` where possible.
732+
733+
The following two examples return the same results. One places the `WHERE` clause before the `LOOKUP JOIN` and the other places it after. Whichever way you write the query, the optimizer moves the filter before the lookup when possible.
734+
735+
```esql
736+
FROM Left
737+
| WHERE Language IS NOT NULL
738+
| LOOKUP JOIN Right ON Key
739+
```
740+
741+
```esql
742+
FROM Left
743+
| LOOKUP JOIN Right ON Key
744+
| WHERE Language IS NOT NULL
745+
```
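
By contrast, a filter can only run before the lookup if it refers to columns that already exist at that point. In the sketch below, `Country` is a hypothetical field that is assumed to exist only in the `Right` lookup index, so this `WHERE` has to be evaluated after the join and cannot reduce the number of rows that are joined.

```esql
FROM Left
| LOOKUP JOIN Right ON Key
// Country is assumed to come from the lookup index, so it is only available after the join
| WHERE Country IS NOT NULL
```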
666746

667747
## `MV_EXPAND` [esql-mv_expand]
668748

docs/reference/query-languages/esql/esql-enrich-data.md

Lines changed: 8 additions & 0 deletions
@@ -15,6 +15,14 @@ For example, you can use `ENRICH` to:
1515
* Add product information to retail orders based on product IDs
1616
* Supplement contact information based on an email address
1717

18+
[`ENRICH`](/reference/query-languages/esql/esql-commands.md#esql-enrich) is similar to [`LOOKUP JOIN`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) in that both help you join data together. You should use `ENRICH` when:
19+
20+
* Enrichment data doesn't change frequently
21+
* You can accept index-time overhead
22+
* You are working with structured enrichment patterns
23+
* You can accept having multiple matches combined into multi-values
24+
* You can accept being limited to predefined match fields
25+
* You want the simplified security model of `ENRICH`. There are no restrictions to specific enrich policies or document and field level security.
1826

1927
### How the `ENRICH` command works [esql-how-enrich-works]
2028

docs/reference/query-languages/esql/esql-lookup-join.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
---
2+
navigation_title: "Correlate data with LOOKUP JOIN"
3+
mapped_pages:
4+
- https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html
5+
---
6+
7+
# LOOKUP JOIN [esql-lookup-join-reference]
8+
9+
The {{esql}} [`LOOKUP JOIN`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) processing command combines data from your {{esql}} query results table with matching records from a specified lookup index. It adds fields from the lookup index as new columns to your results table based on matching values in the join field.
10+
11+
Teams often have data scattered across multiple indices – like logs, IPs, user IDs, hosts, employees etc. Without a direct way to enrich or correlate each event with reference data, root-cause analysis, security checks, and operational insights become time-consuming.
12+
13+
For example, you can use `LOOKUP JOIN` to:
14+
15+
* Retrieve environment or ownership details for each host to correlate your metrics data.
16+
* Quickly see if any source IPs match known malicious addresses.
17+
* Tag logs with the owning team or escalation info for faster triage and incident response.
18+
19+
[`LOOKUP JOIN`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) is similar to [`ENRICH`](/reference/query-languages/esql/esql-commands.md#esql-enrich) in that both help you join data together. You should use `LOOKUP JOIN` when:
20+
21+
* Your enrichment data changes frequently
22+
* You want to avoid index-time processing
23+
* You're working with regular indices
24+
* You need to preserve distinct matches
25+
* You need to match on any field in a lookup index
26+
* You use document or field level security
27+
* You want to restrict users to the specific lookup indices that they can use
28+
29+
## How the `LOOKUP JOIN` command works [esql-how-lookup-join-works]
30+
31+
The `LOOKUP JOIN` command adds new columns to a table, with data from {{es}} indices.
32+
33+
:::{image} ../../../images/esql-lookup-join.png
34+
:alt: esql lookup join
35+
:::
36+
37+
`<lookup_index>`
38+
: The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster references are not supported.
39+
40+
`<field_name>`
41+
: The field to join on. This field must exist in both your current query results and in the lookup index. If the field contains multi-valued entries, those entries will not match anything (the added fields will contain `null` for those rows).
42+
43+
## Example
44+
45+
`LOOKUP JOIN` has left-join behavior. If no rows match in the lookup index, `LOOKUP JOIN` retains the incoming row and adds `null`s. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match.
46+
47+
In this example, we have two sample tables:
48+
49+
**employees**
50+
51+
| birth_date|emp_no|first_name|gender|hire_date|language|
52+
|---|---|---|---|---|---|
53+
|1955-10-04T00:00:00Z|10091|Amabile |M|1992-11-18T00:00:00Z|3|
54+
|1964-10-18T00:00:00Z|10092|Valdiodio |F|1989-09-22T00:00:00Z|1|
55+
|1964-06-11T00:00:00Z|10093|Sailaja |M|1996-11-05T00:00:00Z|3|
56+
|1957-05-25T00:00:00Z|10094|Arumugam |F|1987-04-18T00:00:00Z|5|
57+
|1965-01-03T00:00:00Z|10095|Hilari |M|1986-07-15T00:00:00Z|4|
58+
59+
**languages_lookup_non_unique_key**
60+
61+
|language_code|language_name|country|
62+
|---|---|---|
63+
|1|English|Canada|
64+
|1|English|
65+
|1||United Kingdom|
66+
|1|English|United States of America|
67+
|2|German|[Germany\|Austria]|
68+
|2|German|Switzerland|
69+
|2|German|
70+
|4|Spanish|
71+
|5||France|
72+
|[6\|7]|Mv-Lang|Mv-Land|
73+
|[7\|8]|Mv-Lang2|Mv-Land2|
74+
||Null-Lang|Null-Land|
75+
||Null-Lang2|Null-Land2|
76+
77+
Running the following query would provide the results shown below.
78+
79+
```esql
80+
FROM employees
81+
| EVAL language_code = emp_no % 10
82+
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
83+
| WHERE emp_no > 10090 AND emp_no < 10096
84+
| SORT emp_no, country
85+
| KEEP emp_no, language_code, language_name, country
86+
```
87+
88+
|emp_no|language_code|language_name|country|
89+
|---|---|---|---|
90+
| 10091 | 1 | English | Canada|
91+
| 10091 | 1 | null | United Kingdom|
92+
| 10091 | 1 | English | United States of America|
93+
| 10091 | 1 | English | null|
94+
| 10092 | 2 | German | [Germany, Austria]|
95+
| 10092 | 2 | German | Switzerland|
96+
| 10092 | 2 | German | null|
97+
| 10093 | 3 | null | null|
98+
| 10094 | 4 | Spanish | null|
99+
| 10095 | 5 | null | France|
100+
101+
::::{important}
102+
`LOOKUP JOIN` does not guarantee the output to be in any particular order. If a certain order is required, users should use a [`SORT`](/reference/query-languages/esql/esql-commands.md#esql-sort) somewhere after the `LOOKUP JOIN`.
103+
104+
::::
105+
106+
## Prerequisites [esql-lookup-join-prereqs]
107+
108+
To use `LOOKUP JOIN`, the following requirements must be met:
109+
110+
* **Compatible data types**: The join key and join field in the lookup index must have compatible data types. This means:
111+
  * The data types must either be identical or be internally represented as the same type in Elasticsearch's type system
111+
  * Numeric types follow these compatibility rules:
112+
    * `short` and `byte` are compatible with `integer` (all represented as `int`)
113+
    * `float`, `half_float`, and `scaled_float` are compatible with `double` (all represented as `double`)
114+
  * For text fields: You can use text fields on the left-hand side of the join only if they have a `.keyword` subfield
116+
117+
For a complete list of supported data types and their internal representations, see the [Supported Field Types documentation](/reference/query-languages/esql/limitations.md#_supported_types).
118+
119+
## Limitations
120+
121+
The following are the current limitations of `LOOKUP JOIN`:
122+
123+
* `LOOKUP JOIN` will be successful if the join field in the lookup index is a `KEYWORD` type. If the main index's join field is `TEXT` type, it must have an exact `.keyword` subfield that can be matched with the lookup index's `KEYWORD` field.
124+
* Indices in [lookup](/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting) mode are always single-sharded.
125+
* Cross cluster search is unsupported. Both source and lookup indices must be local.
126+
* `LOOKUP JOIN` can only use a single match field and a single index. Wildcards, aliases, date math, and data streams are not supported.
127+
* The name of the match field in `LOOKUP JOIN lu_idx ON match_field` must match an existing field in the query. This may require renames or evals to achieve.
128+
* The query will circuit break if there are too many matching documents in the lookup index, or if the documents are too large. More precisely, `LOOKUP JOIN` works in batches of, normally, about 10,000 rows; a large amount of heap space is needed if the matching documents from the lookup index for a batch are multiple megabytes or larger. This is roughly the same as for `ENRICH`.
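
The match-field limitation above can usually be worked around with `RENAME` or `EVAL`. A minimal sketch, assuming a hypothetical `firewall_logs` index whose address field is called `client_ip`, while the hypothetical `threat_list` lookup index stores the same values in a field called `ip_address`:

```esql
FROM firewall_logs
// Align the field name in the query with the join field in the lookup index
| RENAME client_ip AS ip_address
| LOOKUP JOIN threat_list ON ip_address
```

Using `EVAL ip_address = client_ip` instead would keep the original column alongside the new one. The two fields still need compatible data types, as described in the prerequisites.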

docs/reference/query-languages/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ toc:
9191
- file: query-languages/esql/esql-multivalued-fields.md
9292
- file: query-languages/esql/esql-process-data-with-dissect-grok.md
9393
- file: query-languages/esql/esql-enrich-data.md
94+
- file: query-languages/esql/esql-lookup-join.md
9495
- file: query-languages/esql/esql-implicit-casting.md
9596
- file: query-languages/esql/esql-time-spans.md
9697
- file: query-languages/esql/limitations.md

docs/reference/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -509,6 +509,7 @@ toc:
509509
- file: query-languages/esql/esql-multivalued-fields.md
510510
- file: query-languages/esql/esql-process-data-with-dissect-grok.md
511511
- file: query-languages/esql/esql-enrich-data.md
512+
- file: query-languages/esql/esql-lookup-join.md
512513
- file: query-languages/esql/esql-implicit-casting.md
513514
- file: query-languages/esql/esql-time-spans.md
514515
- file: query-languages/esql/limitations.md
