Commit f0109c3

lookup join for 8.18 (elastic#124760) (elastic#125336)
* lookup join for 8.18 (cherry picked from commit 134b8f8) Co-authored-by: George Wallace <[email protected]>
1 parent e946f24 commit f0109c3

7 files changed

+306
-70
lines changed

docs/reference/esql/esql-commands.asciidoc

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -42,9 +42,7 @@ ifeval::["{release-state}"=="unreleased"]
4242
endif::[]
4343
* <<esql-keep>>
4444
* <<esql-limit>>
45-
ifeval::["{release-state}"=="unreleased"]
46-
//* experimental:[] <<esql-lookup>>
47-
endif::[]
45+
* experimental:[] <<esql-lookup-join>>
4846
* experimental:[] <<esql-mv_expand>>
4947
* <<esql-rename>>
5048
* <<esql-sort>>
@@ -67,9 +65,7 @@ ifeval::["{release-state}"=="unreleased"]
6765
endif::[]
6866
include::processing-commands/keep.asciidoc[]
6967
include::processing-commands/limit.asciidoc[]
70-
ifeval::["{release-state}"=="unreleased"]
71-
//include::processing-commands/lookup.asciidoc[]
72-
endif::[]
68+
include::processing-commands/lookup.asciidoc[]
7369
include::processing-commands/mv_expand.asciidoc[]
7470
include::processing-commands/rename.asciidoc[]
7571
include::processing-commands/sort.asciidoc[]

docs/reference/esql/esql-language.asciidoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ Detailed reference documentation for the {esql} language:
1212
* <<esql-metadata-fields>>
1313
* <<esql-multivalued-fields>>
1414
* <<esql-enrich-data>>
15+
* <<esql-lookup-join>>
1516
* <<esql-process-data-with-dissect-and-grok>>
1617
* <<esql-implicit-casting>>
1718
* <<esql-time-spans>>
@@ -23,5 +24,6 @@ include::metadata-fields.asciidoc[]
2324
include::multivalued-fields.asciidoc[]
2425
include::esql-process-data-with-dissect-grok.asciidoc[]
2526
include::esql-enrich-data.asciidoc[]
27+
include::esql-lookup-join.asciidoc[]
2628
include::implicit-casting.asciidoc[]
2729
include::time-spans.asciidoc[]
Lines changed: 189 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,189 @@
1+
=== LOOKUP JOIN
2+
3+
++++
4+
<titleabbrev>Correlate data with LOOKUP JOIN</titleabbrev>
5+
++++
6+
7+
The {esql} <<esql-lookup-join,LOOKUP JOIN>>
8+
processing command combines data from your {esql} query results
9+
table with matching records from a specified lookup index. It adds
10+
fields from the lookup index as new columns to your results table based
11+
on matching values in the join field.
12+
13+
Teams often have data scattered across multiple indices – like logs,
14+
IPs, user IDs, hosts, employees etc. Without a direct way to enrich or
15+
correlate each event with reference data, root-cause analysis, security
16+
checks, and operational insights become time-consuming.
17+
18+
For example, you can use `LOOKUP JOIN` to:
19+
20+
* Retrieve environment or ownership details for each host to correlate
21+
your metrics data.
22+
* Quickly see if any source IPs match known malicious addresses.
23+
* Tag logs with the owning team or escalation info for faster triage and
24+
incident response.
25+
26+
<<esql-lookup-join,LOOKUP JOIN>> is similar to <<esql-enrich-data,ENRICH>>
27+
in that both help you join data together. You should use
28+
`LOOKUP JOIN` when:
29+
30+
* Your enrichment data changes frequently
31+
* You want to avoid index-time processing
32+
* You're working with regular indices
33+
* You need to preserve distinct matches
34+
* You need to match on any field in a lookup index
35+
* You use document or field level security
36+
* You want to restrict users to the specific lookup indices that they can
37+
use
38+
39+
[discrete]
40+
[[esql-how-lookup-join-works]]
41+
==== How the `LOOKUP JOIN` command works
42+
43+
The `LOOKUP JOIN` command adds new columns to a table, with data from
44+
{es} indices.
45+
46+
image::images/esql/esql-lookup-join.png[align="center"]
47+
48+
[[esql-lookup-join-lookup-index]]
49+
lookup_index::
50+
The name of the lookup index. This must
51+
be a specific index name - wildcards, aliases, and remote cluster
52+
references are not supported.
53+
54+
[[esql-lookup-join-field-name]]
55+
field_name::
56+
The field to join on. This field must exist
57+
in both your current query results and in the lookup index. If the field
58+
contains multi-valued entries, those entries will not match anything
59+
(the added fields will contain `null` for those rows).
60+
61+
[discrete]
62+
[[esql-lookup-join-example]]
63+
==== Example
64+
65+
`LOOKUP JOIN` has left-join behavior. If no rows match in the lookup index, `LOOKUP JOIN` retains the incoming row and adds `null`s. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match.
66+
67+
In this example, we have two sample tables:
68+
69+
*employees*
70+
71+
[cols=",,,,,",options="header",]
72+
|===
73+
|birth++_++date |emp++_++no |first++_++name |gender |hire++_++date
74+
|language
75+
|1955-10-04T00:00:00Z |10091 |Amabile |M |1992-11-18T00:00:00Z |3
76+
77+
|1964-10-18T00:00:00Z |10092 |Valdiodio |F |1989-09-22T00:00:00Z |1
78+
79+
|1964-06-11T00:00:00Z |10093 |Sailaja |M |1996-11-05T00:00:00Z |3
80+
81+
|1957-05-25T00:00:00Z |10094 |Arumugam |F |1987-04-18T00:00:00Z |5
82+
83+
|1965-01-03T00:00:00Z |10095 |Hilari |M |1986-07-15T00:00:00Z |4
84+
|===
85+
86+
*languages++_++lookup++_++non++_++unique++_++key*
87+
88+
[cols=",,",options="header",]
89+
|===
90+
|language++_++code |language++_++name |country
91+
|1 |English |Canada
92+
|1 |English |
93+
|1 | |United Kingdom
94+
|1 |English |United States of America
95+
|2 |German |++[++Germany{vbar}Austria++]++
96+
|2 |German |Switzerland
97+
|2 |German |
98+
|4 |Quenya |
99+
|5 | |Atlantis
100+
|++[++6{vbar}7++]++ |Mv-Lang |Mv-Land
101+
|++[++7{vbar}8++]++ |Mv-Lang2 |Mv-Land2
102+
| |Null-Lang |Null-Land
103+
| |Null-Lang2 |Null-Land2
104+
|===
105+
106+
Running the following query would provide the results shown below.
107+
108+
[source,esql]
109+
----
110+
FROM employees
111+
| EVAL language_code = emp_no % 10
112+
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
113+
| WHERE emp_no > 10090 AND emp_no < 10096
114+
| SORT emp_no, country
115+
| KEEP emp_no, language_code, language_name, country
116+
----
117+
118+
[cols=",,,",options="header",]
119+
|===
120+
|emp++_++no |language++_++code |language++_++name |country
121+
|10091 |1 |English |Canada
122+
|10091 |1 |null |United Kingdom
123+
|10091 |1 |English |United States of America
124+
|10091 |1 |English |null
125+
|10092 |2 |German |++[++Germany, Austria++]++
126+
|10092 |2 |German |Switzerland
127+
|10092 |2 |German |null
128+
|10093 |3 |null |null
129+
|10094 |4 |Quenya |null
130+
|10095 |5 |null |Atlantis
131+
|===
132+
133+
[IMPORTANT]
134+
====
135+
`LOOKUP JOIN` does not guarantee the output to be in
136+
any particular order. If a certain order is required, users should use a
137+
<<esql-sort,`SORT`>>
138+
somewhere after the `LOOKUP JOIN`.
139+
====
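
The left-join behavior described in this example can be modeled outside {es}. The following Python sketch is only an illustration of the semantics stated above, not how `LOOKUP JOIN` is implemented. The function name `lookup_join` and the in-memory row format are assumptions made for this example, and skipping multi-valued keys on the lookup side is a simplification of the sketch rather than documented behavior.

```python
def lookup_join(rows, lookup_rows, key):
    """Model of left-join semantics: keep every incoming row, add lookup columns."""
    # Group lookup documents by their join key.
    # Assumption of this sketch: null and multi-valued (list) keys are skipped.
    index = {}
    for doc in lookup_rows:
        value = doc.get(key)
        if value is None or isinstance(value, list):
            continue
        index.setdefault(value, []).append(doc)

    # Names of the columns the lookup side contributes.
    added = []
    for doc in lookup_rows:
        for name in doc:
            if name != key and name not in added:
                added.append(name)

    out = []
    for row in rows:
        value = row.get(key)
        # A null or multi-valued join value on the incoming row matches nothing.
        if value is None or isinstance(value, list):
            matches = []
        else:
            matches = index.get(value, [])
        if not matches:
            # No match: retain the incoming row and fill the added columns with nulls.
            out.append({**row, **{name: None for name in added}})
            continue
        # One output row per matching lookup document.
        for doc in matches:
            out.append({**row, **{name: doc.get(name) for name in added}})
    return out


# Sample data taken from the tables above; language_code = emp_no % 10.
employees = [{"emp_no": n, "language_code": n % 10} for n in range(10091, 10096)]
languages = [
    {"language_code": 1, "language_name": "English", "country": "Canada"},
    {"language_code": 1, "language_name": "English", "country": None},
    {"language_code": 1, "language_name": None, "country": "United Kingdom"},
    {"language_code": 1, "language_name": "English", "country": "United States of America"},
    {"language_code": 2, "language_name": "German", "country": ["Germany", "Austria"]},
    {"language_code": 2, "language_name": "German", "country": "Switzerland"},
    {"language_code": 2, "language_name": "German", "country": None},
    {"language_code": 4, "language_name": "Quenya", "country": None},
    {"language_code": 5, "language_name": None, "country": "Atlantis"},
]

joined = lookup_join(employees, languages, "language_code")
for row in joined:
    print(row)
```

The sketch returns rows in input order, which is a property of this model only; as noted above, `LOOKUP JOIN` itself gives no ordering guarantee.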
140+
141+
[discrete]
142+
[[esql-lookup-join-prereqs]]
143+
==== Prerequisites
144+
145+
To use `LOOKUP JOIN`, the following requirements must be met:
146+
147+
* *Compatible data types*: The join key and join field in the lookup
148+
index must have compatible data types. This means:
149+
** The data types must either be identical or be internally represented
150+
as the same type in Elasticsearch's type system
151+
** Numeric types follow these compatibility rules:
152+
*** `short` and `byte` are compatible with `integer` (all represented as
153+
`int`)
154+
*** `float`, `half_float`, and `scaled_float` are compatible
155+
with `double` (all represented as `double`)
156+
** For text fields: You can use text fields on the left-hand side of the
157+
join only if they have a `.keyword` subfield
158+
159+
For a complete list of supported data types and their internal
160+
representations, see the
161+
link:/reference/query-languages/esql/limitations.md#_supported_types[Supported
162+
Field Types documentation].
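
The numeric compatibility rules above amount to comparing internal representations. The Python sketch below encodes only the rules listed in this section; the mapping `INTERNAL_TYPE` and the function `join_types_compatible` are hypothetical names for illustration and do not correspond to any {es} API, and the `.keyword` subfield rule for text fields is not modeled.

```python
# Internal representation for the numeric types named in the rules above.
# Types not listed here are only compatible with themselves in this sketch.
INTERNAL_TYPE = {
    "byte": "int",
    "short": "int",
    "integer": "int",
    "half_float": "double",
    "float": "double",
    "scaled_float": "double",
    "double": "double",
}


def join_types_compatible(left_type, right_type):
    """Return True if the two field types could be joined under the listed rules."""
    if left_type == right_type:
        return True  # identical types are always compatible
    left = INTERNAL_TYPE.get(left_type)
    right = INTERNAL_TYPE.get(right_type)
    # Different types are compatible only if both map to the same representation.
    return left is not None and left == right


print(join_types_compatible("short", "integer"))
print(join_types_compatible("integer", "double"))
```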
163+
164+
[discrete]
165+
[[esql-lookup-join-limitations]]
166+
==== Limitations
167+
168+
The following are the current limitations of `LOOKUP JOIN`:
169+
170+
* `LOOKUP JOIN` will be successful if the join field in the lookup index
171+
is a `KEYWORD` type. If the main index's join field is `TEXT` type, it
172+
must have an exact `.keyword` subfield that can be matched with the
173+
lookup index's `KEYWORD` field.
174+
* Indices in
175+
link:/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting[lookup]
176+
mode are always single-sharded.
177+
* Cross cluster search is unsupported. Both source and lookup indices
178+
must be local.
179+
* `LOOKUP JOIN` can only use a single match field and a single index.
180+
Wildcards, aliases, date math, and data streams are not supported.
181+
* The name of the match field in
182+
`LOOKUP JOIN lu++_++idx ON match++_++field` must match an existing field
183+
in the query. This may require renames or evals to achieve.
184+
* The query will circuit break if there are too many matching documents
185+
in the lookup index, or if the documents are too large. More precisely,
186+
`LOOKUP JOIN` works in batches of, normally, about 10,000 rows; a large
187+
amount of heap space is needed if the matching documents from the lookup
188+
index for a batch are multiple megabytes or larger. This is roughly the
189+
same as for `ENRICH`.
Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
[discrete]
2+
[[esql-lookup-join]]
3+
=== `LOOKUP JOIN`
4+
5+
[WARNING]
6+
====
7+
This functionality is in technical preview and may be
8+
changed or removed in a future release. Elastic will work to fix any
9+
issues, but features in technical preview are not subject to the support
10+
SLA of official GA features.
11+
====
12+
`LOOKUP JOIN` enables you to add data from another index, known as a lookup
13+
index, to your {esql} query results, simplifying data enrichment
14+
and analysis workflows.
15+
16+
*Syntax*
17+
18+
....
19+
FROM <source_index>
20+
| LOOKUP JOIN <lookup_index> ON <field_name>
21+
....
22+
23+
[source,esql]
24+
----
25+
FROM firewall_logs
26+
| LOOKUP JOIN threat_list ON source.IP
27+
| WHERE threat_level IS NOT NULL
28+
----
29+
30+
*Parameters*
31+
32+
`lookup_index`::
33+
The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster
34+
references are not supported.
35+
36+
`field_name`::
37+
The field to join on. This field must exist
38+
in both your current query results and in the lookup index. If the field
39+
contains multi-valued entries, those entries will not match anything
40+
(the added fields will contain `null` for those rows).
41+
42+
*Description*
43+
44+
The `LOOKUP JOIN` command adds new columns to your {esql} query
45+
results table by finding documents in a lookup index that share the same
46+
join field value as your result rows.
47+
48+
For each row in your results table that matches a document in the lookup
49+
index based on the join field, all fields from the matching document are
50+
added as new columns to that row.
51+
52+
If multiple documents in the lookup index match a single row in your
53+
results, the output will contain one row for each matching combination.
54+
55+
*Examples*
56+
57+
[TIP]
58+
====
59+
In case of name collisions, the newly created columns will override existing columns.
60+
====
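
The effect of a name collision can be pictured with a small Python model. This is a sketch of the behavior described in the tip, not the engine's code; `merge_columns` and the sample field names are made up for the example.

```python
def merge_columns(row, lookup_doc, key):
    """Combine one result row with one matching lookup document."""
    # Every lookup field except the join key becomes a column on the row.
    added = {name: value for name, value in lookup_doc.items() if name != key}
    # On a name collision the lookup value replaces the existing column.
    return {**row, **added}


row = {"host.name": "web-1", "env": "unknown", "cpu": 0.7}
lookup_doc = {"host.name": "web-1", "env": "prod", "owner": "sre"}

merged = merge_columns(row, lookup_doc, "host.name")
print(merged)
```

Here the `env` column from the incoming row is replaced by the value from the lookup document, while `cpu` is left untouched.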
61+
62+
*IP Threat correlation*: This query would allow you to see if any source
63+
IPs match known malicious addresses.
64+
65+
[source,esql]
66+
----
67+
FROM firewall_logs
68+
| LOOKUP JOIN threat_list ON source.IP
69+
----
70+
71+
*Host metadata correlation*: This query pulls in environment or
72+
ownership details for each host to correlate with your metrics data.
73+
74+
[source,esql]
75+
----
76+
FROM system_metrics
77+
| LOOKUP JOIN host_inventory ON host.name
78+
| LOOKUP JOIN employees ON host.name
79+
----
80+
81+
*Service ownership mapping*: This query would show logs with the owning
82+
team or escalation information for faster triage and incident response.
83+
84+
[source,esql]
85+
----
86+
FROM app_logs
87+
| LOOKUP JOIN service_owners ON service_id
88+
----
89+
90+
`LOOKUP JOIN` is generally faster when there are fewer rows to join
91+
with. {esql} tries to apply any `WHERE` clause before the
92+
`LOOKUP JOIN` where possible.
93+
94+
The following two examples return the same results. One places the
95+
`WHERE` clause before the `LOOKUP JOIN` and the other after it. It does not
96+
matter how you write your query: the optimizer moves the filter
97+
before the lookup when the query runs.
98+
99+
[source,esql]
100+
----
101+
FROM Left
102+
| WHERE Language IS NOT NULL
103+
| LOOKUP JOIN Right ON Key
104+
----
105+
106+
[source,esql]
107+
----
108+
FROM Left
109+
| LOOKUP JOIN Right ON Key
110+
| WHERE Language IS NOT NULL
111+
----
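
Why the two forms are equivalent can be checked with a small Python model of a left join. This is a sketch under one explicit assumption: the filtered column (`Language`) comes from the left-hand table, so the filter does not depend on anything the join adds. A filter on a column contributed by the lookup index can only be evaluated after the join. The helper `left_join` and the sample rows are invented for illustration.

```python
def left_join(rows, lookup_rows, key):
    """Minimal left join: one output row per match, nulls when nothing matches."""
    added = sorted({name for doc in lookup_rows for name in doc if name != key})
    out = []
    for row in rows:
        matches = [doc for doc in lookup_rows
                   if row[key] is not None and doc[key] == row[key]]
        if not matches:
            out.append({**row, **dict.fromkeys(added)})
        for doc in matches:
            out.append({**row, **{name: doc.get(name) for name in added}})
    return out


def has_language(row):
    # Stands in for: WHERE Language IS NOT NULL
    return row["Language"] is not None


left = [
    {"Key": 1, "Language": "English"},
    {"Key": 2, "Language": None},
    {"Key": 3, "Language": "German"},
]
right = [
    {"Key": 1, "Country": "Canada"},
    {"Key": 1, "Country": "United Kingdom"},
    {"Key": 2, "Country": "Atlantis"},
]

# Filter before the join: fewer rows reach the lookup.
filter_first = left_join([r for r in left if has_language(r)], right, "Key")
# Filter after the join: same rows survive, but more lookups were performed.
filter_last = [r for r in left_join(left, right, "Key") if has_language(r)]

print(filter_first == filter_last)
```

Filtering first joins two rows instead of three in this model, which is the saving the optimizer aims for.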
