From 1439f589c5048c7862b761561935c735bb850b4c Mon Sep 17 00:00:00 2001 From: Liam Thompson <32779855+leemthompo@users.noreply.github.com> Date: Thu, 24 Apr 2025 15:24:04 +0200 Subject: [PATCH] [DOCS][ESQL][8.x] Cleanup and cross-reference LOOKUP JOIN reference and landing pages (#127316) * [DOCS][ESQL][8.x] Cleanup and cross-reference LOOKUP JOIN reference and landing pages * Add missing id to fix linking problem --- docs/reference/esql/esql-lookup-join.asciidoc | 195 ++++++++++++------ .../esql/processing-commands/lookup.asciidoc | 6 +- 2 files changed, 136 insertions(+), 65 deletions(-) diff --git a/docs/reference/esql/esql-lookup-join.asciidoc b/docs/reference/esql/esql-lookup-join.asciidoc index 2dcf927d27dca..cd03bff8364d7 100644 --- a/docs/reference/esql/esql-lookup-join.asciidoc +++ b/docs/reference/esql/esql-lookup-join.asciidoc @@ -1,9 +1,19 @@ === LOOKUP JOIN - ++++ Correlate data with LOOKUP JOIN ++++ +// hack because page didn't have explicit id originally we could link to using internal link syntax +[[esql-lookup-join-landing-page]] + +[WARNING] +==== +This functionality is in technical preview and may be +changed or removed in a future release. Elastic will work to fix any +issues, but features in technical preview are not subject to the support +SLA of official GA features. +==== + The {esql} <> processing command combines data from your {esql} query results table with matching records from a specified lookup index. It adds @@ -23,6 +33,10 @@ your metrics data. * Tag logs with the owning team or escalation info for faster triage and incident response. +[discrete] +[[esql-compare-with-enrich]] +==== Compare with ENRICH + <> is similar to <> in the fact that they both help you join data together. You should use `LOOKUP JOIN` when: @@ -37,12 +51,17 @@ in the fact that they both help you join data together. 
You should use [discrete] [[esql-how-lookup-join-works]] -==== How the `LOOKUP JOIN` command works +==== How the command works -The `LOOKUP JOIN` command adds new columns to a table, with data from -{es} indices. +The `LOOKUP JOIN` command adds fields from the lookup index as new columns +to your results table based on matching values in the join field. -image::images/esql/esql-lookup-join.png[align="center"] +[source,esql] +---- +LOOKUP JOIN <lookup_index> ON <field_name> +---- + +The command requires two parameters: [[esql-lookup-join-lookup-index]] lookup_index:: @@ -50,7 +69,6 @@ The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster references are not supported. Indices used for lookups must be configured with the <>. - [[esql-lookup-join-field-name]] field_name:: The field to join on. This field must exist @@ -58,84 +76,135 @@ in both your current query results and in the lookup index. If the field contains multi-valued entries, those entries will not match anything (the added fields will contain `null` for those rows). +image::images/esql/esql-lookup-join.png[align="center"] + +If you're familiar with SQL, `LOOKUP JOIN` has left-join behavior. This means that +if no rows match in the lookup index, the incoming row is retained and `null`s are added. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match. + [discrete] [[esql-lookup-join-example]] ==== Example -`LOOKUP JOIN` has left-join behavior. If no rows match in the lookup index, `LOOKUP JOIN` retains the incoming row and adds nulls. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match. +You can run this example for yourself to see how it works by setting up the indices and adding sample data. Otherwise, you can just inspect the query and response. 
-In this example, we have two sample tables: +[discrete] +[[esql-lookup-join-example-setup-sample-data]] +===== Sample data -*employees* +.*Expand for setup instructions* +[%collapsible] +============== -[cols=",,,,,",options="header",] -|=== -|birth++_++date |emp++_++no |first++_++name |gender |hire++_++date -|language -|1955-10-04T00:00:00Z |10091 |Amabile |M |1992-11-18T00:00:00Z |3 +**Set up indices** -|1964-10-18T00:00:00Z |10092 |Valdiodio |F |1989-09-22T00:00:00Z |1 +First, let's create two indices with mappings: `threat_list` and `firewall_logs`. + +[source,console] +---- +PUT threat_list +{ + "settings": { + "index.mode": "lookup" <1> + }, + "mappings": { + "properties": { + "source.ip": { "type": "ip" }, + "threat_level": { "type": "keyword" }, + "threat_type": { "type": "keyword" }, + "last_updated": { "type": "date" } + } + } +} +---- +<1> The lookup index must be set up using this mode -|1964-06-11T00:00:00Z |10093 |Sailaja |M |1996-11-05T00:00:00Z |3 +[source,console] +---- +PUT firewall_logs +{ + "mappings": { + "properties": { + "timestamp": { "type": "date" }, + "source.ip": { "type": "ip" }, + "destination.ip": { "type": "ip" }, + "action": { "type": "keyword" }, + "bytes_transferred": { "type": "long" } + } + } +} +---- -|1957-05-25T00:00:00Z |10094 |Arumugam |F |1987-04-18T00:00:00Z |5 +*Add sample data* -|1965-01-03T00:00:00Z |10095 |Hilari |M |1986-07-15T00:00:00Z |4 -|=== +Next, let's add some sample data to both indices. The `threat_list` index contains known malicious IPs, while the `firewall_logs` index contains logs of network traffic. 
-*languages++_++non++_++unique++_++key* +[source,console] +---- +POST threat_list/_bulk +{"index":{}} +{"source.ip":"203.0.113.5","threat_level":"high","threat_type":"C2_SERVER","last_updated":"2025-04-22"} +{"index":{}} +{"source.ip":"198.51.100.2","threat_level":"medium","threat_type":"SCANNER","last_updated":"2025-04-23"} +---- -[cols=",,",options="header",] -|=== -|language++_++code |language++_++name |country -|1 |English |Canada -|1 |English | -|1 | |United Kingdom -|1 |English |United States of America -|2 |German |++[++Germany{vbar}Austria++]++ -|2 |German |Switzerland -|2 |German | -|4 |Quenya | -|5 | |Atlantis -|++[++6{vbar}7++]++ |Mv-Lang |Mv-Land -|++[++7{vbar}8++]++ |Mv-Lang2 |Mv-Land2 -|Null-Lang |Null-Land | -|Null-Lang2 |Null-Land2 | -|=== +[source,console] +---- +POST firewall_logs/_bulk +{"index":{}} +{"timestamp":"2025-04-23T10:00:01Z","source.ip":"192.0.2.1","destination.ip":"10.0.0.100","action":"allow","bytes_transferred":1024} +{"index":{}} +{"timestamp":"2025-04-23T10:00:05Z","source.ip":"203.0.113.5","destination.ip":"10.0.0.55","action":"allow","bytes_transferred":2048} +{"index":{}} +{"timestamp":"2025-04-23T10:00:08Z","source.ip":"198.51.100.2","destination.ip":"10.0.0.200","action":"block","bytes_transferred":0} +{"index":{}} +{"timestamp":"2025-04-23T10:00:15Z","source.ip":"203.0.113.5","destination.ip":"10.0.0.44","action":"allow","bytes_transferred":4096} +{"index":{}} +{"timestamp":"2025-04-23T10:00:30Z","source.ip":"192.0.2.1","destination.ip":"10.0.0.100","action":"allow","bytes_transferred":512} +---- +============== -Running the following query would provide the results shown below. 
+[discrete] +[[esql-lookup-join-example-query]] +===== Query the data [source,esql] ---- -FROM employees -| EVAL language_code = emp_no % 10 -| LOOKUP JOIN languages_lookup_non_unique_key ON language_code -| WHERE emp_no > 10090 AND emp_no < 10096 -| SORT emp_no, country -| KEEP emp_no, language_code, language_name, country; +FROM firewall_logs <1> +| LOOKUP JOIN threat_list ON source.ip <2> +| WHERE threat_level IS NOT NULL <3> +| SORT timestamp <4> +| KEEP source.ip, action, threat_level, threat_type <5> +| LIMIT 10 <6> ---- -[cols=",,,",options="header",] +<1> The source index +<2> The lookup index and join field +<3> Filter for rows with non-null threat levels +<4> LOOKUP JOIN does not guarantee output order, so you must explicitly sort +<5> Keep only relevant fields +<6> Limit the output to 10 rows + +[discrete] +[[esql-lookup-join-example-response]] +===== Response + +A successful query will output a table like this: + +[cols="4*",options="header"] |=== -|emp++_++no |language++_++code |language++_++name |country -|10091 |1 |English |Canada -|10091 |1 |null |United Kingdom -|10091 |1 |English |United States of America -|10091 |1 |English |null -|10092 |2 |German |++[++Germany, Austria++]++ -|10092 |2 |German |Switzerland -|10092 |2 |German |null -|10093 |3 |null |null -|10094 |4 |Spanish |null -|10095 |5 |null |France +|source.ip |action |threat_level |threat_type +|203.0.113.5 |allow |high |C2_SERVER +|198.51.100.2 |block |medium |SCANNER +|203.0.113.5 |allow |high |C2_SERVER |=== -[IMPORTANT] -==== -`LOOKUP JOIN` does not guarantee the output to be in -any particular order. If a certain order is required, users should use a -<> somewhere after the `LOOKUP JOIN`. -==== +In this example, you can see that the `source.ip` field from the `firewall_logs` index is matched with the `source.ip` field in the `threat_list` index, and the corresponding `threat_level` and `threat_type` fields are added to the output. 
+ +[discrete] +[[esql-lookup-join-additional-examples]] +===== Additional examples + +Refer to the examples section of the <> command reference for more examples. [discrete] [[esql-lookup-join-prereqs]] @@ -182,4 +251,4 @@ in the lookup index, or if the documents are too large. More precisely, `LOOKUP JOIN` works in batches of, normally, about 10,000 rows; a large amount of heap space is needed if the matching documents from the lookup index for a batch are multiple megabytes or larger. This is roughly the -same as for `ENRICH`. +same as for `ENRICH`. \ No newline at end of file diff --git a/docs/reference/esql/processing-commands/lookup.asciidoc b/docs/reference/esql/processing-commands/lookup.asciidoc index cde5130a68815..ed05158422166 100644 --- a/docs/reference/esql/processing-commands/lookup.asciidoc +++ b/docs/reference/esql/processing-commands/lookup.asciidoc @@ -9,10 +9,13 @@ changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. ==== + `LOOKUP JOIN` enables you to add data from another index, AKA a 'lookup' index, to your {esql} query results, simplifying data enrichment and analysis workflows. +See <> for an overview of the `LOOKUP JOIN` command, including use cases, prerequisites, and current limitations. + *Syntax* [source,esql] @@ -24,8 +27,7 @@ FROM *Parameters* `lookup_index`:: -The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster -references are not supported. +The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster references are not supported. Indices used for lookups must be configured with the `lookup` <>. `field_name`:: The field to join on. This field must exist