|
| 1 | +=== LOOKUP JOIN |
| 2 | + |
| 3 | +++++ |
| 4 | +<titleabbrev>Correlate data with LOOKUP JOIN</titleabbrev> |
| 5 | +++++ |
| 6 | + |
| 7 | +The {esql} <<esql-lookup-join,LOOKUP JOIN>> |
| 8 | +processing command combines data from your {esql} query results |
| 9 | +table with matching records from a specified lookup index. It adds |
| 10 | +fields from the lookup index as new columns to your results table based |
| 11 | +on matching values in the join field. |
| 12 | + |
| 13 | +Teams often have data scattered across multiple indices, such as logs, |
| 14 | +IPs, user IDs, hosts, and employees. Without a direct way to enrich or |
| 15 | +correlate each event with reference data, root-cause analysis, security |
| 16 | +checks, and operational insights become time-consuming. |
| 17 | + |
| 18 | +For example, you can use `LOOKUP JOIN` to: |
| 19 | + |
| 20 | +* Retrieve environment or ownership details for each host to correlate |
| 21 | +your metrics data. |
| 22 | +* Quickly see if any source IPs match known malicious addresses. |
| 23 | +* Tag logs with the owning team or escalation info for faster triage and |
| 24 | +incident response. |
| 25 | +
|
| 26 | +<<esql-lookup-join,LOOKUP JOIN>> is similar to <<esql-enrich-data,ENRICH>> |
| 27 | +in that both help you join data together. You should use |
| 28 | +`LOOKUP JOIN` when: |
| 29 | + |
| 30 | +* Your enrichment data changes frequently |
| 31 | +* You want to avoid index-time processing |
| 32 | +* You're working with regular indices |
| 33 | +* You need to preserve distinct matches |
| 34 | +* You need to match on any field in a lookup index |
| 35 | +* You use document or field level security |
| 36 | +* You want to restrict users to the specific lookup indices that they can |
| 37 | +use
| 38 | +
|
| 39 | +[discrete] |
| 40 | +[[esql-how-lookup-join-works]] |
| 41 | +==== How the `LOOKUP JOIN` command works |
| 42 | + |
| 43 | +The `LOOKUP JOIN` command adds new columns to a table, with data from |
| 44 | +{es} indices. |
| 45 | + |
| 46 | +image::images/esql/esql-lookup-join.png[align="center"] |
| 47 | + |
| 48 | +[[esql-lookup-join-lookup-index]] |
| 49 | +lookup_index:: |
| 50 | +The name of the lookup index. This must |
| 51 | +be a specific index name - wildcards, aliases, and remote cluster |
| 52 | +references are not supported. |
| 53 | + |
| 54 | +[[esql-lookup-join-field-name]] |
| 55 | +field_name:: |
| 56 | +The field to join on. This field must exist |
| 57 | +in both your current query results and in the lookup index. If the field |
| 58 | +contains multi-valued entries, those entries will not match anything |
| 59 | +(the added fields will contain `null` for those rows). |
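| | + |
| | +As a rough sketch of where these two parameters appear, the command takes the following general shape. The `<source_index>` placeholder is illustrative only and stands for whatever produces your current results table: |
| | + |
| | +[source,esql] |
| | +---- |
| | +FROM <source_index> |
| | +| LOOKUP JOIN <lookup_index> ON <field_name> |
| | +---- |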
| 60 | + |
| 61 | +[discrete] |
| 62 | +[[esql-lookup-join-example]] |
| 63 | +==== Example |
| 64 | + |
| 65 | +`LOOKUP JOIN` has left-join behavior. If no rows match in the lookup index, `LOOKUP JOIN` retains the incoming row and adds `null`s. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match. |
| 66 | + |
| 67 | +In this example, we have two sample tables: |
| 68 | + |
| 69 | +*employees* |
| 70 | + |
| 71 | +[cols=",,,,,",options="header",] |
| 72 | +|=== |
| 73 | +|birth++_++date |emp++_++no |first++_++name |gender |hire++_++date |
| 74 | +|language |
| 75 | +|1955-10-04T00:00:00Z |10091 |Amabile |M |1992-11-18T00:00:00Z |3 |
| 76 | + |
| 77 | +|1964-10-18T00:00:00Z |10092 |Valdiodio |F |1989-09-22T00:00:00Z |1 |
| 78 | + |
| 79 | +|1964-06-11T00:00:00Z |10093 |Sailaja |M |1996-11-05T00:00:00Z |3 |
| 80 | + |
| 81 | +|1957-05-25T00:00:00Z |10094 |Arumugam |F |1987-04-18T00:00:00Z |5 |
| 82 | + |
| 83 | +|1965-01-03T00:00:00Z |10095 |Hilari |M |1986-07-15T00:00:00Z |4 |
| 84 | +|=== |
| 85 | + |
| 86 | +*languages++_++lookup++_++non++_++unique++_++key* |
| 87 | + |
| 88 | +[cols=",,",options="header",] |
| 89 | +|=== |
| 90 | +|language++_++code |language++_++name |country |
| 91 | +|1 |English |Canada |
| 92 | +|1 |English | |
| 93 | +|1 | |United Kingdom |
| 94 | +|1 |English |United States of America |
| 95 | +|2 |German |++[++Germany{vbar}Austria++]++ |
| 96 | +|2 |German |Switzerland |
| 97 | +|2 |German | |
| 98 | +|4 |Quenya | |
| 99 | +|5 | |Atlantis |
| 100 | +|++[++6{vbar}7++]++ |Mv-Lang |Mv-Land |
| 101 | +|++[++7{vbar}8++]++ |Mv-Lang2 |Mv-Land2 |
| 102 | +| |Null-Lang |Null-Land |
| 103 | +| |Null-Lang2 |Null-Land2 |
| 104 | +|=== |
| 105 | + |
| 106 | +Running the following query would provide the results shown below. |
| 107 | + |
| 108 | +[source,esql] |
| 109 | +---- |
| 110 | +FROM employees |
| 111 | +| EVAL language_code = emp_no % 10 |
| 112 | +| LOOKUP JOIN languages_lookup_non_unique_key ON language_code |
| 113 | +| WHERE emp_no > 10090 AND emp_no < 10096 |
| 114 | +| SORT emp_no, country |
| 115 | +| KEEP emp_no, language_code, language_name, country |
| 116 | +---- |
| 117 | + |
| 118 | +[cols=",,,",options="header",] |
| 119 | +|=== |
| 120 | +|emp++_++no |language++_++code |language++_++name |country |
| 121 | +|10091 |1 |English |Canada |
| 122 | +|10091 |1 |null |United Kingdom |
| 123 | +|10091 |1 |English |United States of America |
| 124 | +|10091 |1 |English |null |
| 125 | +|10092 |2 |German |++[++Germany, Austria++]++ |
| 126 | +|10092 |2 |German |Switzerland |
| 127 | +|10092 |2 |German |null |
| 128 | +|10093 |3 |null |null |
| 129 | +|10094 |4 |Quenya |null |
| 130 | +|10095 |5 |null |Atlantis |
| 131 | +|=== |
| 132 | + |
| 133 | +[IMPORTANT] |
| 134 | +==== |
| 135 | +`LOOKUP JOIN` does not guarantee that the output is in |
| 136 | +any particular order. If you need a specific order, add a |
| 137 | +link:/reference/query-languages/esql/esql-commands.md#esql-sort[`SORT`] |
| 138 | +somewhere after the `LOOKUP JOIN`. |
| 139 | +==== |
| 140 | + |
| 141 | +[discrete] |
| 142 | +[[esql-lookup-join-prereqs]] |
| 143 | +==== Prerequisites |
| 144 | + |
| 145 | +To use `LOOKUP JOIN`, the following requirements must be met: |
| 146 | + |
| 147 | +* *Compatible data types*: The join key and join field in the lookup |
| 148 | +index must have compatible data types. This means: |
| 149 | +** The data types must either be identical or be internally represented |
| 150 | +as the same type in Elasticsearch's type system |
| 151 | +** Numeric types follow these compatibility rules: |
| 152 | +*** `short` and `byte` are compatible with `integer` (all represented as |
| 153 | +`int`) |
| 154 | +*** `float`, `half_float`, and `scaled_float` are compatible |
| 155 | +with `double` (all represented as `double`) |
| 156 | +** For text fields: You can use text fields on the left-hand side of the |
| 157 | +join only if they have a `.keyword` subfield |
| 158 | + |
| 159 | +For a complete list of supported data types and their internal |
| 160 | +representations, see the |
| 161 | +link:/reference/query-languages/esql/limitations.md#_supported_types[Supported |
| 162 | +Field Types documentation]. |
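| | + |
| | +When the join key in your results has a different type than the lookup field, one possible approach is to convert it with `EVAL` before the join. The sketch below assumes a hypothetical `events` index whose `language_id` field is a `keyword`, and assumes that `language_code` in the lookup index is an `integer`; neither assumption comes from this page: |
| | + |
| | +[source,esql] |
| | +---- |
| | +FROM events |
| | +| EVAL language_code = TO_INTEGER(language_id) |
| | +| LOOKUP JOIN languages_lookup_non_unique_key ON language_code |
| | +---- |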
| 163 | + |
| 164 | +[discrete] |
| 165 | +[[esql-lookup-join-limitations]] |
| 166 | +==== Limitations |
| 167 | + |
| 168 | +The following are the current limitations of `LOOKUP JOIN`: |
| 169 | + |
| 170 | +* `LOOKUP JOIN` succeeds if the join field in the lookup index |
| 171 | +is of `KEYWORD` type. If the main index's join field is of `TEXT` type, it |
| 172 | +must have an exact `.keyword` subfield that can be matched with the |
| 173 | +lookup index's `KEYWORD` field. |
| 174 | +* Indices in |
| 175 | +link:/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting[lookup] |
| 176 | +mode are always single-sharded. |
| 177 | +* Cross-cluster search is unsupported. Both source and lookup indices |
| 178 | +must be local. |
| 179 | +* `LOOKUP JOIN` can only use a single match field and a single index. |
| 180 | +Wildcards, aliases, date math, and data streams are not supported. |
| 181 | +* The name of the match field in |
| 182 | +`LOOKUP JOIN lu++_++idx ON match++_++field` must match an existing field |
| 183 | +in the query. This may require `RENAME` or `EVAL` commands to achieve. |
| 184 | +* The query will circuit break if there are too many matching documents |
| 185 | +in the lookup index, or if the documents are too large. More precisely, |
| 186 | +`LOOKUP JOIN` works in batches of, normally, about 10,000 rows; a large |
| 187 | +amount of heap space is needed if the matching documents from the lookup |
| 188 | +index for a batch are multiple megabytes or larger. This is roughly the |
| 189 | +same as for `ENRICH`. |
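| | + |
| | +To illustrate the match field limitation above, here is a minimal sketch that renames a field so that its name matches the join field in the lookup index. The `metrics` and `host_inventory` index names and the `host.name` and `hostname` field names are hypothetical: |
| | + |
| | +[source,esql] |
| | +---- |
| | +FROM metrics |
| | +| RENAME host.name AS hostname |
| | +| LOOKUP JOIN host_inventory ON hostname |
| | +---- |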