Integration tests for LOOKUP JOIN over wider range of data types #126150

craigtaverner · 2025-04-02T16:01:51Z

This test suite tests the lookup join functionality in ESQL with various data types.

For each pair of types being tested, it builds a main index called "index" containing a single document with as many fields as types being tested on the left of the pair, and then creates that many other lookup indexes, each with a single document containing exactly two fields: the field to join on, and a field to return.

The assertion is that for valid combinations the result should exist, and for invalid combinations an exception should be thrown. If no exception is thrown and no result is returned, our validation rules are not aligned with the internal behaviour (i.e. a bug).
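The three outcomes described above can be sketched as a small classifier. This is only an illustration, not code from the PR: the class `JoinOutcomeCheck`, the `Verdict` enum, and the method `classify` are hypothetical names, and the real test asserts on the query response directly.

```java
import java.util.List;

// Hypothetical helper, not part of the PR: classifies the outcome of one join query.
final class JoinOutcomeCheck {
    enum Verdict { AS_EXPECTED, UNEXPECTED_ERROR, UNEXPECTED_RESULT, SILENT_EMPTY_RESULT }

    /**
     * @param expectedToPass whether the type pair is listed as a supported combination
     * @param error          the exception thrown by the query, or null if none was thrown
     * @param rows           the values returned for the "other" column
     */
    static Verdict classify(boolean expectedToPass, Exception error, List<String> rows) {
        if (error == null && rows.isEmpty()) {
            // No exception and no result: validation and internal behaviour disagree.
            return Verdict.SILENT_EMPTY_RESULT;
        }
        if (expectedToPass) {
            // A supported pair must return a result without throwing.
            return error == null ? Verdict.AS_EXPECTED : Verdict.UNEXPECTED_ERROR;
        }
        // An unsupported pair must be rejected with an exception.
        return error != null ? Verdict.AS_EXPECTED : Verdict.UNEXPECTED_RESULT;
    }
}
```

The `SILENT_EMPTY_RESULT` case is checked first because it signals a bug regardless of which outcome the test configuration expected.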

Since the LOOKUP JOIN command requires the match field name to be the same between the main index and the lookup index, we will have field names that correctly represent the type of the field in the main index, but not the type of the field in the lookup index. This can be confusing, but it is important to remember that the field names are not the same as the types.

For example, if we are testing the pairs (double, double), (double, float), (float, double) and (float, float), we will create the following indexes:

index_double_double

Index containing a single document with a field of type 'double' like:

        {
            "field_double": 1.0,  // this is mapped as type 'double'
            "other": "value"
        }

index_double_float

Index containing a single document with a field of type 'float' like:

        {
            "field_double": 1.0,  // this is mapped as type 'float' (a float with the name of the main index field) 
            "other": "value"
        }

index_float_double

Index containing a single document with a field of type 'double' like:

        {
            "field_float": 1.0,  // this is mapped as type 'double' (a double with the name of the main index field)
            "other": "value"
        }

index_float_float

Index containing a single document with a field of type 'float' like:

        {
            "field_float": 1.0,  // this is mapped as type 'float'
            "other": "value"
        }

index

Index containing document like:

        {
            "field_double": 1.0,  // this is mapped as type 'double'
            "field_float": 1.0    // this is mapped as type 'float'
        }

Note that the lookup indexes have fields with a name that matches the type in the main index, and not the type actually used in the lookup index. Instead, the mapped type should be the type of the right-hand side of the pair being tested.
Then we can run queries like:

    FROM index | LOOKUP JOIN index_double_float ON field_double | KEEP other

And assert that the result exists and is equal to "value".
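The naming scheme above can be sketched in a few lines of Java. This is a minimal illustration that assumes type names are passed as plain strings; the class `LookupJoinNaming` and its methods are hypothetical and are not the helpers used in `LookupJoinTypesIT`.

```java
// Hypothetical sketch of the naming convention described above.
final class LookupJoinNaming {
    // Lookup index name: index_<main type>_<lookup type>
    static String lookupIndexName(String mainType, String lookupType) {
        return "index_" + mainType + "_" + lookupType;
    }

    // The join field is always named after the type used in the main index.
    static String fieldName(String mainType) {
        return "field_" + mainType;
    }

    // Builds the query for one (main type, lookup type) pair.
    static String query(String mainType, String lookupType) {
        return "FROM index | LOOKUP JOIN " + lookupIndexName(mainType, lookupType)
            + " ON " + fieldName(mainType) + " | KEEP other";
    }

    public static void main(String[] args) {
        System.out.println(query("double", "float"));
    }
}
```

For the pair (double, float) this produces the same query as the example above, which makes it visible that only the index name carries the lookup type.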


One thing to consider changing here is allowing float and integer types to be used together. Right now the only thing blocking this is the validation code. The join actually succeeds if we remove the validation.

elasticsearchmachine · 2025-04-02T16:02:15Z

Pinging @elastic/es-analytical-engine (Team:Analytics)


alex-spies

Nice! This is already looking great!

Before we merge, I wonder if we could simplify the test setup and make the test class a tad easier to follow, although this is already a great value-add and is safe to merge.

I agree that as a next step, we should expand the number of different types covered.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

alex-spies · 2025-04-03T13:11:36Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+ *     <dt>index_double_float</dt>
+ *     <dd>Index containing a single document with a field of type 'float' like: <pre>
+ *         {
+ *             "field_double": 1.0,  // this is mapped as type 'float' (a float with the name of the main index field)


I think we may cut down on the number of created indices (maybe speeding up the test a little) and make this simpler to debug if we always name the fields by the type. We can just perform a RENAME in the query. Prefixing the field names with main_field or lookup_field would also guarantee that we don't accidentally shadow while renaming, and it will make the javadoc more self explanatory.

In fact, the test index creation could be simplified to 2 indices: one main index with every type we currently consider, and one lookup index with every type we currently consider. This means index creation can be done entirely during the setup of the test suite, rather than in each test.

This concern may be more important later, though, when we generalize the join condition so that multiple fields (a composite key) can be used. Then we probably also want to expand this test to use different combinations of types.

I love the idea, but think we can re-visit this later.
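The two-index layout suggested in this thread could build its queries roughly as follows. This is a sketch under stated assumptions, not code from the PR: the index names `main_index` and `lookup_index`, the class `RenameJoinQuery`, and the exact field-name format are hypothetical, and the generated query has not been run against Elasticsearch.

```java
// Hypothetical sketch of the proposed layout: one main index, one lookup index,
// fields named by type, and a RENAME to line up the join field names.
final class RenameJoinQuery {
    static final String MAIN_INDEX = "main_index";     // assumed name
    static final String LOOKUP_INDEX = "lookup_index"; // assumed name

    static String mainField(String type) {
        return "main_field_" + type;
    }

    static String lookupField(String type) {
        return "lookup_field_" + type;
    }

    // Renames the main-index field to the lookup-index field name, then joins on it.
    static String query(String mainType, String lookupType) {
        String left = mainField(mainType);
        String right = lookupField(lookupType);
        return "FROM " + MAIN_INDEX
            + " | RENAME " + left + " AS " + right
            + " | LOOKUP JOIN " + LOOKUP_INDEX + " ON " + right
            + " | KEEP other";
    }
}
```

Because the prefixes differ, a renamed main field can never collide with another main-index field, which is the shadowing guarantee mentioned in the review comment.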

alex-spies · 2025-04-03T13:13:20Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+ * For example, if we are testing the pairs (double, double), (double, float), (float, double) and (float, float),
+ * we will create the following indexes:
+ * <dl>
+ *     <dt>index_double_double</dt>


nit: it's unclear whether this is the main or the lookup index when reading this javadoc.

alex-spies · 2025-04-03T13:23:40Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+        {
+            TestConfigs configs = testConfigurations.computeIfAbsent("strings", TestConfigs::new);
+            configs.addPasses(KEYWORD, KEYWORD);
+            configs.addPasses(TEXT, KEYWORD);


Is it possible to have a TEXT field without a .keyword subfield? I think we fail in this situation. Could you maybe double check and add a test if that's possible?

We always pass with TEXT on the left because we will read from source for the main index if no KEYWORD subfield exists. Likewise we always fail with TEXT on the right, even if a KEYWORD subfield exists because we have not coded a special case for that (yet).

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

alex-spies · 2025-04-03T13:48:28Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+        private void addEmptyResult(DataType mainType, DataType lookupType) {
+            add(new TestConfigPasses(mainType, lookupType, false));
+        }


This appears unused?

Similarly, the boolean arg in the TestConfigPasses could always be true, then, no?

Indeed, I was using this to assert on known failing cases (lookup does not match, but no errors are thrown), but I have since taken a different direction, adding these cases to the type validation so we get an error instead of an empty result. I'll delete this method.

alex-spies · 2025-04-03T13:51:42Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+        private <E extends Exception> void addFails(DataType mainType, DataType lookupType, Class<E> exception, Consumer<E> assertion) {
+            add(new TestConfigFails<>(mainType, lookupType, exception, assertion));
+        }


I think this was meant to be used in the other addFails methods above? Currently, I think it is unused.

Indeed, this method adds very close to zero value, so instead of using it, I will delete it.

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

alex-spies · 2025-04-03T13:52:47Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+            TestConfigs configs = testConfigurations.computeIfAbsent("mixed-numerical", TestConfigs::new);
+            for (DataType mainType : integerTypes) {
+                for (DataType lookupType : floatTypes) {
+                    // TODO: We should probably allow this, but we need to change the validation code in Join.java


++, the behavior should be exactly as if we evaluated an ==, I think.

Added a followup issue to investigate this and try to support it: #127806

bpintea

LGTM.
The setup is thorough, but I'd maybe echo Alex's note on complexity: setting up indices does take time, and while this test itself takes under 10s (w/o gradle setup), it could probably be sped up with just two indices and renames (which would be skipped when both indices use the same type, so the RENAME itself should not introduce a blind spot).
This complexity surfaces a bit also in the need to test the test (validateIndex()).

bpintea · 2025-04-03T20:07:45Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java

+            Collection<TestConfigs> existing = testConfigurations.values();
+            TestConfigs configs = testConfigurations.computeIfAbsent("same", TestConfigs::new);
+            for (DataType type : all) {
+                if (existingIndex(existing, type, type)) {


Supernit: test against false and skip the continue for slightly better legibility.
Here and below.
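The two loop shapes being compared can be sketched as follows. This is an illustration only: it uses strings and a `Set` in place of `DataType` and `existingIndex(...)`, and the class `SkipExistingSketch` is hypothetical. The `== false` spelling follows the style of the diff excerpts in this review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the same filter written with an early `continue`
// and with a negated test, as suggested in the comment above.
final class SkipExistingSketch {
    static List<String> withContinue(List<String> all, Set<String> existing) {
        List<String> added = new ArrayList<>();
        for (String type : all) {
            if (existing.contains(type)) {
                continue; // already covered by another configuration
            }
            added.add(type);
        }
        return added;
    }

    static List<String> withNegatedTest(List<String> all, Set<String> existing) {
        List<String> added = new ArrayList<>();
        for (String type : all) {
            if (existing.contains(type) == false) {
                added.add(type);
            }
        }
        return added;
    }
}
```

Both methods return the same list; the second keeps the interesting branch inside the condition instead of after a jump.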

costin · 2025-04-23T17:16:37Z

Let's get this PR merged in since it's been sitting here for a while, and move the nanos work and the potential speed-up (if it can't be addressed shortly) to a follow-up.

craigtaverner · 2025-05-07T09:42:45Z

Since this PR adds additional validation errors for unsupported types, instead of silently failing (getting no matches in the join), I'm viewing that as a kind of bug fix worth backporting to 9.0.

elasticsearchmachine · 2025-05-07T11:16:13Z

💚 Backport successful

Status	Branch	Result
✅	9.0


craigtaverner · 2025-05-07T11:19:48Z

@bpintea and @alex-spies, you both suggested refining the index setup, and so I have created an issue recommending just that, at #127819

#126150) (#127818)

* Integration tests for LOOKUP JOIN over wider range of data types (#126150)

This test suite tests the lookup join functionality in ESQL with various data types.

For each pair of types being tested, it builds a main index called "index" containing a single document with as many fields as types being tested on the left of the pair, and then creates that many other lookup indexes, each with a single document containing exactly two fields: the field to join on, and a field to return.

The assertion is that for valid combinations, the return result should exist, and for invalid combinations an exception should be thrown. If no exception is thrown, and no result is returned, our validation rules are not aligned with the internal behaviour (i.e. a bug).

Since the `LOOKUP JOIN` command requires the match field name to be the same between the main index and the lookup index, we will have field names that correctly represent the type of the field in the main index, but not the type of the field in the lookup index. This can be confusing, but it is important to remember that the field names are not the same as the types.

* SEMANTIC_TEXT Still exists on 9.0 as a zombie type


fang-xing-esql · 2025-06-02T16:08:26Z

💚 All backports created successfully

Status	Branch	Result
✅	8.19

Questions ?

Please refer to the Backport tool documentation

…es (#126150) (#128776)

* Integration tests for LOOKUP JOIN over wider range of data types (#126150)

This test suite tests the lookup join functionality in ESQL with various data types.

For each pair of types being tested, it builds a main index called "index" containing a single document with as many fields as types being tested on the left of the pair, and then creates that many other lookup indexes, each with a single document containing exactly two fields: the field to join on, and a field to return.

The assertion is that for valid combinations, the return result should exist, and for invalid combinations an exception should be thrown. If no exception is thrown, and no result is returned, our validation rules are not aligned with the internal behaviour (i.e. a bug).

Since the `LOOKUP JOIN` command requires the match field name to be the same between the main index and the lookup index, we will have field names that correctly represent the type of the field in the main index, but not the type of the field in the lookup index. This can be confusing, but it is important to remember that the field names are not the same as the types.

(cherry picked from commit afc53a3)

* fix compiling error
* add missing part of the backport of #126456
* add missing part of the backport of #126456

---------

Co-authored-by: Craig Taverner


)

* Integration tests for LOOKUP JOIN over wider range of data types (#126150)

This test suite tests the lookup join functionality in ESQL with various data types.

For each pair of types being tested, it builds a main index called "index" containing a single document with as many fields as types being tested on the left of the pair, and then creates that many other lookup indexes, each with a single document containing exactly two fields: the field to join on, and a field to return.

The assertion is that for valid combinations, the return result should exist, and for invalid combinations an exception should be thrown. If no exception is thrown, and no result is returned, our validation rules are not aligned with the internal behaviour (i.e. a bug).

Since the `LOOKUP JOIN` command requires the match field name to be the same between the main index and the lookup index, we will have field names that correctly represent the type of the field in the main index, but not the type of the field in the lookup index. This can be confusing, but it is important to remember that the field names are not the same as the types.

* Just use one lookup-settings file

This change simplifies backports from 9.x branches where these changes were done as part of other work.

* Support DATE_NANOS in LOOKUP JOIN (#127962)

As reported in #127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function), which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (i.e. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously loses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates, which does not need to re-parse the values at all. This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN, where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:

* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery`, which delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.

* Fix compile error after backport
* Fix compile error after backport
* Update docs/changelog/129138.yaml
* SEMANTIC_TEXT was removed in later PRs, so not really testable in 8.18.
* Delete docs/changelog/129138.yaml
* Removed incorrectly added changelog

This is a backport
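The precision loss from dividing nanosecond values by 1_000_000, mentioned in the commit message above, can be shown with plain `long` arithmetic. This is a standalone sketch: the class `NanosPrecisionSketch` and its methods are hypothetical and do not use any Elasticsearch API.

```java
import java.time.Instant;

// Hypothetical sketch: two distinct nanosecond timestamps collapse to the
// same join key once they are reduced to milliseconds.
final class NanosPrecisionSketch {
    // Nanoseconds since the epoch for an Instant.
    static long toEpochNanos(Instant instant) {
        long seconds = Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(seconds, instant.getNano());
    }

    // The lossy reduction described above: integer division drops the sub-millisecond part.
    static long nanosToMillis(long epochNanos) {
        return epochNanos / 1_000_000L;
    }

    public static void main(String[] args) {
        long a = toEpochNanos(Instant.parse("2025-04-02T16:01:51.123456789Z"));
        long b = toEpochNanos(Instant.parse("2025-04-02T16:01:51.123999999Z"));
        System.out.println(a == b);                               // false: different nanosecond values
        System.out.println(nanosToMillis(a) == nanosToMillis(b)); // true: same millisecond key
    }
}
```

A join keyed on the reduced value would therefore treat these two timestamps as equal, which is why passing the original `long` values through unchanged is preferable.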

In this test, we create a hundred indices for different combinations of data types. The number of file descriptors used exceeds the limit of HandleLimitFS; therefore, we avoid using it in this test. Relates #126150 Closes #129344


craigtaverner added 5 commits April 1, 2025 17:26

Integration test for LOOKUP JOIN between various types

9b50f2b

Refactored to be easier to read and extend the tests

b296e34

Support many combinations with errors and empty results

19bb2a5

Reorder with private records at the bottom

a388663

Support much wider range of types and mixed types

048c423

craigtaverner added >test Issues or PRs that are addressing/adding tests Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Apr 2, 2025

craigtaverner requested review from alex-spies and bpintea April 2, 2025 16:01

elasticsearchmachine added the v9.1.0 label Apr 2, 2025

craigtaverner added 6 commits April 2, 2025 18:59

Merge branch 'main' into lookup_join_types_it

183f463

Some fixes the javadocs

bfc6ae3

Remove warnings

3b774a3

Merge remote-tracking branch 'origin/main' into lookup_join_types_it

031f370

Merge branch 'lookup_join_types_it' of github.com:craigtaverner/elasticsearch into lookup_join_types_it

056a723

Added tests for DateTime

fc1b126

alex-spies approved these changes Apr 3, 2025


bpintea approved these changes Apr 3, 2025


Merge remote-tracking branch 'origin/main' into lookup_join_types_it

b39ebc5

craigtaverner mentioned this pull request Apr 23, 2025

LOOKUP JOIN on date_nanos fields finds zero matches #127249

Closed

craigtaverner added 4 commits May 6, 2025 13:57

Merge remote-tracking branch 'origin/main' into lookup_join_types_it

71a0f59

Add negative tests for all unsupported types

8708f9b

Add support for scaled_float and some code-review changes

8d53897

Merge remote-tracking branch 'origin/main' into lookup_join_types_it

9be9e9f

craigtaverner mentioned this pull request May 7, 2025

LOOKUP JOIN on mixed numerical types #127806

Closed

Fix mistake in scaled_float value and test non-unity scaling factor

820d890

craigtaverner added the auto-backport Automatically create backport pull requests when merged label May 7, 2025

craigtaverner added the v9.0.2 label May 7, 2025

craigtaverner merged commit afc53a3 into elastic:main May 7, 2025
17 checks passed

craigtaverner mentioned this pull request May 7, 2025

[9.0] Integration tests for LOOKUP JOIN over wider range of data types (#126150) #127818

Merged

craigtaverner mentioned this pull request May 7, 2025

Refine LOOKUP JOIN types IT #127819

Open

fang-xing-esql added the v8.19.0 label Jun 2, 2025

fang-xing-esql mentioned this pull request Jun 2, 2025

[8.19] Integration tests for LOOKUP JOIN over wider range of data types (#126150) #128776

Merged

craigtaverner mentioned this pull request Jun 9, 2025

[8.18] backport LOOKUP JOIN type tests and DATE_NANOS (#127962) #129138

Merged

dnhatn mentioned this pull request Jun 13, 2025

Avoid HandleLimitFS in LookupJoinTypesIT #129437

Merged
