Skip to content

Conversation

craigtaverner
Copy link
Contributor

Backports the following commits to 8.x:

Always return `KEYWORD` for functions that previously returned `TEXT`, because any change to the value, no matter how small, is enough to render meaningless the original analyzer associated with the `TEXT` field value. In principle, if the attribute is no longer the original `FieldAttribute`, it can no longer claim to have the type `TEXT`.

This has been done for all functions: conversion functions, aggregating functions, multi-value functions. There were several that already produced `KEYWORD` for `TEXT` input (eg. ToString, FromBase64 and ToBase64, MvZip, ToLower, ToUpper, DateFormat, Concat, Left, Repeat, Replace, Right, Split, Substring), but many others that incorrectly claimed to produce `TEXT`, while this was really a false claim. This PR makes that now strict, and includes changes to the functions' units tests to disallow the tests to expect any functions output to be `TEXT`.

One side effect of this change is that methods that take multiple parameters that require all of them to have the same type, will now treat TEXT and KEYWORD the same. This was already the case for functions like `Concat`, but is now also the case for `Greatest`, `Least`, `Case`, `Coalesce` and `MvAppend`.

An associated change is that the type casting operator `::text` has been entirely removed. It used to map onto the `ToString` function which returned type KEYWORD, and so `::text` really produced a `KEYWORD`, which is a lie, or at least a `bug`, which is now fixed. Should we ever wish to actually produce real `TEXT`, we might love the fact that this operator has been freed up for future use (although it seems likely that function will require parameters to specify the analyzer, so might never be an operator again).

### Backwards compatibility issues:

This is a change that will fail BWC tests, since we have many tests that assert on TEXT output to functions. For this reason we needed to block two scenarios:

* We used the capability `functions_never_emit_text` to prevent 7 csv-spec tests and 2 yaml tests from being run against older versions that still emit text.
* We used `skipTest` to also block those two yaml tests from being run against the latest build, but using older yaml files downloaded (as far back as 8.14).

In all cases the change observed in these tests was simply the results columns no longer having `text` type, and instead being `keyword`.

---------

Co-authored-by: Luigi Dell'Aquila <[email protected]>
@craigtaverner craigtaverner added :Analytics/ES|QL AKA ESQL >bug auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Oct 25, 2024
Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine merged commit 3b3e7f7 into elastic:8.x Oct 25, 2024
16 checks passed
@craigtaverner craigtaverner deleted the backport/8.x/pr-114334 branch October 25, 2024 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >bug ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants