craigtaverner
Contributor

@craigtaverner craigtaverner commented Sep 3, 2025

FORK creates ReferenceAttributes that represent the combination of the underlying FieldAttributes inside each forked sub-plan. However, union-types uses synthetic FieldAttributes to represent the underlying original FieldAttributes converted to a single type. These synthetic attributes are removed in the last phase of semantic analysis using DROP commands (Project), but the code that does this does not handle the ReferenceAttributes provided by FORK. The simplest fix is to stop restricting the check to FieldAttributes and to check only attr.synthetic() instead.

It would seem the only special case we need to deal with is the NO_FIELDS special field, which is both synthetic and a ReferenceAttribute and should not be touched by the union-types cleanup code.
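
As a rough illustration of the change described above, here is a minimal, self-contained Java sketch. It does not use the real Elasticsearch classes: `Attr`, `FieldAttr`, `ReferenceAttr`, the `NO_FIELDS` constant, and both cleanup methods are hypothetical stand-ins, assumed only for the purpose of showing the difference between the two checks.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SyntheticCleanupSketch {
    // Hypothetical stand-ins for the ES|QL attribute classes; NOT the real Elasticsearch types.
    public interface Attr {
        String name();
        boolean synthetic();
    }

    public record FieldAttr(String name, boolean synthetic) implements Attr {}

    public record ReferenceAttr(String name, boolean synthetic) implements Attr {}

    // Hypothetical marker standing in for the NO_FIELDS special field:
    // it is both synthetic and a reference attribute, and must be kept.
    public static final Attr NO_FIELDS = new ReferenceAttr("<no-fields>", true);

    // Old behaviour (as described): only synthetic FieldAttributes are dropped,
    // so a synthetic ReferenceAttribute produced by FORK leaks into the output.
    public static List<Attr> cleanupFieldOnly(List<Attr> output) {
        return output.stream()
            .filter(a -> !(a instanceof FieldAttr && a.synthetic()))
            .collect(Collectors.toList());
    }

    // New behaviour (as described): drop anything synthetic, regardless of its class,
    // except the NO_FIELDS marker, which is compared by identity.
    public static List<Attr> cleanupAnySynthetic(List<Attr> output) {
        return output.stream()
            .filter(a -> a == NO_FIELDS || !a.synthetic())
            .collect(Collectors.toList());
    }
}
```

With an output containing a regular field, a synthetic field, a synthetic reference, and `NO_FIELDS`, the first method keeps the synthetic reference while the second removes it and still keeps `NO_FIELDS`.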

Fixes #133973

@craigtaverner craigtaverner added the >bug, Team:Analytics, and :Analytics/ES|QL labels Sep 3, 2025
@elasticsearchmachine
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@craigtaverner craigtaverner marked this pull request as ready for review September 4, 2025 13:30
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

FROM apps, apps_short
| EVAL x = id::integer
| FORK (WHERE true) (WHERE true)
| DROP _fork

can we do something with x after FORK? for example adding EVAL x = x + 1 or something similar?
just to validate that the value can be used.

Contributor

@ioanatia ioanatia left a comment

We can also remove these lines here:

assumeFalse(
    "Tests using implicit_casting_date_and_date_nanos are not supported for now",
    testCase.requiredCapabilities.contains(IMPLICIT_CASTING_DATE_AND_DATE_NANOS.capabilityName())
);

I tested your fix and it seems that it also solves the problem we had with the implicit casting for date and date nanos (which I just found out we took out of snapshot 🎉 ).
At the time we released FORK this was still under development and so it wasn't a priority to fix.
With GenerativeForkRestTest we test almost all csv specs by appending the following to the end of each query: FORK (WHERE true) (WHERE true) | WHERE _fork == "fork1" | DROP _fork. The resulting query should return the same results as the original one.

So I'd say that if we want to add more tests using FORK, it's less important to follow the same pattern with FORK (WHERE true) (WHERE true) and more important to check whether the columns can be used after FORK, whether anything odd happens when the FORK branches have different outputs, and so on. I am happy to add more tests for it in a separate PR if we simply want to get this merged sooner.
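
To make the GenerativeForkRestTest pattern concrete, here is a small Java sketch of the query rewriting it is described as doing. This is not the actual test code: the class name, the `withFork` helper, and the constant are hypothetical, and only the appended ES|QL text is taken from the comment above. Both FORK branches keep every row, so filtering on the first branch and dropping `_fork` should leave the original rows and columns.

```java
public class ForkSuffixSketch {
    // Suffix quoted in the review comment; the leading pipe is added here
    // so that it can be appended directly to an existing query.
    public static final String FORK_SUFFIX =
        " | FORK (WHERE true) (WHERE true) | WHERE _fork == \"fork1\" | DROP _fork";

    // Hypothetical helper: append the FORK suffix to a csv-spec query.
    public static String withFork(String query) {
        return query.strip() + FORK_SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(withFork("FROM apps, apps_short | EVAL x = id::integer"));
    }
}
```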

@astefan
Contributor

astefan commented Sep 5, 2025

Regarding the tests with fork and their diversity, I agree that we need more. With this project, more tests are always better.

@ioanatia did an amazing job by adding GenerativeForkRestTest, and fork testing "in the wild" definitely got a boost.

If we could have a combination of this test with what GenerativeIT is doing that would be even better. @ioanatia is this something on your list for fork by any chance? That would supercharge fork testing coverage.
Maybe piggyback on what GenerativeForkRestTest is doing (adding the fork at the end of every csv-spec test out there) and also pick some functionality from GenerativeIT to generate more commands after what GenerativeForkRestTest already created.

Contributor

@astefan astefan left a comment

LGTM

8268153 | sample_data_ts_long | 2023-10-23T13:52:55.015Z | 1698069175015 | 1698069175015 | 172.21.3.15 | 172.21.3.15
8268153 | sample_data_ts_nanos | 2023-10-23T13:52:55.015Z | 2023-10-23T13:52:55.015123456Z | 1698069175015123456 | 172.21.3.15 | 172.21.3.15
;

You might want to look at the inlinestats.csv-spec file and search for https://github.com/elastic/elasticsearch/issues/133973. There are some ignored tests in there that need a check.

Contributor

@ncordon ncordon Sep 5, 2025

We checked, and these tests cannot be un-ignored yet: there seems to be a bug with INLINESTATS in queries that use a union type, which appears to be unrelated to what we've done for FORK:

FROM employees, employees_incompatible
| KEEP emp_no, hire_date 
| EVAL yr = DATE_TRUNC(1 year, hire_date)
| INLINESTATS c = count(emp_no::long) BY yr
| SORT yr DESC
| LIMIT 5
;

To be investigated as a separate piece of work

@ioanatia
Contributor

ioanatia commented Sep 5, 2025

> If we could have a combination of this test with what GenerativeIT is doing that would be even better. @ioanatia is this something on your list for fork by any chance? That would supercharge fork testing coverage.

That's a good idea.
I'd still like to keep the GenerativeForkRestTest because that will run all the CSV specs.
But we can improve the GenerativeIT tests: we have a single source command generator, just for FROM.
Maybe we can add another generator that picks a random existing CSV test.
We can append something like FORK (WHERE true) (WHERE true).
And then GenerativeIT can continue as it does now, by appending more commands after the initial query.
This wasn't on my list, but it is now 😄 .
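
The proposal above could look roughly like the following Java sketch. Everything here is hypothetical and only meant to show the shape of the idea: the class, the method names, and the way follow-up commands are chosen are assumptions, not the actual GenerativeIT API.

```java
import java.util.List;
import java.util.Random;

public class ForkSourceGeneratorSketch {
    public static final String FORK = " | FORK (WHERE true) (WHERE true)";

    // Hypothetical source generator: pick a random existing csv-spec query
    // and append a FORK whose branches both keep every row.
    public static String sourceQuery(List<String> csvSpecQueries, Random random) {
        String base = csvSpecQueries.get(random.nextInt(csvSpecQueries.size()));
        return base + FORK;
    }

    // Hypothetical continuation: append `depth` randomly chosen commands after the FORK,
    // standing in for what the generative test does after its source command.
    public static String generate(List<String> csvSpecQueries, List<String> extraCommands,
                                  Random random, int depth) {
        StringBuilder query = new StringBuilder(sourceQuery(csvSpecQueries, random));
        for (int i = 0; i < depth; i++) {
            query.append(" | ").append(extraCommands.get(random.nextInt(extraCommands.size())));
        }
        return query.toString();
    }
}
```

Passing a seeded `Random` keeps a failing generated query reproducible, which matters when the generated pipeline is the thing under test.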

@ncordon ncordon merged commit 06da8a4 into elastic:main Sep 5, 2025
33 checks passed
@ioanatia ioanatia mentioned this pull request Sep 3, 2025
14 tasks

Labels

:Analytics/ES|QL, >bug, Team:Analytics, v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: union types with FORK leaks internal attributes in the output

5 participants