ES|QL: FORK memory management

I got [a failure](https://github.com/elastic/elasticsearch/issues/130067) from the generative tests that looks suspicious:

On the CSV dataset (loaded with `./gradlew :x-pack:plugin:esql:qa:testFixtures:loadCsvSpecData --args="http://elastic-admin:elastic-password@localhost:9200"`):


```
from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| mv_expand name 
| mv_expand `location` 
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true ) ( WHERE true ) ( WHERE true ) 
| WHERE _fork == "fork2" 
| DROP _fork 
| limit 8169
```

```
"type": "circuit_breaking_exception",
"reason": "[request] Data too large, data for [<reused_arrays>] would be [322134656/307.2mb], which is larger than the limit of [322122547/307.1mb];
```

What is strange is that the same query, without FORK, returns only 889 rows and 5 columns, all with very small values:

```
from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| mv_expand name 
| mv_expand `location` 
| STATS count(*)
```

```
   count(*)    
---------------
889   
```

```
from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| mv_expand name 
| mv_expand `location` 
```

```
     name      |  dJAukFBDtW   |   location    |   EtWxTktW    |    abbrev     
---------------+---------------+---------------+---------------+---------------
1              |8              |8              |8              |null           
1              |9              |9              |9              |AWZ            
1              |9              |9              |9              |GWL            
1              |9              |9              |9              |HOD            
1              |9              |9              |9              |IXR            
1              |9              |9              |9              |LUH           
...
```

The failure is not completely deterministic, but it is very frequent, and it also happens with a smaller query that has only three FORK branches:

```
from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true )
```
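Because the failure is intermittent, it helps to run the smaller query several times and count how often the breaker trips. The following is a minimal sketch, not part of the test suite: it assumes the cluster and credentials from the `loadCsvSpecData` command above, posts to the ES|QL `_query` endpoint, and assumes the error body has the usual `{"error": {"type": ...}}` shape. The helper names (`build_request`, `is_circuit_breaker`, `run_once`, `reproduce`) and the attempt count are hypothetical.

```python
import base64
import json
import urllib.error
import urllib.request

# Assumptions: host and credentials are taken from the loadCsvSpecData command
# in this report; adjust them for your cluster.
ES_URL = "http://localhost:9200"
USER, PASSWORD = "elastic-admin", "elastic-password"

# The three-branch query from this report.
QUERY = """
from airp*
| rename scalerank as language_code
| lookup join languages_lookup on language_code
| stats `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true )
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a POST to the ES|QL `_query` endpoint with basic auth."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        f"{ES_URL}/_query",
        data=json.dumps({"query": query}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def is_circuit_breaker(body: dict) -> bool:
    """True if an error response body reports a circuit_breaking_exception."""
    error = body.get("error")
    return isinstance(error, dict) and error.get("type") == "circuit_breaking_exception"


def run_once(query: str) -> bool:
    """Run the query once; return True if it tripped the circuit breaker."""
    try:
        with urllib.request.urlopen(build_request(query)) as resp:
            resp.read()
        return False
    except urllib.error.HTTPError as e:
        # Assumes the error body is JSON, as in the excerpt above.
        return is_circuit_breaker(json.loads(e.read()))


def reproduce(attempts: int = 20) -> int:
    """Run the query `attempts` times and return the number of breaker failures."""
    failures = sum(run_once(QUERY) for _ in range(attempts))
    print(f"{failures}/{attempts} runs hit circuit_breaking_exception")
    return failures
```

Calling `reproduce()` against a node with the CSV dataset loaded prints the failure count; nothing is sent until it is called.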

300MB seems far too much for such a small query: the source indices contain only 5189 records and 11 columns.
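To put the numbers in this report side by side, here is a rough back-of-the-envelope calculation. The row and column counts and the byte figures come from the report; the 8 bytes per value is purely an assumption for illustration (string columns such as `abbrev` will take more), so treat the result as an order-of-magnitude estimate only.

```python
# Figures quoted in this report.
LIMIT_BYTES = 322_122_547      # circuit breaker limit from the error message
REQUESTED_BYTES = 322_134_656  # would-be usage from the error message
SOURCE_ROWS, SOURCE_COLS = 5189, 11
RESULT_ROWS, RESULT_COLS = 889, 5

# Assumption (not from the report): every value costs 8 bytes.
BYTES_PER_VALUE = 8

source_bytes = SOURCE_ROWS * SOURCE_COLS * BYTES_PER_VALUE
result_bytes = RESULT_ROWS * RESULT_COLS * BYTES_PER_VALUE

print(f"source estimate: {source_bytes} bytes")
print(f"result estimate: {result_bytes} bytes")
print(f"limit / source estimate: {LIMIT_BYTES // source_bytes}x")
print(f"overshoot: {REQUESTED_BYTES - LIMIT_BYTES} bytes")
```

Under that assumption the whole source dataset is under half a megabyte, several hundred times smaller than the breaker limit that the query exceeded.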

I'm labeling it as a bug for now.
