Skip to content

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jul 3, 2025

This refactors our ValuesSourceReaderOperator so it can split pages
when it reads large values. It does not actually split the pages as
that's a bit tricky. But it sets the stage for the next PR that will
do so.

  • Move ValuesSourceReaderOperator to it's own package
  • Move many inner classes into their own top level classes
  • Extend from AbstractPageMappingToIteratorOperator instead of
    AbstractPageMappingToOperator
    • This allows returning more than one Page per input Page
    • In this PR we still always return one Page per input Page
    • Make new ReleasableIterator subclasses to satisfy
      AbstractPageMappingToIteratorOperator
    • Change status of loading fields from pages_processed to
      pages_received and pages_emitted
  • Fix a bug in AbstractPageMappingToOperator which can leak circuit
    breaker allocation if we fail to during receive. This isn't possible
    in the existing implementations but is possible
    in ValuesSourceReaderOperator.
  • Add a test with large text fields. Right now it still comes back in
    one page because we don't cut the pages.

Closes #130727

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Copy link
Member Author

nik9000 commented Jul 3, 2025

This includes a transport change so the new status so it's going to be "fun" to backport, but I'd still like to get this fixed in 8.19.x. So I will backport it. But not automatically.


@After
private void closeIndex() throws IOException {
public void closeScoringIndex() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The private method was making the ide sad. But you can't give them the same names as the super type one.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's merge this so we can work on the fix. Thanks, Nik!

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transport constant changes look good

@nik9000
Copy link
Member Author

nik9000 commented Jul 7, 2025

transport constant changes look good

thanks!

@nik9000 nik9000 enabled auto-merge (squash) July 7, 2025 20:32
@nik9000 nik9000 merged commit 75fe33d into elastic:main Jul 8, 2025
33 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jul 8, 2025
This refactors our `ValuesSourceReaderOperator` so it can split pages
when it reads large values. It does not *actually* split the pages as
that's a bit tricky. But it sets the stage for the next PR that will
do so.

* Move `ValuesSourceReaderOperator` to it's own package
* Move many inner classes into their own top level classes
* Extend from `AbstractPageMappingToIteratorOperator` instead of
  `AbstractPageMappingToOperator`
  * This allows returning more than one `Page` per input `Page`
  * In this PR we still always return one `Page` per input `Page`
  * Make new `ReleasableIterator` subclasses to satisfy
    `AbstractPageMappingToIteratorOperator`
  * Change `status` of loading fields from `pages_processed` to
    `pages_received` and `pages_emitted`
* Fix a bug in `AbstractPageMappingToOperator` which can leak circuit
  breaker allocation if we fail to during `receive`. This isn't possible
  in the existing implementations but is possible
  in `ValuesSourceReaderOperator`.
* Add a test with large text fields. Right now it still comes back in
one page because we don't cut the pages.

Closes elastic#130727

Also includes "Claim backported profile versions (elastic#130187)"
@nik9000
Copy link
Member Author

nik9000 commented Jul 8, 2025

9.1: #130781
8.19: #130784

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jul 8, 2025
This refactors our `ValuesSourceReaderOperator` so it can split pages
when it reads large values. It does not *actually* split the pages as
that's a bit tricky. But it sets the stage for the next PR that will
do so.

* Move `ValuesSourceReaderOperator` to it's own package
* Move many inner classes into their own top level classes
* Extend from `AbstractPageMappingToIteratorOperator` instead of
  `AbstractPageMappingToOperator`
  * This allows returning more than one `Page` per input `Page`
  * In this PR we still always return one `Page` per input `Page`
  * Make new `ReleasableIterator` subclasses to satisfy
    `AbstractPageMappingToIteratorOperator`
  * Change `status` of loading fields from `pages_processed` to
    `pages_received` and `pages_emitted`
* Fix a bug in `AbstractPageMappingToOperator` which can leak circuit
  breaker allocation if we fail to during `receive`. This isn't possible
  in the existing implementations but is possible
  in `ValuesSourceReaderOperator`.
* Add a test with large text fields. Right now it still comes back in
one page because we don't cut the pages.

Closes elastic#130727
nik9000 added a commit that referenced this pull request Jul 8, 2025
This refactors our `ValuesSourceReaderOperator` so it can split pages
when it reads large values. It does not *actually* split the pages as
that's a bit tricky. But it sets the stage for the next PR that will
do so.

* Move `ValuesSourceReaderOperator` to it's own package
* Move many inner classes into their own top level classes
* Extend from `AbstractPageMappingToIteratorOperator` instead of
  `AbstractPageMappingToOperator`
  * This allows returning more than one `Page` per input `Page`
  * In this PR we still always return one `Page` per input `Page`
  * Make new `ReleasableIterator` subclasses to satisfy
    `AbstractPageMappingToIteratorOperator`
  * Change `status` of loading fields from `pages_processed` to
    `pages_received` and `pages_emitted`
* Fix a bug in `AbstractPageMappingToOperator` which can leak circuit
  breaker allocation if we fail to during `receive`. This isn't possible
  in the existing implementations but is possible
  in `ValuesSourceReaderOperator`.
* Add a test with large text fields. Right now it still comes back in
one page because we don't cut the pages.

Closes #130727

Also includes "Claim backported profile versions (#130187)"
julian-elastic pushed a commit to julian-elastic/elasticsearch that referenced this pull request Jul 8, 2025
This refactors our `ValuesSourceReaderOperator` so it can split pages
when it reads large values. It does not *actually* split the pages as
that's a bit tricky. But it sets the stage for the next PR that will
do so.

* Move `ValuesSourceReaderOperator` to it's own package
* Move many inner classes into their own top level classes
* Extend from `AbstractPageMappingToIteratorOperator` instead of
  `AbstractPageMappingToOperator`
  * This allows returning more than one `Page` per input `Page`
  * In this PR we still always return one `Page` per input `Page`
  * Make new `ReleasableIterator` subclasses to satisfy
    `AbstractPageMappingToIteratorOperator`
  * Change `status` of loading fields from `pages_processed` to
    `pages_received` and `pages_emitted`
* Fix a bug in `AbstractPageMappingToOperator` which can leak circuit
  breaker allocation if we fail to during `receive`. This isn't possible
  in the existing implementations but is possible
  in `ValuesSourceReaderOperator`.
* Add a test with large text fields. Right now it still comes back in
one page because we don't cut the pages.

Closes elastic#130727
nik9000 added a commit that referenced this pull request Jul 8, 2025
This refactors our `ValuesSourceReaderOperator` so it can split pages
when it reads large values. It does not *actually* split the pages as
that's a bit tricky. But it sets the stage for the next PR that will
do so.

* Move `ValuesSourceReaderOperator` to it's own package
* Move many inner classes into their own top level classes
* Extend from `AbstractPageMappingToIteratorOperator` instead of
  `AbstractPageMappingToOperator`
  * This allows returning more than one `Page` per input `Page`
  * In this PR we still always return one `Page` per input `Page`
  * Make new `ReleasableIterator` subclasses to satisfy
    `AbstractPageMappingToIteratorOperator`
  * Change `status` of loading fields from `pages_processed` to
    `pages_received` and `pages_emitted`
* Fix a bug in `AbstractPageMappingToOperator` which can leak circuit
  breaker allocation if we fail to during `receive`. This isn't possible
  in the existing implementations but is possible
  in `ValuesSourceReaderOperator`.
* Add a test with large text fields. Right now it still comes back in
one page because we don't cut the pages.

Closes #130727
dnhatn added a commit that referenced this pull request Jul 10, 2025
We need to release the blocks of the page in 
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks 
thread-safe when they are about to be shared with other drivers via
#allowPassingToDifferentDriver.

Relates #130573

Closes #130959
Closes #130958
Closes #130950
Closes #130925
Closes #130881
Closes #130796
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 10, 2025
)

We need to release the blocks of the page in
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks
thread-safe when they are about to be shared with other drivers via

Relates elastic#130573

Closes elastic#130959
Closes elastic#130958
Closes elastic#130950
Closes elastic#130925
Closes elastic#130881
Closes elastic#130796
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 10, 2025
)

We need to release the blocks of the page in
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks
thread-safe when they are about to be shared with other drivers via

Relates elastic#130573

Closes elastic#130959
Closes elastic#130958
Closes elastic#130950
Closes elastic#130925
Closes elastic#130881
Closes elastic#130796
elasticsearchmachine pushed a commit that referenced this pull request Jul 10, 2025
…130972)

We need to release the blocks of the page in
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks
thread-safe when they are about to be shared with other drivers via

Relates #130573

Closes #130959
Closes #130958
Closes #130950
Closes #130925
Closes #130881
Closes #130796
elasticsearchmachine pushed a commit that referenced this pull request Jul 10, 2025
…130971)

We need to release the blocks of the page in
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks
thread-safe when they are about to be shared with other drivers via

Relates #130573

Closes #130959
Closes #130958
Closes #130950
Closes #130925
Closes #130881
Closes #130796
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
)

We need to release the blocks of the page in 
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks 
thread-safe when they are about to be shared with other drivers via
#allowPassingToDifferentDriver.

Relates elastic#130573

Closes elastic#130959
Closes elastic#130958
Closes elastic#130950
Closes elastic#130925
Closes elastic#130881
Closes elastic#130796
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
)

We need to release the blocks of the page in 
AbstractPageMappingToIteratorOperator immediately in single-iteration
cases, instead of delaying to the next iteration. This is because the
blocks of the page are now shared with the output page. The output page
can be passed to a separate driver, which may run concurrently with this
driver, leading to data races in AbstractNonThreadSafeRefCounted, which
is not thread-safe.

An alternative would be to make RefCounted for Vectors/Blocks 
thread-safe when they are about to be shared with other drivers via
#allowPassingToDifferentDriver.

Relates elastic#130573

Closes elastic#130959
Closes elastic#130958
Closes elastic#130950
Closes elastic#130925
Closes elastic#130881
Closes elastic#130796
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: Don't build empty ValuesSourceReaderOperators

4 participants