@andreidan (Contributor) commented Apr 16, 2025

`writerWithOffset` uses a lambda to create a `RangeMissingHandler`. However, the `RangeMissingHandler` interface has a default implementation for `sharedInputStreamFactory`, which the lambda does not override.

As a result, `writerWithOffset` delegates to the received writer only for the `fillCacheRange` method, so the writer itself may never have had its `sharedInputStreamFactory` method invoked. (Always invoking `sharedInputStreamFactory` before `fillCacheRange` is part of the contract of the `RangeMissingHandler` interface.)

This PR makes `writerWithOffset` delegate `sharedInputStreamFactory` to the underlying writer.
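
To illustrate the pitfall described above, here is a minimal, self-contained sketch. `Handler`, `SharingWriter`, `wrapWithLambda`, and `wrapWithDelegation` are hypothetical stand-ins, not the real `RangeMissingHandler` API, and the signatures are deliberately simplified. It shows that a lambda can only supply the single abstract method, so an overridden default method on the wrapped object is bypassed unless the wrapper forwards it explicitly.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Elasticsearch types.
class LambdaDefaultMethodDemo {

    /** One abstract method, so a lambda can implement it; the default comes along unchanged. */
    interface Handler {
        String fill(long relativePos);

        // Used by every implementation that does not override it, including lambdas.
        default String streamFactory(List<Long> gaps) {
            return null;
        }
    }

    /** An underlying writer that overrides the default method. */
    static final class SharingWriter implements Handler {
        @Override
        public String fill(long relativePos) {
            return "fill@" + relativePos;
        }

        @Override
        public String streamFactory(List<Long> gaps) {
            return "shared-stream(" + gaps.size() + " gaps)";
        }
    }

    /** Lambda wrapper: only fill() reaches the writer; streamFactory() falls back to the default. */
    static Handler wrapWithLambda(Handler writer, long writeOffset) {
        return relativePos -> writer.fill(relativePos - writeOffset);
    }

    /** Explicit wrapper: both methods are forwarded to the writer. */
    static Handler wrapWithDelegation(Handler writer, long writeOffset) {
        return new Handler() {
            @Override
            public String fill(long relativePos) {
                return writer.fill(relativePos - writeOffset);
            }

            @Override
            public String streamFactory(List<Long> gaps) {
                return writer.streamFactory(gaps);
            }
        };
    }

    public static void main(String[] args) {
        Handler writer = new SharingWriter();
        List<Long> gaps = List.of(0L, 4096L);
        System.out.println(wrapWithLambda(writer, 10).streamFactory(gaps));     // prints "null"
        System.out.println(wrapWithDelegation(writer, 10).streamFactory(gaps)); // prints "shared-stream(2 gaps)"
    }
}
```

In this sketch the lambda version silently returns the interface default, which mirrors the situation the description attributes to `writerWithOffset` before the change.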

@andreidan added the >non-issue, :Search Foundations/Search, and v9.1.0 labels Apr 16, 2025
@elasticsearchmachine added the Team:Search Foundations label Apr 16, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@henningandersen (Contributor) left a comment

I have a concern.


    @Override
    public SourceInputStreamFactory sharedInputStreamFactory(List<SparseFileTracker.Gap> gaps) {
        return writer.sharedInputStreamFactory(gaps);

Contributor

I wonder if this works - since it does not apply the offset in that case?

Contributor Author

Hm, would the `relativePos - writeOffset` parameter passed to `fillCacheRange` not do the trick here?
My understanding of `sharedInputStreamFactory` is that it has to do with how much we can parallelise the calls to `fillCacheRange`, but the place we read from is the `relativePos` passed in (which will take the offset into account, as it'll be `relativePos - writeOffset`)?

We could return null after calling `writer.sharedInputStreamFactory(gaps)` here if you think it's better? i.e.:

    writer.sharedInputStreamFactory(gaps);
    return null;
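
The offset arithmetic mentioned in this comment can be sketched as follows. This is only an illustration with hypothetical names (`RangeFiller`, `sourceBacked`, `withOffset`) and an in-memory byte array standing in for the source. It does not model the stream factory or the gaps, so it says nothing about whether those need translating as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of "relativePos - writeOffset"; not the real Elasticsearch code.
class OffsetTranslationSketch {

    /** Copies `length` bytes starting at `relativePos` of some underlying source. */
    interface RangeFiller {
        byte[] fill(int relativePos, int length);
    }

    /** A filler that reads straight from an in-memory source. */
    static RangeFiller sourceBacked(byte[] source) {
        return (relativePos, length) -> Arrays.copyOfRange(source, relativePos, relativePos + length);
    }

    /** Shifts the caller's position by writeOffset before handing it to the wrapped filler. */
    static RangeFiller withOffset(RangeFiller writer, int writeOffset) {
        return (relativePos, length) -> writer.fill(relativePos - writeOffset, length);
    }

    public static void main(String[] args) {
        byte[] source = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        RangeFiller shifted = withOffset(sourceBacked(source), 3);
        // Caller position 5 maps to source position 2, so four bytes from index 2 are read.
        System.out.println(new String(shifted.fill(5, 4), StandardCharsets.US_ASCII)); // prints "cdef"
    }
}
```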

Contributor

I think we want to return the shared input stream factory there.

Is there a test demonstrating that it works?

Contributor Author

My link above still only overrides the fill cache method. I'll add one more test that calls maybeFetchRange with a real SequentialRangeMissingHandler instance. Thanks Henning!
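
One way such a check could look, as a rough sketch: a test double that refuses to fill unless its stream factory was requested first. All names here (`Handler`, `StrictWriter`, `lambdaWrapper`, `delegatingWrapper`, `callerSucceeds`) are hypothetical and simplified. This is not the test that was added for this PR, nor the real `SequentialRangeMissingHandler`.

```java
// Hypothetical contract-checking test double; NOT the real Elasticsearch test code.
class ContractCheckingFakeSketch {

    interface Handler {
        void fill(long relativePos);

        default Object streamFactory() {
            return null;
        }
    }

    /** Refuses to fill unless streamFactory() was called on this instance first. */
    static final class StrictWriter implements Handler {
        private boolean factoryRequested;

        @Override
        public Object streamFactory() {
            factoryRequested = true;
            return new Object();
        }

        @Override
        public void fill(long relativePos) {
            if (!factoryRequested) {
                throw new IllegalStateException("fill called before streamFactory");
            }
        }
    }

    /** Wrapper built from a lambda: only fill() is forwarded. */
    static Handler lambdaWrapper(Handler writer) {
        return pos -> writer.fill(pos);
    }

    /** Wrapper that forwards both methods. */
    static Handler delegatingWrapper(Handler writer) {
        return new Handler() {
            @Override
            public void fill(long relativePos) {
                writer.fill(relativePos);
            }

            @Override
            public Object streamFactory() {
                return writer.streamFactory();
            }
        };
    }

    /** Mimics a caller that honours the contract on whatever handler it is given. */
    static boolean callerSucceeds(Handler handler) {
        try {
            handler.streamFactory();
            handler.fill(0);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(callerSucceeds(lambdaWrapper(new StrictWriter())));     // prints "false"
        System.out.println(callerSucceeds(delegatingWrapper(new StrictWriter()))); // prints "true"
    }
}
```

A fake that only overrides the fill method cannot detect the missing delegation, which is why a handler with real stream-factory behaviour is needed to exercise it.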

Contributor Author

@henningandersen adding a unit test with `SequentialRangeMissingHandler` proved to be quite the mission (we'd end up mostly mocking things), so I added an integration test instead in https://github.com/elastic/elasticsearch-serverless/pull/3853

@elasticsearchmachine added the serverless-linked label Apr 30, 2025
@andreidan removed the serverless-linked label Apr 30, 2025
@elasticsearchmachine added the serverless-linked label May 1, 2025
@henningandersen (Contributor) left a comment

LGTM.

@andreidan
Contributor Author

@elasticmachine update branch

@andreidan merged commit 2375e89 into elastic:main May 29, 2025
18 checks passed
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
…126937)

Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
…126937)

Labels

>non-issue, :Search Foundations/Search, serverless-linked, Team:Search Foundations, v9.1.0

4 participants