Skip to content

Conversation

anujmodi2021
Copy link
Contributor

Description of PR

JIRA: https://issues.apache.org/jira/browse/HADOOP-19641
We have observed this across multiple workload runs that when we start reading data from input stream. The first read which came to input stream has to be read synchronously even if we trigger prefetch request for that particular offset. Most of the times we end up doing extra work of checking if the prefetch is trigerred, removing prefetch from the pending queue and go ahead to do a direct remote read in workload thread itself.

To avoid all this overhead, we will always bypass read ahead for the very first read of each input stream and trigger read aheads for second read onwards.

How was this patch tested?

TBA

@anujmodi2021 anujmodi2021 changed the title Hadoop 19641 Hadoop 19641: [ABFS][ReadAheadV2] First Read should bypass ReadBufferManager Jul 29, 2025
@hadoop-yetus

This comment was marked as outdated.

@anujmodi2021 anujmodi2021 marked this pull request as ready for review July 30, 2025 07:02
@anujmodi2021 anujmodi2021 requested a review from Copilot July 31, 2025 06:05
Copilot

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@anujmodi2021 anujmodi2021 requested a review from Copilot August 5, 2025 08:43
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the read-ahead behavior in ABFS input streams by bypassing the ReadBufferManager for the first read operation, reducing unnecessary overhead when the read pattern is not yet established.

  • Introduces a bypass mechanism for the first read to avoid synchronous prefetch overhead
  • Modifies the number of read-ahead requests from configurable depth to 1 for the first read
  • Updates existing tests to account for the new first-read bypass behavior

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
AbfsInputStream.java Implements first-read bypass logic by setting numReadAheads to 1 and adds isFirstRead() method
ReadBufferManagerV1.java Prevents removal of prefetch buffers from queue when triggered by first read
TestAbfsInputStream.java Updates test expectations and mocking to accommodate the new first-read behavior

* For the first read of this input stream, we don't read ahead but keep
* the current read data in cache as pattern might not be sequential.
*/
int numReadAheads = firstRead? 1 : this.readAheadQueueDepth;
Copy link
Preview

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The ternary operator lacks spaces around the ? operator, which deviates from Java coding conventions. It should be firstRead ? 1 : this.readAheadQueueDepth.

Suggested change
int numReadAheads = firstRead? 1 : this.readAheadQueueDepth;
int numReadAheads = firstRead ? 1 : this.readAheadQueueDepth;

Copilot uses AI. Check for mistakes.

if (-1 == fCursorAfterLastRead || fCursorAfterLastRead == fCursor || b.length >= bufferSize) {
LOG.debug("Sequential read with read ahead size of {}", bufferSize);
// Sequential read pattern detected. Enable read ahead.
LOG.debug("Sequential read with read size of {} and read ahead enabled", bufferSize);
Copy link
Preview

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message mentions 'read size of {}' but logs bufferSize instead of the actual read size (b.length). This could be misleading when debugging read operations.

Suggested change
LOG.debug("Sequential read with read size of {} and read ahead enabled", bufferSize);
LOG.debug("Sequential read with read size of {} and read ahead enabled", b.length);

Copilot uses AI. Check for mistakes.

@anujmodi2021 anujmodi2021 marked this pull request as draft August 7, 2025 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants