Collaborator

@lucasfang commented Jan 26, 2026

Purpose

Support prefetch for ORC and wire a read-ahead cache into the existing PrefetchFileBatchReaderImpl. Introduce a shared ReadAheadCache + CacheInputStream to serve prefetched byte ranges to parallel readers.

API and Format

PrefetchFileBatchReader:
virtual Result<std::vector<std::pair<uint64_t, uint64_t>>> PreBufferRange();
ReadContext:
bool EnablePrefetchCache() const;
const CacheConfig& GetCacheConfig() const;
ReadContextBuilder:
ReadContextBuilder& WithCacheConfig(const CacheConfig& config);
ReadContextBuilder& EnablePrefetchCache(bool enabled);
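
A minimal sketch of how the two new builder calls might be chained. Only the four method signatures listed above come from this PR. Everything else is a stand-in so the example compiles on its own: the `sketch` namespace, the `buffer_size_limit` field on `CacheConfig`, the `Build()` method, the `MakePrefetchContext` helper, and the default of prefetch cache disabled are all assumptions, not the library's real definitions.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in: the PR description does not list CacheConfig's fields.
struct CacheConfig {
    uint64_t buffer_size_limit = 0;  // assumed field, for illustration only
};

class ReadContext {
 public:
    ReadContext(bool enable_prefetch_cache, const CacheConfig& config)
        : enable_prefetch_cache_(enable_prefetch_cache), cache_config_(config) {}

    // Getters with the signatures listed in the PR description.
    bool EnablePrefetchCache() const { return enable_prefetch_cache_; }
    const CacheConfig& GetCacheConfig() const { return cache_config_; }

 private:
    bool enable_prefetch_cache_;
    CacheConfig cache_config_;
};

class ReadContextBuilder {
 public:
    ReadContextBuilder& WithCacheConfig(const CacheConfig& config) {
        cache_config_ = config;
        return *this;
    }
    ReadContextBuilder& EnablePrefetchCache(bool enabled) {
        enable_prefetch_cache_ = enabled;
        return *this;
    }
    // Hypothetical: the real builder's finishing call is not shown in this PR.
    ReadContext Build() const { return ReadContext(enable_prefetch_cache_, cache_config_); }

 private:
    bool enable_prefetch_cache_ = false;  // assumed default
    CacheConfig cache_config_;
};

// Usage: configure the cache, then opt in to the prefetch cache.
inline ReadContext MakePrefetchContext() {
    CacheConfig config;
    config.buffer_size_limit = 64ull << 20;  // 64 MiB, an arbitrary example value
    ReadContext context =
        ReadContextBuilder().WithCacheConfig(config).EnablePrefetchCache(true).Build();
    assert(context.EnablePrefetchCache());
    return context;
}

}  // namespace sketch
```

Both setters return `ReadContextBuilder&`, so they can be chained in either order.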

Copilot AI review requested due to automatic review settings January 26, 2026 03:56

Copilot AI left a comment

Pull request overview

Adds ORC support to the prefetch path by introducing ORC-aware read-range generation and byte-range prebuffering, and wiring a read-ahead cache into the existing PrefetchFileBatchReaderImpl.

Changes:

  • Add ORC ReadRangeGenerator and OrcReaderWrapper to support row-range generation and ORC prebuffer range hints.
  • Introduce a shared ReadAheadCache + CacheInputStream to serve prefetched byte ranges to parallel readers.
  • Extend prefetch/caching controls via ReadContext/builder and expand integration/unit tests to cover ORC prefetch scenarios.
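
The idea of a shared cache serving prefetched byte ranges to parallel readers can be sketched roughly as follows. This is not the code from the PR: the class names are borrowed from the summary above, but every method (`Put`, `Get`, `Read`), the `Fallback` callback, and the rule that a read hits only when a single cached range fully covers it are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

namespace sketch {

// Shared, thread-safe store of prefetched byte ranges keyed by start offset.
class ReadAheadCache {
 public:
    void Put(uint64_t offset, std::string bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        ranges_[offset] = std::move(bytes);
    }

    // Returns the bytes only if one cached range covers [offset, offset + length).
    std::optional<std::string> Get(uint64_t offset, uint64_t length) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = ranges_.upper_bound(offset);  // first range starting after offset
        if (it == ranges_.begin()) return std::nullopt;
        --it;  // last range starting at or before offset
        const uint64_t skip = offset - it->first;
        const std::string& bytes = it->second;
        if (skip > bytes.size() || length > bytes.size() - skip) return std::nullopt;
        return bytes.substr(skip, length);
    }

 private:
    mutable std::mutex mutex_;
    std::map<uint64_t, std::string> ranges_;
};

// Per-reader stream: tries the shared cache first, then the underlying file.
class CacheInputStream {
 public:
    // Hypothetical fallback: reads (offset, length) from the real input stream.
    using Fallback = std::function<std::string(uint64_t, uint64_t)>;

    CacheInputStream(std::shared_ptr<const ReadAheadCache> cache, Fallback fallback)
        : cache_(std::move(cache)), fallback_(std::move(fallback)) {
        assert(cache_ != nullptr);
    }

    std::string Read(uint64_t offset, uint64_t length) {
        if (auto hit = cache_->Get(offset, length)) return *hit;
        return fallback_(offset, length);  // cache miss: go to the file
    }

 private:
    std::shared_ptr<const ReadAheadCache> cache_;
    Fallback fallback_;
};

}  // namespace sketch
```

In this sketch each parallel reader owns its own `CacheInputStream` while all of them share one `ReadAheadCache`, so only the cache itself needs a lock.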

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| test/inte/scan_and_read_inte_test.cpp | Adds ORC prefetch=true parameterization for integration coverage. |
| test/inte/read_inte_with_index_test.cpp | Adds ORC prefetch=true parameterization for integration coverage. |
| test/inte/read_inte_test.cpp | Expands ORC test params to include prefetch combinations. |
| src/paimon/format/orc/read_range_generator_test.cpp | New unit tests for ORC row-range generation / thresholds. |
| src/paimon/format/orc/read_range_generator.h | Declares ORC row-range generator API. |
| src/paimon/format/orc/read_range_generator.cpp | Implements ORC row-range generation and prefetch decision logic. |
| src/paimon/format/orc/orc_reader_wrapper_test.cpp | New unit test for wrapper row position tracking. |
| src/paimon/format/orc/orc_reader_wrapper.h | Introduces wrapper to expose GetNextRowToRead and prebuffer hints. |
| src/paimon/format/orc/orc_reader_wrapper.cpp | Implements wrapper seek/schema/next-batch logic. |
| src/paimon/format/orc/orc_input_stream_impl.h | Adds async-read tracking fields. |
| src/paimon/format/orc/orc_input_stream_impl.cpp | Implements ORC async read via underlying InputStream::ReadAsync. |
| src/paimon/format/orc/orc_format_defs.h | Adds ORC prefetch threshold-related constants. |
| src/paimon/format/orc/orc_file_batch_reader_test.cpp | Updates tests for new row reader options signature/column id capture. |
| src/paimon/format/orc/orc_file_batch_reader.h | Switches ORC reader to PrefetchFileBatchReader + prefetch APIs. |
| src/paimon/format/orc/orc_file_batch_reader.cpp | Wires ORC reader wrapper + read-range generation into batch reader. |
| src/paimon/format/orc/CMakeLists.txt | Adds new ORC sources and tests to the build. |
| src/paimon/core/operation/read_context.cpp | Adds prefetch-cache options into read context/builder plumbing. |
| src/paimon/core/operation/internal_read_context.h | Exposes prefetch-cache settings to internal read pipeline. |
| src/paimon/core/operation/abstract_split_read.cpp | Enables prefetch path for ORC and passes cache config into prefetch reader. |
| src/paimon/core/deletionvectors/apply_deletion_vector_batch_reader_test.cpp | Updates prefetch reader construction to include cache params. |
| src/paimon/common/utils/byte_range_combiner.h | Adjusts combiner API/visibility after refactor. |
| src/paimon/common/utils/byte_range_combiner.cpp | Refactors coalescing implementation into CoalesceByteRanges. |
| src/paimon/common/reader/prefetch_file_batch_reader_impl_test.cpp | Updates tests for new prefetch reader ctor and adds ORC format where available. |
| src/paimon/common/reader/prefetch_file_batch_reader_impl.h | Extends Create/ctor to accept cache config + cache instance. |
| src/paimon/common/reader/prefetch_file_batch_reader_impl.cpp | Creates shared read-ahead cache and initializes it using PreBufferRange(). |
| src/paimon/common/fs/cache_input_stream.h | New InputStream wrapper serving reads from ReadAheadCache when possible. |
| src/paimon/common/file_index/bitmap/apply_bitmap_index_batch_reader_test.cpp | Updates prefetch reader construction to include cache params. |
| include/paimon/reader/prefetch_file_batch_reader.h | Adds PreBufferRange() API for byte-range prefetch hints. |
| include/paimon/read_context.h | Adds builder APIs and getters for enabling/configuring prefetch cache. |
| cmake_modules/orc.diff | Patches upstream ORC to expose preBufferRange(...) and reuse it in preBuffer. |
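
The byte_range_combiner.cpp entry mentions coalescing byte ranges. As a rough illustration of that general technique (not the PR's CoalesceByteRanges, whose signature is not shown here), the sketch below merges ranges that overlap or sit within a small gap of each other, so that several prefetch hints become one larger read. The function name `CoalesceRangesSketch`, the `hole_size_limit` parameter, and the reading of each pair as (offset, length) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

namespace sketch {

// Assumed layout: first = start offset, second = length in bytes.
using ByteRange = std::pair<uint64_t, uint64_t>;

// Merges ranges that overlap or are separated by at most hole_size_limit bytes.
inline std::vector<ByteRange> CoalesceRangesSketch(std::vector<ByteRange> ranges,
                                                   uint64_t hole_size_limit) {
    std::sort(ranges.begin(), ranges.end());  // order by offset, then length
    std::vector<ByteRange> merged;
    for (const ByteRange& range : ranges) {
        if (range.second == 0) continue;  // empty ranges carry no bytes
        assert(range.first <= UINT64_MAX - range.second);  // end must not overflow
        if (!merged.empty()) {
            ByteRange& last = merged.back();
            const uint64_t last_end = last.first + last.second;
            const bool overlaps = range.first <= last_end;
            if (overlaps || range.first - last_end <= hole_size_limit) {
                // Extend the previous range to cover this one (and the gap, if any).
                const uint64_t new_end = std::max(last_end, range.first + range.second);
                last.second = new_end - last.first;
                continue;
            }
        }
        merged.push_back(range);
    }
    return merged;
}

}  // namespace sketch
```

A larger `hole_size_limit` trades some wasted bytes for fewer I/O requests; a limit of 0 merges only overlapping or directly adjacent ranges.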

Collaborator

@lxy-9602 left a comment

+1

@lucasfang lucasfang merged commit 968fa1c into alibaba:main Jan 27, 2026
8 checks passed

2 participants