@lxy-9602 (Collaborator) commented Jan 5, 2026

Purpose

Problem Description

In the MergeFileSplitRead logic for PK MOR tables:

  1. When the sequence.field is one of the primary key fields, the current implementation first splits the primary key (PK) and non-PK fields based on the user's read schema.
  2. After the split, the sequence.field is added to the non-PK fields. This produces an invalid arrow::Schema with duplicate fields (the sequence.field appears in both the PK and non-PK fields).
  3. arrow::Schema::GetFieldIndex() returns -1 when a name matches more than one field, and this -1 index leads to a coredump during execution.
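
To illustrate step 3, here is a minimal stand-alone sketch of the lookup behavior described above. It does not use Arrow: `GetFieldIndexModel` is a hypothetical stand-in that only models the "return -1 on a missing or duplicated name" semantics, and the field names `id`, `ts`, and `v` are made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the lookup semantics described above (not Arrow code):
// returns the index of `name`, or -1 if the name is absent or appears more than once.
int GetFieldIndexModel(const std::vector<std::string>& names, const std::string& name) {
    int found = -1;
    for (int i = 0; i < static_cast<int>(names.size()); ++i) {
        if (names[i] != name) {
            continue;
        }
        if (found != -1) {
            return -1;  // second match: the name is ambiguous
        }
        found = i;
    }
    return found;
}

// Example schema in which the assumed sequence field "ts" ended up on both
// the PK side and the non-PK side.
std::vector<std::string> DuplicatedSchemaExample() {
    return {"id", "ts", "v", "ts"};
}
```

With this model, looking up `"ts"` in `DuplicatedSchemaExample()` yields -1 while `"id"` yields 0, so any caller that uses the result as an array index without checking it would read out of bounds.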

Solution

To resolve this issue:

  • Modified the order of operations in MergeFileSplitRead:
    • First, add the sequence.field to the appropriate field set.
    • Then, split the PK and non-PK fields.

This ensures that no duplicate fields are introduced in the arrow::Schema.
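
The effect of the reordering can be sketched with plain string vectors. This is a simplified model under stated assumptions, not the actual MergeFileSplitRead code: the helper names (`SplitByPk`, `SplitThenAdd`, `AddThenSplit`), the field names, and the assumption that the final schema is the PK fields followed by the non-PK fields are all hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Fields = std::vector<std::string>;

bool Contains(const Fields& fields, const std::string& name) {
    return std::find(fields.begin(), fields.end(), name) != fields.end();
}

// Splits `fields` into (PK, non-PK) while preserving the original order.
std::pair<Fields, Fields> SplitByPk(const Fields& fields, const Fields& pk_names) {
    Fields pk;
    Fields non_pk;
    for (const std::string& f : fields) {
        if (Contains(pk_names, f)) {
            pk.push_back(f);
        } else {
            non_pk.push_back(f);
        }
    }
    return std::make_pair(pk, non_pk);
}

// Modeled old order: split first, then add the sequence field to the non-PK side.
// The check only looks at the non-PK side, so a sequence field that is also a PK
// field gets added a second time.
Fields SplitThenAdd(const Fields& read_fields, const Fields& pk_names, const std::string& seq) {
    std::pair<Fields, Fields> parts = SplitByPk(read_fields, pk_names);
    if (!Contains(parts.second, seq)) {
        parts.second.push_back(seq);
    }
    Fields schema = parts.first;
    schema.insert(schema.end(), parts.second.begin(), parts.second.end());
    return schema;
}

// Modeled new order: add the sequence field to the full field set first, then split.
// The split places each field on exactly one side, so no duplicate can appear.
Fields AddThenSplit(Fields read_fields, const Fields& pk_names, const std::string& seq) {
    if (!Contains(read_fields, seq)) {
        read_fields.push_back(seq);
    }
    std::pair<Fields, Fields> parts = SplitByPk(read_fields, pk_names);
    Fields schema = parts.first;
    schema.insert(schema.end(), parts.second.begin(), parts.second.end());
    return schema;
}
```

For a read schema `{"id", "ts", "v"}` with PK fields `{"id", "ts"}` and sequence field `"ts"`, `SplitThenAdd` yields `{"id", "ts", "v", "ts"}` (duplicate `ts`), while `AddThenSplit` yields `{"id", "ts", "v"}`.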

Tests

WriteAndReadInteTest.TestPKWithSequenceFieldInPKField
WriteAndReadInteTest.TestPKWithSequenceFieldPartialInPKField

@lszskye lszskye merged commit a2a6ec0 into alibaba:main Jan 5, 2026
8 checks passed