[Bug] Assert failure in GIN index due to MaxHeapTuplesPerPageBits modification #1222

@assam258-5892

Description

Apache Cloudberry version

GitHub Issue Report

🏷️ Issue Title

Assert failure in GIN index due to MaxHeapTuplesPerPageBits modification

📋 Issue Template

Environment

  • Repository: apache/cloudberry
  • Tag: 2.0.0-incubating-rc2
  • Component: GIN Index
  • Severity: High
  • Priority: P0

Labels

  • bug
  • gin-index
  • append-only-tables
  • assert-failure
  • high-priority

Summary

An Assert failure occurs in GIN index operations on append-only tables because the modified MaxHeapTuplesPerPageBits and the OffsetNumberIsValid macro apply inconsistent upper limits to OffsetNumber values.

Problem Description

When porting from PostgreSQL to Cloudberry, MaxHeapTuplesPerPageBits was modified from 11 to 16 bits to support append-only table optimization. However, the validation logic in OffsetNumberIsValid was not updated accordingly, causing Assert failures when processing large OffsetNumbers that are valid within the 16-bit range but exceed the heap-based limit.

Root Cause

// MaxHeapTuplesPerPageBits was modified to support 16-bit range
#define MaxHeapTuplesPerPageBits    16  // 65536 distinct values, maximum offset 65535

// But OffsetNumberIsValid still uses heap-based MaxOffsetNumber
#define OffsetNumberIsValid(offsetNumber) \
    ((offsetNumber != InvalidOffsetNumber) && \
     (offsetNumber <= MaxOffsetNumber))  // Still ~291 for heap tables
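
To make the bit-width arithmetic concrete, here is a minimal sketch that computes the largest offset representable in a given number of bits. The helper name `max_offset_for_bits` is hypothetical and not part of Cloudberry. Only the bit widths 11 and 16 come from this report.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Cloudberry API): largest value that fits in
 * `bits` bits, i.e. (1 << bits) - 1. Only meaningful for 1 <= bits <= 31.
 */
static uint32_t
max_offset_for_bits(unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    return (UINT32_C(1) << bits) - 1;
}
```

With this helper, `max_offset_for_bits(11)` is 2047 and `max_offset_for_bits(16)` is 65535. A 16-bit field therefore holds 65536 distinct values, of which 0 is reserved as InvalidOffsetNumber.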

Assert Location

  • File: src/backend/access/gin/ginpostinglist.c
  • Line: 338
  • Code: Assert(OffsetNumberIsValid(ItemPointerGetOffsetNumber(&segment->first)));

Steps to Reproduce

  1. Create an append-only table:

    CREATE TABLE test_ao (id int, data text) WITH (appendonly=true);
  2. Create a GIN index:

    CREATE INDEX idx_test_ao_gin ON test_ao USING gin(to_tsvector('english', data));
  3. Insert a large amount of data to generate large OffsetNumbers:

    INSERT INTO test_ao SELECT generate_series(1, 100000), 'test data ' || generate_series(1, 100000);
  4. Query using the GIN index:

    SELECT * FROM test_ao WHERE to_tsvector('english', data) @@ to_tsquery('test');

Expected Behavior

  • GIN index should work properly with append-only tables
  • Large OffsetNumbers (up to 65535) should be accepted as valid
  • No Assert failures should occur

Actual Behavior

  • Assert failure occurs in ginpostinglist.c line 338
  • Error: OffsetNumberIsValid validation fails for valid OffsetNumbers > 291
  • Example: OffsetNumber = 30000 (valid in 16-bit range) fails validation because 30000 > 291

Failure Scenario

// Append-only table generates large OffsetNumber
OffsetNumber large_offset = 30000;  // Valid within 16-bit range

// Validation fails due to heap-based limit
Assert(OffsetNumberIsValid(large_offset));  // 30000 > 291 → Assert failure!
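
The scenario can be tried outside the server with a small stand-alone sketch. The typedef and `InvalidOffsetNumber` below are stand-ins that mirror the PostgreSQL definitions, and `offset_within_limit` is a hypothetical helper that takes the upper limit as a parameter so the two limits can be compared side by side. The value 291 is the heap limit quoted in this report, not a value read from the Cloudberry headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions, so the sketch compiles alone. */
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)

/*
 * Hypothetical helper: same shape as OffsetNumberIsValid, but the upper
 * limit is an argument instead of the fixed MaxOffsetNumber.
 */
static bool
offset_within_limit(OffsetNumber off, unsigned limit)
{
    assert(limit <= UINT16_MAX);    /* a limit beyond 16 bits cannot apply */
    return off != InvalidOffsetNumber && off <= limit;
}
```

Under these assumptions, `offset_within_limit(30000, 291)` is false while `offset_within_limit(30000, 65535)` is true, which is the gap the failing Assert exposes.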

Impact

  • High severity: Causes application crashes
  • Blocks functionality: Prevents using GIN indexes on append-only tables
  • Affects: All append-only tables with GIN indexes containing large OffsetNumbers
  • Data integrity: No data corruption, but functionality is blocked

Proposed Solution

Option 1: Immediate Fix (Recommended)

Modify line 338 in ginpostinglist.c to use 16-bit range validation:

// Original code:
Assert(OffsetNumberIsValid(ItemPointerGetOffsetNumber(&segment->first)));

// Proposed fix:
{
    OffsetNumber offset = ItemPointerGetOffsetNumber(&segment->first);
    Assert(offset != InvalidOffsetNumber);
    Assert(offset <= ((1 << MaxHeapTuplesPerPageBits) - 1));
}

Option 2: Long-term Improvement

Add new macro in src/include/access/gin_private.h:

#define MaxGinOffsetNumber ((1 << MaxHeapTuplesPerPageBits) - 1)
#define GinOffsetNumberIsValid(offsetNumber) \
    ((offsetNumber != InvalidOffsetNumber) && \
     (offsetNumber <= MaxGinOffsetNumber))

// Then modify ginpostinglist.c line 338:
Assert(GinOffsetNumberIsValid(ItemPointerGetOffsetNumber(&segment->first)));
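
One design note on Option 2: a function-like macro evaluates its argument twice, which is harmless here but easy to trip over later. A sketch of an inline-function variant follows. The name `gin_offset_number_is_valid` is hypothetical, and the typedef and constants are stand-ins so the block compiles outside the Cloudberry tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles outside the Cloudberry tree. */
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber      ((OffsetNumber) 0)
#define MaxHeapTuplesPerPageBits 16
#define MaxGinOffsetNumber       ((1 << MaxHeapTuplesPerPageBits) - 1)

/* The proposed limit has to fit in OffsetNumber for the check to make sense. */
static_assert(MaxGinOffsetNumber <= UINT16_MAX,
              "MaxGinOffsetNumber exceeds the OffsetNumber range");

/* Hypothetical inline equivalent of the proposed GinOffsetNumberIsValid. */
static inline bool
gin_offset_number_is_valid(OffsetNumber off)
{
    return off != InvalidOffsetNumber && off <= MaxGinOffsetNumber;
}
```

The call site would then read `Assert(gin_offset_number_is_valid(ItemPointerGetOffsetNumber(&segment->first)));`, with the argument evaluated once.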

Advantages of Proposed Solution

  • ✅ Minimally invasive: The immediate fix touches a single assertion
  • ✅ Safe: Maintains existing data compatibility
  • ✅ Preserves optimization: Keeps append-only table 16-bit optimization
  • ✅ Low risk: Minimal code changes reduce introduction of new bugs
  • ✅ Immediate: Can be applied as a hotfix

Alternative Solutions Considered

  1. Revert MaxHeapTuplesPerPageBits to 11: ❌ Causes data corruption and loses append-only optimization
  2. Complete redesign: ❌ High risk and time-consuming
  3. Global OffsetNumberIsValid modification: ❌ May affect other components unexpectedly

Testing Requirements

  1. Functional testing: Verify GIN index works with append-only tables
  2. Regression testing: Ensure heap tables still work correctly
  3. Performance testing: Verify no performance degradation
  4. Edge case testing: Test with maximum OffsetNumber values
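
For the edge-case item, a stand-alone sketch can walk the whole 16-bit range and count which offsets the proposed check accepts. It assumes OffsetNumber stays a 16-bit unsigned type as in PostgreSQL. The names `gin_offset_ok` and `count_valid_offsets` are hypothetical, and the constants are stand-ins for the proposed definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring the proposed definitions. */
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define MaxGinOffsetNumber  ((1 << 16) - 1)    /* proposed 16-bit limit */

/* Hypothetical stand-in for the proposed GinOffsetNumberIsValid check. */
static bool
gin_offset_ok(OffsetNumber off)
{
    return off != InvalidOffsetNumber && off <= MaxGinOffsetNumber;
}

/* Exhaustively count the 16-bit offsets that pass the check. */
static unsigned
count_valid_offsets(void)
{
    unsigned count = 0;

    for (unsigned v = 0; v <= UINT16_MAX; v++)
        if (gin_offset_ok((OffsetNumber) v))
            count++;

    assert(count <= UINT16_MAX);    /* offset 0 is never counted */
    return count;
}
```

Under that assumption every nonzero 16-bit value passes, so the upper-bound comparison cannot fail on its own and the test only needs to pin down 0, 1, and 65535. The bound still documents the intended range.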

Additional Context

This issue is specific to Cloudberry's append-only table optimization. PostgreSQL users are not affected as they use the original 11-bit implementation. The modification was made to support large-scale OLAP workloads and columnar storage optimization in Cloudberry.

Files to be Modified

  • src/backend/access/gin/ginpostinglist.c (line 338) - Primary fix
  • src/include/access/gin_private.h (optional) - Long-term improvement

Risk Assessment

  • Risk Level: Low (minimal code change)
  • Rollback: Easy (single line change)
  • Testing: Straightforward test cases
  • Compatibility: Maintains backward compatibility

What happened

The backend crashes on the assertion at ginpostinglist.c line 338.

How to reproduce

A reproduction case is difficult to provide because the data is inside a corporate secured network.

Operating System

All

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!
