[refactor](be) Enforce COW ownership for assume_mutable#63001
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found a blocking COW correctness issue in complex-type deserialization. The PR changes these paths from mutating the existing child through assume_mutable() to mutating a possibly cloned child, but the cloned child is not written back to the parent complex column. That means deserialization can update offsets while dropping child data when nested children are shared, which is exactly the ownership case this PR is trying to make safe.
Critical checkpoint conclusions:
- Goal/test: The goal is to restore COW-safe mutation after assume_mutable() enforces exclusive ownership. The broad BE unit-test list helps, but the complex-type deserialize path still has an uncovered shared-child case.
- Scope/focus: The change is focused on COW ownership, but several deserialize updates need the same write-back pattern used elsewhere after mutating detached children.
- Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, configuration, or storage-format compatibility concern identified in the reviewed changes.
- Parallel paths: Array, Map, and Struct deserialize have the same detached-child pattern and all need to be fixed consistently.
- Tests: Please add or adjust coverage for deserializing complex columns whose child ColumnPtr is shared so this does not regress.
- Observability/persistence/write correctness/performance: No additional observability, persistence, transaction, or performance blocker found beyond the data-correctness issue above.
User focus: No additional user-provided review focus was specified.
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found several issues that should be addressed before merging. The existing inline threads already cover the Array/Map/Struct COW deserialization ownership problems, so I did not duplicate them.
Critical checkpoint conclusions:
- Goal/test: The PR appears to make BE column mutation safe under COW and remove streaming-job filtered-row persistence. It is only partially achieved: the ORC schema-change nullable path can copy stale null flags, and filtered-row observability is left inconsistent. Existing tests were updated broadly, but the streaming filtered-row regression test was deleted without replacement coverage for the new expected behavior.
- Scope/focus: The BE COW change is broad and not fully localized; this increases risk in hot scan/convert paths.
- Concurrency/lifecycle: I did not find new lock-order or thread-lifecycle issues in the reviewed paths.
- Config/incompatibility: No new config items. The protobuf field removal needs reserved tag/name handling to avoid future wire/persistence compatibility hazards.
- Parallel paths: Filtered-row updates were removed from transactional streaming jobs, while non-transactional streaming offset commits and metric definitions still keep filtered rows, leaving behavior inconsistent.
- Tests/results: Missing replacement coverage for the removed streaming filtered-row behavior. I did not run tests in this review runner.
- Observability: Filtered-row metrics/job details can now report zero or omit data even when rows are filtered.
User focus: No additional user-provided review focus was specified.
| auto converted_column = doris_column->assume_mutable(); | ||
| if (converted_column->is_nullable()) { | ||
| align_orc_null_map(resolved_column, | ||
| reinterpret_cast<ColumnNullable*>(converted_column.get()), |
There was a problem hiding this comment.
This copies the source null map from offset 0, but in the non-consistent nullable converter path the source nullable column was built with the destination null map and then deep-mutated. If doris_column already has rows, fill_orc_null_map() appends the newly-read ORC nulls after those old rows in resolved_column; passing only resolved_column->size() here then makes align_orc_null_map() insert the first new_rows entries, which are stale old null flags rather than the newly appended slice. This can corrupt null flags when reading multiple batches through an ORC schema-change conversion. Please preserve the destination old row count and copy from that offset, or avoid cloning a source nullable wrapper that contains the destination's existing null map.
Issue Number: None Related PR: apache#63001 Problem Summary: After the COW mutable ownership changes, constructing a MutableBlock from the cached Iceberg equality-delete block transfers the block columns into the mutable block. The equality-delete loader merged rows into the mutable block but did not write the result back, leaving null columns in the cached block and crashing HybridSet while building delete filters. This also applies the requested BE clang-format updates. None - Test: Manual test - build-support/check-format.sh - ./build.sh --be -j 100 - git diff --check - Behavior changed: No - Does this need documentation: No
|
run buildall |
…sion Issue Number: None Related PR: apache#63001 Problem Summary: After assume_mutable started asserting exclusive ownership, sort aggregate state still appended through immutable Block columns on the hot add/merge path, and nullable schema-change conversion could copy a stale null-map prefix after ORC/Parquet COW cloning. Keep the sort aggregate state as a MutableBlock, remove a read-only mutable assertion, and copy nullable null-map slices from the appended source range. Release note None Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=AggregateFunctionSortDataTest.merge_does_not_share_rhs_block:OrcReaderFillDataTest.SchemaChangeNullableNullMapUsesAppendedSlice:ParquetColumnConvertTest.AlignNullMapUsesAppendedSourceSlice -j 100 - ./run-regression-test.sh --run -f regression-test/suites/query_p0/runtimefilterV2/rfv2.groovy - ./build.sh --be -j 100 - build-support/clang-format.sh - build-support/check-format.sh - git diff --check - Local cluster restart and select 1/show backends - build-support/run-clang-tidy.sh --build-dir be/build_ASAN attempted; changed-line findings were fixed, remaining failures are existing ORC/test include/jni-util baseline or tooling diagnostics - Behavior changed: No - Does this need documentation: No
…sion Issue Number: None Related PR: apache#63001 Problem Summary: After assume_mutable started asserting exclusive ownership, sort aggregate state still appended through immutable Block columns on the hot add/merge path, and nullable schema-change conversion could copy a stale null-map prefix after ORC/Parquet COW cloning. Keep the sort aggregate state as a MutableBlock, remove a read-only mutable assertion, and copy nullable null-map slices from the appended source range. Release note None Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=AggregateFunctionSortDataTest.merge_does_not_share_rhs_block:OrcReaderFillDataTest.SchemaChangeNullableNullMapUsesAppendedSlice:ParquetColumnConvertTest.AlignNullMapUsesAppendedSourceSlice -j 100 - ./run-regression-test.sh --run -f regression-test/suites/query_p0/runtimefilterV2/rfv2.groovy - ./build.sh --be -j 100 - build-support/clang-format.sh - build-support/check-format.sh - git diff --check - Local cluster restart and select 1/show backends - build-support/run-clang-tidy.sh --build-dir be/build_ASAN attempted; changed-line findings were fixed, remaining failures are existing ORC/test include/jni-util baseline or tooling diagnostics - Behavior changed: No - Does this need documentation: No
|
run buildall |
Issue Number: None Related PR: apache#63001 Problem Summary: After the COW mutable ownership changes, constructing a MutableBlock from the cached Iceberg equality-delete block transfers the block columns into the mutable block. The equality-delete loader merged rows into the mutable block but did not write the result back, leaving null columns in the cached block and crashing HybridSet while building delete filters. This also applies the requested BE clang-format updates. None - Test: Manual test - build-support/check-format.sh - ./build.sh --be -j 100 - git diff --check - Behavior changed: No - Does this need documentation: No
…sion Issue Number: None Related PR: apache#63001 Problem Summary: After assume_mutable started asserting exclusive ownership, sort aggregate state still appended through immutable Block columns on the hot add/merge path, and nullable schema-change conversion could copy a stale null-map prefix after ORC/Parquet COW cloning. Keep the sort aggregate state as a MutableBlock, remove a read-only mutable assertion, and copy nullable null-map slices from the appended source range. Release note None Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=AggregateFunctionSortDataTest.merge_does_not_share_rhs_block:OrcReaderFillDataTest.SchemaChangeNullableNullMapUsesAppendedSlice:ParquetColumnConvertTest.AlignNullMapUsesAppendedSourceSlice -j 100 - ./run-regression-test.sh --run -f regression-test/suites/query_p0/runtimefilterV2/rfv2.groovy - ./build.sh --be -j 100 - build-support/clang-format.sh - build-support/check-format.sh - git diff --check - Local cluster restart and select 1/show backends - build-support/run-clang-tidy.sh --build-dir be/build_ASAN attempted; changed-line findings were fixed, remaining failures are existing ORC/test include/jni-util baseline or tooling diagnostics - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: apache#63001 Problem Summary: Parquet conversion aligned nullable source null maps by source nested row counts. Under the COW assert-mutability contract, cached nullable sources may carry an old destination null-map prefix while their nested columns only contain the current batch, and fixed-length physical columns may store multiple nested values per logical row. This caused ParquetReaderTest.varbinary_string and adjacent varbinary paths to abort. Track destination null-map length separately and copy the appended source null-map slice by logical new rows. ### Release note None ### Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=ParquetReaderTest.varbinary_string -j 100 - ./run-be-ut.sh --run --filter=ParquetReaderTest.uuid_varbinary:ParquetReaderTest.varbinary_varbinary:ParquetReaderTest.varbinary_string:ParquetReaderTest.varbinary_string2:ParquetColumnConvertTest.*NullMap* -j 100 - ./run-be-ut.sh --run --filter=ParquetReaderTest.*:ParquetColumnConvertTest.*:OrcReaderFillDataTest.SchemaChangeNullableNullMapUsesAppendedSlice -j 100 - build-support/check-format.sh - Behavior changed: No - Does this need documentation: No
|
run buildall |
|
/review |
Issue Number: None Related PR: apache#63001 Problem Summary: After the COW mutable ownership changes, constructing a MutableBlock from the cached Iceberg equality-delete block transfers the block columns into the mutable block. The equality-delete loader merged rows into the mutable block but did not write the result back, leaving null columns in the cached block and crashing HybridSet while building delete filters. This also applies the requested BE clang-format updates. None - Test: Manual test - build-support/check-format.sh - ./build.sh --be -j 100 - git diff --check - Behavior changed: No - Does this need documentation: No
…sion Issue Number: None Related PR: apache#63001 Problem Summary: After assume_mutable started asserting exclusive ownership, sort aggregate state still appended through immutable Block columns on the hot add/merge path, and nullable schema-change conversion could copy a stale null-map prefix after ORC/Parquet COW cloning. Keep the sort aggregate state as a MutableBlock, remove a read-only mutable assertion, and copy nullable null-map slices from the appended source range. Release note None Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=AggregateFunctionSortDataTest.merge_does_not_share_rhs_block:OrcReaderFillDataTest.SchemaChangeNullableNullMapUsesAppendedSlice:ParquetColumnConvertTest.AlignNullMapUsesAppendedSourceSlice -j 100 - ./run-regression-test.sh --run -f regression-test/suites/query_p0/runtimefilterV2/rfv2.groovy - ./build.sh --be -j 100 - build-support/clang-format.sh - build-support/check-format.sh - git diff --check - Local cluster restart and select 1/show backends - build-support/run-clang-tidy.sh --build-dir be/build_ASAN attempted; changed-line findings were fixed, remaining failures are existing ORC/test include/jni-util baseline or tooling diagnostics - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: apache#63001 Problem Summary: Parquet conversion aligned nullable source null maps by source nested row counts. Under the COW assert-mutability contract, cached nullable sources may carry an old destination null-map prefix while their nested columns only contain the current batch, and fixed-length physical columns may store multiple nested values per logical row. This caused ParquetReaderTest.varbinary_string and adjacent varbinary paths to abort. Track destination null-map length separately and copy the appended source null-map slice by logical new rows. ### Release note None ### Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=ParquetReaderTest.varbinary_string -j 100 - ./run-be-ut.sh --run --filter=ParquetReaderTest.uuid_varbinary:ParquetReaderTest.varbinary_varbinary:ParquetReaderTest.varbinary_string:ParquetReaderTest.varbinary_string2:ParquetColumnConvertTest.*NullMap* -j 100 - ./run-be-ut.sh --run --filter=ParquetReaderTest.*:ParquetColumnConvertTest.*:OrcReaderFillDataTest.SchemaChangeNullableNullMapUsesAppendedSlice -j 100 - build-support/check-format.sh - Behavior changed: No - Does this need documentation: No
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found two COW correctness issues that should be fixed before this PR lands.
Critical checkpoint conclusions:
- Goal and tests: the PR goal is valid, and it adds focused BE unit coverage for several mem-reuse/COW paths, but the new shared complex-column paths still miss important invariant coverage.
- Scope/focus: the change is broad but mostly mechanical and aligned with the new
assume_mutable()contract; the issues below are localized. - Concurrency/lifecycle/config/compatibility: no new threads, lifecycle objects, config items, or serialization format compatibility changes were identified in the reviewed changes.
- Parallel paths: many parallel mem-reuse paths were updated; however, shared
ColumnArrayconstruction and recursive map dedup remain inconsistent with the same COW-safety principle. - Error handling: no ignored
Statusissue was found in the reviewed changes. - Tests: added tests cover several regressions, but not invalid shared
ColumnArray::create(ColumnPtr, ...)construction or recursive nested-map dedup with shared subcolumns. - Observability: no additional observability appears necessary for these internal COW fixes.
- Transaction/storage correctness: no version/delete-bitmap/persistence issue was identified for this PR.
- Performance: no blocking performance regression was identified beyond the expected COW cloning when columns are shared.
User focus: no additional user-provided review focus was supplied.
| @@ -98,6 +100,21 @@ ColumnArray::ColumnArray(MutableColumnPtr&& nested_column) : data(std::move(nest | |||
| offsets = ColumnOffsets::create(); | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
This shared-column constructor skips the invariants enforced by the mutable constructor above: offsets_column must actually be ColumnOffset64, and when offsets are non-empty the nested data size must match offsets.back(). Public callers such as ColumnArray::create(const ColumnPtr&, const ColumnPtr&) now route here, so an invalid array can be constructed and later offset_at/size_at, filtering, or serialization can read inconsistent offsets. Please keep this path shared, but still perform the same const-only, offsets-type, and data-size validation using const access.
| RETURN_IF_ERROR(const_cast<ColumnMap*>(values_map)->deduplicate_keys(recursive)); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
This recursive mutation bypasses COW by const-casting the nested ColumnMap reached through values_column. If the map value subcolumn is shared, deduplicate_keys(true) can move/filter the nested map's subcolumns in place and mutate all aliases of that nested map. This is distinct from the top-level filtering below, which correctly detaches with IColumn::mutate(std::move(...)); the recursive value path should similarly detach the nullable/value subcolumn and write the deduplicated nested map back into values_column before continuing.
TPC-H: Total hot run time: 30127 ms |
TPC-DS: Total hot run time: 174246 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Issue Number: N/A Related PR: apache#63001 Problem Summary: Restore ownership-safe COW mutation after assume_mutable became an assertion, and preserve shared immutable subcolumns for return-new wrappers to avoid unnecessary deep copies. None - Test: Unit Test / Manual test - ./run-be-ut.sh --run --filter=ColumnArrayTest.SharedCreateValidatesOffsetsAndDataSize:ColumnNullableTest.SharedCreatePreservesImmutableSubcolumns:ColumnMapTest2.SharedCreatePreservesImmutableSubcolumns:ColumnMapTest2.ConstFilterAndPermuteKeepInputAliasesUntouched:ColumnMapTest2.DeduplicateNestedNullableMapValuesDetachesSharedValueColumn:ComplexTypeTest.DeserializeStructWritesBackSharedChildren:VariantUtilTest.ParseNullableScalarVariantDetachesNestedAlias -j 100 - PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.codex/tmp/arg0/codex-arg083BlKk:/mnt/disk6/common/node-v24.14.1-linux-x64/lib/node_modules/@openai/codex/node_modules/@openai/codex-linux-x64/vendor/x86_64-unknown-linux-musl/path:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/usr/share/Modules/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin build-support/check-format.sh - git diff --check - ./build.sh --be -j 100 - Behavior changed: No - Does this need documentation: No
Issue Number: None Related PR: apache#63001 Problem Summary: The BE COW refactor makes assume_mutable an ownership assertion, but several live Block paths still expressed mutation by manually moving ColumnPtr or MutableBlock owners out and writing them back. That made it easy to leave a Block with moved-from columns on RETURN_IF_ERROR, or to reintroduce hot-path per-row mutate calls. This change codifies the new pattern: std::move(block).mutate_columns() is the stealing API for throwaway/rvalue Blocks; block.mutate_columns_scoped() is the RAII API for temporary whole-block mutation and restores on every exit path; block.mutate_column_scoped(pos) is the RAII owner-slot API for modifying one live Block column; VectorizedUtils::build_scoped_mutable_mem_reuse_block() returns a scoped guard, so callers first hold the guard and then borrow MutableBlock. The patch removes default ScopedMutableBlock construction, avoids copying schema in ScopedMutableColumns, migrates table/orc/parquet/es readers and executor paths to the scoped APIs, and keeps hot row loops on MutableBlock/MutableColumns instead of repeated mutate(). Tests exercise early-return restore, shared-column detach, null-slot recreation, live schema access, LocalExchanger error paths, and TableFormatReader partition/missing-column COW contracts. None - Test: - Unit Test: ./run-be-ut.sh --run --filter=BlockTest.ScopedMutableColumnsRestoreOnErrorAndDetachSharedColumn:BlockTest.ScopedMutableColumnsReadSchemaFromLiveBlock:BlockTest.ScopedMutableColumnRestoreOnErrorDetachSharedAndCreateMissingColumn:BlockTest.ScopedMutableBlockRestoreOnErrorAndDetachSharedColumn:LocalExchangerTest.ShuffleExchangerRestoreOutputBlockOnAddRowsError:TableFormatReaderTest.FillPartitionColumnRestoresSharedColumnOnDeserializeError:TableFormatReaderTest.FillMissingNullableColumnDetachesSharedBlockSlot -j 100 - Regression test: ./run-regression-test.sh --run -d external_table_p0/hive -s test_hive_openx_json - Manual test: ./build.sh --be -j 100 - Static check: git diff --check; build-support/check-format.sh; build-support/run-clang-tidy.sh --build-dir be/build_ASAN - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#63001 Problem Summary: Parquet FIXED_LEN_BYTE_ARRAY physical columns are now read through ColumnFixedLengthObject for COW-safe reuse, but several decoder paths still assumed primitive fixed-size data types. Dictionary and byte-stream-split decoders called get_size_of_value_in_memory() on DataTypeFixedLengthObject, and delta byte array fixed-length decode still wrote through a ColumnInt8 owner. This could fail external scans with DataTypeFixedLengthObject size checks or write through the wrong column representation. This change teaches the fixed-length parquet decoders to use the ColumnFixedLengthObject item size when that physical column is used, keeps the legacy primitive/ColumnInt8 paths, and adds BE UT coverage for plain, dictionary, byte-stream-split, and delta byte array fixed-length object decoding including filter/null cases. ### Release note None ### Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=FixLengthPlainDecoderTest.*:FixLengthDictDecoderTest.*:ByteStreamSplitDecoderTest.*:DeltaByteArrayDecoderTest.*:DeltaLengthByteArrayDecoderTest.*:ParquetColumnConvertTest.*:ParquetReaderTest.uuid_varbinary:ParquetReaderTest.varbinary_varbinary:ParquetReaderTest.varbinary_string:ParquetReaderTest.varbinary_string2 - Behavior changed: No - Does this need documentation: No
Issue Number: close #xxx Related PR: apache#63001 Problem Summary: A few COW mutation call sites still manually moved column owners out of a live Block and restored them only at the end of the local success path. In RowIdStorageReader::read_doris_format_row(), seek_and_read_by_rowid() may return an error after moving one result column, leaving result_block with a moved-from column. StreamingAggLocalState::_pre_agg_with_serialized_key() had the same restore-on-error risk in the mem-reuse output path when streaming_agg_serialize_to_column() returned an error. This change uses Block::mutate_column_scoped() for rowid single-column owner slots and Block::mutate_columns_scoped() for the streaming pre-agg mem-reuse output block, so every Status return path restores the live Block. Adjacent rowid single-column append paths were also moved to the same scoped owner-slot pattern for consistency. None - Test: Unit Test - ./run-be-ut.sh --run --filter=StreamingAggOperatorTest.*:BlockTest.ScopedMutableColumnRestoreOnErrorDetachSharedAndCreateMissingColumn:BlockTest.ScopedMutableColumnsRestoreOnErrorAndDetachSharedColumn - Behavior changed: No - Does this need documentation: No
Issue Number: close #xxx Related PR: apache#63001 Problem Summary: BE UT merges PR head into the latest master before building tests. After master added string-overflow MutableBlock merge tests, those tests still used the removed MutableBlock(Block*) live-block constructor. That constructor is intentionally unavailable in the new COW model because live output blocks need scoped restore-on-error ownership. This updates the tests to use ScopedMutableBlock while preserving the checked overflow and ignore-overflow assertions. None - Test: Unit Test - ./run-be-ut.sh --run --filter=BlockTest.merge_returns_error_when_checked_string_append_exceeds_limit:BlockTest.merge_ignore_overflow_keeps_owned_accumulation_convertible -j100 - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#63001 Problem Summary: The branch accidentally included a local COW audit document under docs/dev. The document is useful as local working notes, but it should not be part of the BE COW implementation diff. Remove it from the tracked branch state while leaving local notes outside the commit. ### Release note None ### Check List (For Author) - Test: No need to test (document removal only) - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#63001 Problem Summary: After rebasing the COW branch onto latest master, several newly introduced or newly exposed code paths still called Block::mutate_columns() on live Block objects. The COW API now intentionally makes mutate_columns() an rvalue-only stealing operation, so live blocks must use scoped owner APIs that restore columns on every exit path. This updates nested-loop join lazy materialization, row binlog block filling, historical row retrieval, and affected tests to use mutate_columns_scoped() when the Block remains live. It also updates the block overflow tests to use the current string overflow debug point instead of removed test helpers/configuration. ### Release note None ### Check List (For Author) - Test: Unit Test - ./build.sh --be - ./run-be-ut.sh - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#63001 Problem Summary: BE UT on PR apache#63001 failed in BlockTest.merge_returns_error_when_checked_string_append_exceeds_limit and aborted in ComplexTypeTest.DeserializeArrayWritesBackSharedNestedColumn. The block test relied on a string-overflow debug point that is not used by MutableBlock::merge(), so it expected an error on a path that legitimately succeeded. The complex-type test also constructed the destination array with a raw Int32 nested column, while DataTypeArray canonicalizes nested values as nullable and therefore calls DataTypeNullable::deserialize(). That exposed a real COW gap: DataTypeNullable::deserialize() wrote directly into the nested column and null map even when those subcolumns were still shared after a shallow COW clone. This patch detaches nullable nested and null-map owner slots before deserializing and writes them back through ColumnNullable::replace_columns(). It also updates the array COW deserialize test to use shared nullable subcolumns and changes the scoped block merge test to use a deterministic schema mismatch error. ### Release note None ### Check List (For Author) - Test: Unit Test - PATH=/tmp/codex-clang-format-16:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.codex/tmp/arg0/codex-arg0qpYuvr:/mnt/disk6/common/node-v24.14.1-linux-x64/lib/node_modules/@openai/codex/node_modules/@openai/codex-linux-x64/vendor/x86_64-unknown-linux-musl/path:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/usr/share/Modules/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin build-support/clang-format.sh - PATH=/tmp/codex-clang-format-16:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.codex/tmp/arg0/codex-arg0qpYuvr:/mnt/disk6/common/node-v24.14.1-linux-x64/lib/node_modules/@openai/codex/node_modules/@openai/codex-linux-x64/vendor/x86_64-unknown-linux-musl/path:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/usr/share/Modules/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin build-support/check-format.sh - ./run-be-ut.sh --run --filter=BlockTest.merge_returns_error_and_restores_output_block:BlockTest.merge_ignore_overflow_keeps_owned_accumulation_convertible:ComplexTypeTest.DeserializeArrayWritesBackSharedNestedColumn - ./run-be-ut.sh --run --filter=ComplexTypeTest.Deserialize* - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: apache#63001 Problem Summary: DataTypeArray now normalizes array elements to nullable physical columns. AggregateFunctionForEach still passed the array data column directly into the nested aggregate when writing results, so nested aggregates such as array_agg could receive ColumnNullable where they expected ColumnArray. This commit unwraps the array element nullable wrapper for non-nullable nested aggregate results, maintains the element null map, and updates tests that still built DataTypeArray columns with the old raw nested shape. ### Release note None ### Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter='FunctionVariantCast.*:AggregateFunctionArrayAggTest.*:VRetentionTest.*:SchemaUtilTest.TestArrayDimensions:SchemaUtilTest.TestCastColumnEdgeCases:DataTypeArrayTest.CreateColumnUsesNullableNestedColumn:AIFunctionTest.AIMaskTest:AIFunctionTest.AIExtractTest:AIFunctionTest.AIClassifyTest' - ./run-be-ut.sh --run --filter='AggGroupArrayIntersectTest.*:TableFunctionOperatorTest.block_fast_path_explode*' - PATH=/tmp/codex-clang-format-16:$PATH build-support/check-format.sh - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#63001 Problem Summary: After rebasing onto upstream master, Segment::seek_and_read_by_rowid expects a sorted vector of row ids instead of a single row id. The COW conflict resolution kept scoped block/column ownership but still called the old single-row form in rowid fetch and point query paths, causing BE compilation to fail. This updates both call sites to pass the batched row id vector while preserving scoped column ownership and restore-on-error behavior. ### Release note None ### Check List (For Author) - Test: Unit Test - ./build.sh --be - PATH=/tmp/codex-clang-format-16:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.codex/tmp/arg0/codex-arg0DFZQLQ:/mnt/disk6/common/node-v24.14.1-linux-x64/lib/node_modules/@openai/codex/node_modules/@openai/codex-linux-x64/vendor/x86_64-unknown-linux-musl/codex-path:/mnt/disk3/zhaochangle/.bun/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/mnt/disk3/zhaochangle/.opencode/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/.local/bin:/mnt/disk3/zhaochangle/bin:/mnt/disk6/common/apache-maven-3.9.14/bin:/mnt/disk6/common/ldb_toolchain_028/bin:/mnt/disk6/common/jdk-17.0.16/bin:/mnt/disk6/common/node-v24.14.1-linux-x64/bin:/usr/share/Modules/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/tableau/tableau_server/packages/customer-bin.20251.25.0520.1026 build-support/check-format.sh - ./run-be-ut.sh --run --filter='FunctionVariantCast.*:AggregateFunctionArrayAggTest.*:VRetentionTest.*:SchemaUtilTest.TestArrayDimensions:SchemaUtilTest.TestCastColumnEdgeCases:DataTypeArrayTest.CreateColumnUsesNullableNestedColumn:AIFunctionTest.AIMaskTest:AIFunctionTest.AIExtractTest:AIFunctionTest.AIClassifyTest' - ./run-be-ut.sh --run --filter='AggGroupArrayIntersectTest.*:TableFunctionOperatorTest.block_fast_path_explode*' - ./run-be-ut.sh --run --filter='BlockTest.ClearSelectedColumnDataClonesSharedColumn:BlockTest.ClearColumnDataPropagatesSharedCloneEmptyFailure:BlockTest.ClearSelectedColumnDataPropagatesSharedCloneEmptyFailure:BlockTest.ScopedMutable*' - Behavior changed: No - Does this need documentation: No
|
run buildall |
TPC-H: Total hot run time: 31612 ms |
TPC-DS: Total hot run time: 172368 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
skip buildall |
|
PR approved by at least one committer and no changes requested. |
followup #63001. we changed the actual meaning of `assume_mutable`
What problem does this PR solve?
Issue Number: N/A
Related PR: N/A
Problem Summary: This PR changes the BE COW contract around
assume_mutable: callers may only use it when they already own the column exclusively, and the helper now behaves as an ownership assertion instead of a silent mutable borrow. SharedColumnPtr/Blockdata must go throughmutate()or one of the scoped owner-slot mutation helpers before being modified. This lets blocks and columns rely on real COW semantics and avoids both unsafe in-place mutation of shared data and unnecessary cross-operator copies.The main changes are:
assume_mutable/assume_mutable_refvalidate exclusive ownership and document audited usage indocs/dev/be-cow-assume-mutable-audit.md.Block::mutate_columns().MutableBlock/MutableColumnsusage in hot paths.Release note
None
Check List (For Author)
./run-be-ut.sh --run --filter=BlockTest.ScopedMutableColumnsRestoreOnErrorAndDetachSharedColumn:BlockTest.ScopedMutableColumnsReadSchemaFromLiveBlock:BlockTest.ScopedMutableColumnRestoreOnErrorDetachSharedAndCreateMissingColumn:BlockTest.ScopedMutableBlockRestoreOnErrorAndDetachSharedColumn:LocalExchangerTest.ShuffleExchangerRestoreOutputBlockOnAddRowsError:TableFormatReaderTest.FillPartitionColumnRestoresSharedColumnOnDeserializeError:TableFormatReaderTest.FillMissingNullableColumnDetachesSharedBlockSlot -j 100./build.sh --be -j 100PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.shgit diff --check