[Improvement](shuffle) add Crc32CHashPartitioner #59052
HappenLee
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
zclllyybb
left a comment
LGTM
add Crc32CHashPartitioner

<img width="596" height="4284" alt="image" src="https://github.com/user-attachments/assets/5773ea04-b01a-4c8c-ba5a-0c725cb11f11" />

This pull request refactors the codebase to standardize the usage of the CRC32C checksum library by replacing the custom `util/crc32c.h` header and its functions with the upstream `crc32c` library (`<crc32c/crc32c.h>`) and its API. It also updates function calls to use the data types expected by the new library and ensures consistent checksum calculation across multiple modules related to file I/O, compression, and storage.

**Migration to Upstream CRC32C Library**

* Replaced all includes of `"util/crc32c.h"` with `<crc32c/crc32c.h>` and removed the custom header from all relevant files.
* Updated all function calls from `crc32c::Value(...)` to `crc32c::Crc32c(...)` for computing CRC32C checksums.
* Updated all function calls from `crc32c::Extend(...)` to use the new function signature, casting data pointers to `const uint8_t*` as required by the upstream library.

**Checksum Calculation Logic**

* Modified checksum calculation for multi-slice data by iteratively using `crc32c::Extend` over each slice, ensuring correct cumulative checksum computation.
* Updated checksum verification logic to use the new API and data types, improving reliability and consistency across modules.

**Code Clean-up and Consistency**

* Removed all redundant or obsolete includes of the custom `crc32c.h` header.
* Ensured all modules that require CRC32C now directly depend on the upstream library, reducing maintenance overhead and potential for bugs.

These changes collectively improve code maintainability, reliability, and alignment with upstream best practices for CRC32C checksum operations.
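The multi-slice bullet above can be illustrated with a small sketch. It assumes the semantics of the upstream `crc32c::Extend(crc, data, count)` call, namely that starting from 0 and extending slice by slice yields the same value as checksumming the concatenated bytes. To stay self-contained, the snippet uses a hypothetical bitwise stand-in (`crc32c_extend`) instead of the real library, and `checksum_slices` is an illustrative name, not a function from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the upstream Extend call (assumption: same
// argument order and chaining semantics). Bitwise CRC32C using the
// reflected Castagnoli polynomial; slow, for illustration only.
inline uint32_t crc32c_extend(uint32_t crc, const uint8_t* data, size_t count) {
    crc = ~crc;  // undo the final inversion of the previous step
    for (size_t i = 0; i < count; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Illustrative helper: cumulative checksum over several slices.
// Each slice's char data is cast to const uint8_t*, as the PR
// description says the upstream API requires.
inline uint32_t checksum_slices(const std::vector<std::string_view>& slices) {
    uint32_t crc = 0;
    for (std::string_view s : slices) {
        crc = crc32c_extend(crc, reinterpret_cast<const uint8_t*>(s.data()), s.size());
    }
    return crc;
}
```

With this chaining, splitting a buffer at any boundary should not change the result, which is the property the multi-slice change relies on. In real code the stand-in would be replaced by the library call, which is much faster than this bit-by-bit loop.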
This reverts commit 5c69a29.

Revert "[Chore](thirdparty) add crc32c-1.1.2 to thirdparty (apache#58462)"

This reverts commit 066b69e.

[Chore](thirdparty) add crc32c-1.1.2 to thirdparty (apache#58462)

Doris currently uses the CRC32C implementation from RocksDB, but it performs worse than google/crc32c:

* 66663538 rows, int: crc32c-rocksdb 684.879 ms, crc32c-google 206.360 ms
* 66663538 rows, varchar: crc32c-rocksdb 1 s 368 ms, crc32c-google 391.290 ms

This pull request adds support for the `crc32c` third-party dependency to the build environment. The changes include updating the changelog, adding build logic, and configuring the necessary variables to download and build `crc32c`.

**Third-party dependency integration:**

* Added `crc32c-1.1.2` to the list of third-party dependencies in the changelog (`thirdparty/CHANGELOG.md`).
* Added `crc32c` to the default package build list in `build-thirdparty.sh` to ensure it is built by default.
* Implemented the `build_crc32c()` function in `build-thirdparty.sh` to handle the build and installation process for `crc32c`.

**Build configuration updates:**

* Defined download URL, archive name, source directory, and MD5 checksum for `crc32c` in `vars.sh`.
* Added `CRC32C` to the `TP_ARCHIVES` array in `vars.sh` so it is included in the set of managed third-party archives.

[Chore](hash) use google/crc32c to instead rocksdb/crc32c and crc_hash (apache#58557)

Doris currently uses the CRC32C implementation from RocksDB, but it performs worse than google/crc32c:

* 66663538 rows, int: crc32c-rocksdb 684.879 ms, crc32c-google 206.360 ms
* 66663538 rows, varchar: crc32c-rocksdb 1 s 368 ms, crc32c-google 391.290 ms

We already have unit tests for rocksdb/crc32c ([be/test/util/crc32c_test.cpp](https://github.com/apache/doris/blob/master/be/test/util/crc32c_test.cpp)), so this change is safe.

This pull request updates the codebase to use the more efficient and modern CRC32C hashing algorithm in place of the older CRC32 implementation. The changes include switching hash functions throughout the code, updating the CRC32C utility implementation to use an external library, and adding the required third-party dependency. This improves hash performance and consistency, and prepares the codebase for future compatibility.

**Hashing algorithm migration:**

* Replaced all usages of `HashUtil::crc_hash` with `HashUtil::crc32c_hash` in `block_bloom_filter.hpp`, `column_dictionary.h`, and `function_string.h` to utilize CRC32C for better performance and reliability.
* Added the new `crc32c_hash` method to `HashUtil` and marked the old `crc_hash` as deprecated, retaining it only for backward compatibility with historical data.

**CRC32C utility refactor and dependency management:**

* Refactored `crc32c.cpp` and `crc32c.h` to use the external `crc32c` library, removing the previous custom implementation and lookup tables. Added new utility functions for CRC32C operations.
* Added the `crc32c` third-party dependency in the build configuration to support the new CRC32C utility.

**Build and header updates:**

* Updated includes in `hash_util.hpp` to reference the new CRC32C utility.

[Improvement](shuffle) add Crc32CHashPartitioner (apache#59052)

add Crc32CHashPartitioner
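The commit title names a `Crc32CHashPartitioner` for shuffle, but the description does not spell out how rows are routed. As a rough sketch only, assuming the common scheme of hashing each key with CRC32C and taking the result modulo the number of channels, the idea might look like the following. `crc32c_bytes` is a hypothetical bitwise stand-in for the upstream library, and `assign_partitions` is an illustrative helper, not Doris code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CRC32C routine (reflected Castagnoli
// polynomial, bit-by-bit). The real library is far faster.
inline uint32_t crc32c_bytes(const uint8_t* data, size_t count) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < count; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Illustrative helper (assumed scheme): map each key to a channel index
// in [0, partition_count). partition_count must be greater than zero.
inline std::vector<uint32_t> assign_partitions(const std::vector<int32_t>& keys,
                                               uint32_t partition_count) {
    std::vector<uint32_t> channels;
    channels.reserve(keys.size());
    for (int32_t key : keys) {
        // Hash the key's raw bytes; the result depends on host byte order.
        uint32_t h = crc32c_bytes(reinterpret_cast<const uint8_t*>(&key), sizeof(key));
        channels.push_back(h % partition_count);
    }
    return channels;
}
```

The property that matters for a shuffle is that equal keys always land on the same channel, so whichever hash is used, sender and receiver sides need to agree on it.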
What problem does this PR solve?
add Crc32CHashPartitioner

This pull request refactors the codebase to standardize the usage of the CRC32C checksum library by replacing the custom `util/crc32c.h` header and its functions with the upstream `crc32c` library (`<crc32c/crc32c.h>`) and its API. It also updates function calls to use the data types expected by the new library and ensures consistent checksum calculation across multiple modules related to file I/O, compression, and storage.

**Migration to Upstream CRC32C Library**

* Replaced all includes of `"util/crc32c.h"` with `<crc32c/crc32c.h>` and removed the custom header from all relevant files.
* Updated all function calls from `crc32c::Value(...)` to `crc32c::Crc32c(...)` for computing CRC32C checksums.
* Updated all function calls from `crc32c::Extend(...)` to use the new function signature, casting data pointers to `const uint8_t*` as required by the upstream library.

**Checksum Calculation Logic**

* Modified checksum calculation for multi-slice data by iteratively using `crc32c::Extend` over each slice, ensuring correct cumulative checksum computation.
* Updated checksum verification logic to use the new API and data types, improving reliability and consistency across modules.

**Code Clean-up and Consistency**

* Removed all redundant or obsolete includes of the custom `crc32c.h` header.
* Ensured all modules that require CRC32C now directly depend on the upstream library, reducing maintenance overhead and potential for bugs.

These changes collectively improve code maintainability, reliability, and alignment with upstream best practices for CRC32C checksum operations.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)