Commit 72a6c12
[Improvement](shuffle) add Crc32CHashPartitioner (#59052)
add Crc32CHashPartitioner
<img width="596" height="4284" alt="image"
src="https://github.com/user-attachments/assets/5773ea04-b01a-4c8c-ba5a-0c725cb11f11"
/>
This pull request refactors the codebase to standardize the usage of the
CRC32C checksum library by replacing the custom `util/crc32c.h` header
and its functions with the upstream `crc32c` library
(`<crc32c/crc32c.h>`) and its API. It also updates function calls to use
the correct data types expected by the new library and ensures
consistent checksum calculation across multiple modules related to file
I/O, compression, and storage.
**Migration to Upstream CRC32C Library**
* Replaced all includes of `"util/crc32c.h"` with `<crc32c/crc32c.h>`
and removed the custom header from all relevant files.
[[1]](diffhunk://#diff-0572424f9b6fe1561e15b070c1155b1b8f9272499029d425ff5a8d0e0aa8f40fL24)
[[2]](diffhunk://#diff-a4327d67c48e4a4115a1ac9bc0a82a646bbfcb141d80f5f428142f55027e16a1R20-L21)
[[3]](diffhunk://#diff-f46297d8957a9929f575febc300a004c144e106ea6893f1b95508ab006503407R18-L23)
[[4]](diffhunk://#diff-3ef6a4f806adc33273c229fbdb827c072152651d5930b19affde4c1f8984c51cR20-L23)
[[5]](diffhunk://#diff-52cdc310f4ed34081299dff53c543455745a834dbc5c50a2c21b765c0c90c3f8R18)
[[6]](diffhunk://#diff-03a87568e2651d1524985a56a278f2e2932667c1e92efc60d0c5a750f0ad316bR20-R21)
[[7]](diffhunk://#diff-23fa0193d626ba712c4186c66bcd1809c7e55bfc04ea10f5a91c691ed3e04727R21)
[[8]](diffhunk://#diff-4dc7440cc992e7f9bdd8ec9c5bfc5a6194f9d78fc5ff359c4781d992df4e610bR20)
[[9]](diffhunk://#diff-5eb6e846447db952b75ba0fd9bc1614702c428689c93e089a952ea414c23b7fdR20)
[[10]](diffhunk://#diff-c33a6f975ebaa66163e68ba51a4d9ce0cbfd6b5d063edce503130d7bae502c53R20)
[[11]](diffhunk://#diff-8061bb86d18c96049b63aa2caf4851933bff6b16cefa5460b1ee736d6f0ac883R27-R28)
[[12]](diffhunk://#diff-9018eae3f9bef2cf64079552ce4d9c3fd3535a31b86a4ff496d29853c4968cb0R20)
* Updated all function calls from `crc32c::Value(...)` to
`crc32c::Crc32c(...)` for computing CRC32C checksums.
[[1]](diffhunk://#diff-0572424f9b6fe1561e15b070c1155b1b8f9272499029d425ff5a8d0e0aa8f40fL120-R119)
[[2]](diffhunk://#diff-a4327d67c48e4a4115a1ac9bc0a82a646bbfcb141d80f5f428142f55027e16a1L89-R90)
[[3]](diffhunk://#diff-3ef6a4f806adc33273c229fbdb827c072152651d5930b19affde4c1f8984c51cL189-R190)
[[4]](diffhunk://#diff-3ef6a4f806adc33273c229fbdb827c072152651d5930b19affde4c1f8984c51cL420-R421)
[[5]](diffhunk://#diff-52cdc310f4ed34081299dff53c543455745a834dbc5c50a2c21b765c0c90c3f8L180-R180)
[[6]](diffhunk://#diff-ea6232df0f48fea9e5403472da0bc4206acfd69b676c1b5fbc2d2df13df24624L149-R150)
[[7]](diffhunk://#diff-5eb6e846447db952b75ba0fd9bc1614702c428689c93e089a952ea414c23b7fdL178-R181)
[[8]](diffhunk://#diff-c33a6f975ebaa66163e68ba51a4d9ce0cbfd6b5d063edce503130d7bae502c53L472-R472)
[[9]](diffhunk://#diff-8061bb86d18c96049b63aa2caf4851933bff6b16cefa5460b1ee736d6f0ac883L1158-R1159)
* Updated all function calls from `crc32c::Extend(...)` to use the new
function signature, casting data pointers to `const uint8_t*` as
required by the upstream library.
[[1]](diffhunk://#diff-f46297d8957a9929f575febc300a004c144e106ea6893f1b95508ab006503407L320-R321)
[[2]](diffhunk://#diff-f46297d8957a9929f575febc300a004c144e106ea6893f1b95508ab006503407L369-R370)
[[3]](diffhunk://#diff-ea6232df0f48fea9e5403472da0bc4206acfd69b676c1b5fbc2d2df13df24624L103-R104)
[[4]](diffhunk://#diff-23fa0193d626ba712c4186c66bcd1809c7e55bfc04ea10f5a91c691ed3e04727L2037-R2037)
[[5]](diffhunk://#diff-4dc7440cc992e7f9bdd8ec9c5bfc5a6194f9d78fc5ff359c4781d992df4e610bL734-R734)
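The include swap and call-site changes listed above can be sketched roughly as follows. This is an illustration, not code from the commit: `page_checksum` is a hypothetical call site, the old helper is assumed to have accepted `const char*`, and the small `crc32c` namespace is a bit-by-bit stand-in that mirrors the upstream `Crc32c`/`Extend` signatures so that the snippet compiles without the library installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the upstream library so this sketch builds on its own.
// With the real dependency, delete this namespace and use:
//   #include <crc32c/crc32c.h>
namespace crc32c {
// Extends a finished CRC32C value `crc` with `count` more bytes.
inline uint32_t Extend(uint32_t crc, const uint8_t* data, size_t count) {
    assert(data != nullptr || count == 0);
    crc = ~crc;  // undo the final inversion of the previous result
    for (size_t i = 0; i < count; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // 0x82F63B78 is the reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Computes the CRC32C of a whole buffer.
inline uint32_t Crc32c(const uint8_t* data, size_t count) {
    return Extend(0, data, count);
}
}  // namespace crc32c

// Hypothetical call site holding its bytes in a std::string.
// Before (custom header, assumed): crc32c::Value(body.data(), body.size())
// After (upstream-style API): the char pointer is cast to const uint8_t*.
inline uint32_t page_checksum(const std::string& body) {
    return crc32c::Crc32c(reinterpret_cast<const uint8_t*>(body.data()),
                          body.size());
}
```

Under these assumptions, `page_checksum("123456789")` returns `0xE3069283`, the standard CRC32C check value, which is a convenient sanity test when swapping implementations.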
**Checksum Calculation Logic**
* Modified checksum calculation for multi-slice data by iteratively
using `crc32c::Extend` over each slice, ensuring correct cumulative
checksum computation.
* Updated checksum verification logic to use the new API and data types,
improving reliability and consistency across modules.
[[1]](diffhunk://#diff-0572424f9b6fe1561e15b070c1155b1b8f9272499029d425ff5a8d0e0aa8f40fL120-R119)
[[2]](diffhunk://#diff-a4327d67c48e4a4115a1ac9bc0a82a646bbfcb141d80f5f428142f55027e16a1L89-R90)
[[3]](diffhunk://#diff-3ef6a4f806adc33273c229fbdb827c072152651d5930b19affde4c1f8984c51cL189-R190)
[[4]](diffhunk://#diff-3ef6a4f806adc33273c229fbdb827c072152651d5930b19affde4c1f8984c51cL420-R421)
[[5]](diffhunk://#diff-52cdc310f4ed34081299dff53c543455745a834dbc5c50a2c21b765c0c90c3f8L180-R180)
[[6]](diffhunk://#diff-ea6232df0f48fea9e5403472da0bc4206acfd69b676c1b5fbc2d2df13df24624L149-R150)
[[7]](diffhunk://#diff-5eb6e846447db952b75ba0fd9bc1614702c428689c93e089a952ea414c23b7fdL178-R181)
[[8]](diffhunk://#diff-c33a6f975ebaa66163e68ba51a4d9ce0cbfd6b5d063edce503130d7bae502c53L472-R472)
[[9]](diffhunk://#diff-8061bb86d18c96049b63aa2caf4851933bff6b16cefa5460b1ee736d6f0ac883L1158-R1159)
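The multi-slice computation and the verification step described above might look like the sketch below. The `Slice` struct, `checksum_slices`, and `verify_checksum` are hypothetical names chosen for illustration, and the `crc32c` namespace is again a bit-by-bit stand-in for the upstream library rather than the library itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mirroring the upstream signatures; with the real dependency,
// remove this namespace and include <crc32c/crc32c.h> instead.
namespace crc32c {
inline uint32_t Extend(uint32_t crc, const uint8_t* data, size_t count) {
    assert(data != nullptr || count == 0);
    crc = ~crc;
    for (size_t i = 0; i < count; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
}  // namespace crc32c

// Hypothetical non-owning view over a byte range.
struct Slice {
    const char* data;
    size_t size;
};

// Folds every slice into one running checksum. Starting from 0 and calling
// Extend per slice gives the same value as checksumming the concatenation.
inline uint32_t checksum_slices(const std::vector<Slice>& slices) {
    uint32_t crc = 0;
    for (const Slice& s : slices) {
        crc = crc32c::Extend(crc, reinterpret_cast<const uint8_t*>(s.data),
                             s.size);
    }
    return crc;
}

// Recomputes the checksum and compares it with the stored value.
inline bool verify_checksum(const std::vector<Slice>& slices,
                            uint32_t expected) {
    return checksum_slices(slices) == expected;
}
```

Because `Extend` carries the running value forward, no temporary buffer holding the concatenated slices is needed, and empty slices leave the checksum unchanged.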
**Code Clean-up and Consistency**
* Removed all redundant or obsolete includes of the custom `crc32c.h`
header.
[[1]](diffhunk://#diff-a4327d67c48e4a4115a1ac9bc0a82a646bbfcb141d80f5f428142f55027e16a1R20-L21)
[[2]](diffhunk://#diff-f46297d8957a9929f575febc300a004c144e106ea6893f1b95508ab006503407R18-L23)
[[3]](diffhunk://#diff-3ef6a4f806adc33273c229fbdb827c072152651d5930b19affde4c1f8984c51cR20-L23)
[[4]](diffhunk://#diff-52cdc310f4ed34081299dff53c543455745a834dbc5c50a2c21b765c0c90c3f8L30)
[[5]](diffhunk://#diff-03a87568e2651d1524985a56a278f2e2932667c1e92efc60d0c5a750f0ad316bL30)
[[6]](diffhunk://#diff-23fa0193d626ba712c4186c66bcd1809c7e55bfc04ea10f5a91c691ed3e04727L51)
[[7]](diffhunk://#diff-4dc7440cc992e7f9bdd8ec9c5bfc5a6194f9d78fc5ff359c4781d992df4e610bL49)
[[8]](diffhunk://#diff-5eb6e846447db952b75ba0fd9bc1614702c428689c93e089a952ea414c23b7fdL44)
[[9]](diffhunk://#diff-c33a6f975ebaa66163e68ba51a4d9ce0cbfd6b5d063edce503130d7bae502c53L69)
[[10]](diffhunk://#diff-8061bb86d18c96049b63aa2caf4851933bff6b16cefa5460b1ee736d6f0ac883L65)
* Ensured all modules that require CRC32C now directly depend on the
upstream library, reducing maintenance overhead and potential for bugs.
(all above references)
These changes collectively improve code maintainability, reliability,
and alignment with upstream best practices for CRC32C checksum
operations.

1 parent 235d8d9 commit 72a6c12
File tree
54 files changed, +550 -194 lines changed
- be
  - src
    - cloud
    - exec
    - exprs
    - io
      - cache
      - fs
    - olap
      - rowset
        - segment_v2
      - wal
    - pipeline
      - exec
      - local_exchange
    - tools
    - util
    - vec
      - columns
      - functions
      - runtime
  - test
    - olap/rowset/segment_v2
    - util
- fe/fe-core/src/main/java/org/apache/doris/qe
- gensrc/thrift