Commit e459279
[Chore](hash) Use google/crc32c instead of rocksdb/crc32c and crc_hash (#58557)
Doris currently uses the crc32c implementation taken from rocksdb, but it
performs worse than google/crc32c.

| 66663538 rows | crc32c-rocksdb | crc32c-google |
|---|---|---|
| int | 684.879ms | 206.360ms |
| varchar | 1sec368ms | 391.290ms |

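To make the size of the gap explicit, the timings above can be turned into speedup ratios. This is a small sketch rather than part of the change: the helper names (`speedup`, `mrows_per_sec`) are hypothetical, and `1sec368ms` is read as 1368 ms.

```cpp
#include <cstdint>

// Speedup of a candidate over a baseline, both given as wall-clock times in ms.
constexpr double speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}

// Throughput in millions of rows per second for a run over `rows` rows.
constexpr double mrows_per_sec(std::uint64_t rows, double elapsed_ms) {
    return static_cast<double>(rows) / (elapsed_ms / 1000.0) / 1e6;
}

// Figures copied from the benchmark above (assumption: 1sec368ms == 1368 ms).
constexpr std::uint64_t kRows = 66663538;
constexpr double kIntSpeedup = speedup(684.879, 206.360);     // about 3.3x
constexpr double kVarcharSpeedup = speedup(1368.0, 391.290);  // about 3.5x
```

On these numbers google/crc32c is roughly 3.3x faster on the int column and roughly 3.5x faster on the varchar column.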
We already have unit tests for rocksdb/crc32c
([be/test/util/crc32c_test.cpp](https://github.com/apache/doris/blob/master/be/test/util/crc32c_test.cpp)),
and they continue to exercise the CRC32C utility, so this change is safe.
This pull request updates the codebase to use the more efficient and
modern CRC32C hashing algorithm in place of the older CRC32
implementation. The changes include switching hash functions throughout
the code, updating the CRC32C utility implementation to use an external
library, and adding the required third-party dependency. This improves
hash performance and consistency, and prepares the codebase for future
compatibility.
**Hashing algorithm migration:**
* Replaced all usages of `HashUtil::crc_hash` with
`HashUtil::crc32c_hash` in `block_bloom_filter.hpp`,
`column_dictionary.h`, and `function_string.h` to utilize CRC32C for
better performance and reliability.
[[1]](diffhunk://#diff-635476edd1321096d1d32eb6453bed4624e8f23d0580750d515aaad9dfe5404eL79-R79)
[[2]](diffhunk://#diff-635476edd1321096d1d32eb6453bed4624e8f23d0580750d515aaad9dfe5404eL108-R108)
[[3]](diffhunk://#diff-bf8bb38b6a6eae6cccd7ed62ff64b1a77fbd273a614348b096330abea8331b4dL348-R348)
[[4]](diffhunk://#diff-9cc694af32a330f9ffd947df039bdfc12be67b2107c9e612d7861b17c5018176L4601-R4601)
* Added the new `crc32c_hash` method to `HashUtil` and marked the old
`crc_hash` as deprecated, retaining it only for backward compatibility
with historical data.
[[1]](diffhunk://#diff-92d951e58f5e0b824254f5eb0d931b604518e4bfbe666b665cd56ed9435667bbL52-R58)
[[2]](diffhunk://#diff-92d951e58f5e0b824254f5eb0d931b604518e4bfbe666b665cd56ed9435667bbR68-R69)
[[3]](diffhunk://#diff-92d951e58f5e0b824254f5eb0d931b604518e4bfbe666b665cd56ed9435667bbL120-L124)
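The migration pattern described above (a new `crc32c_hash` next to a deprecated `crc_hash`) might look roughly like the sketch below. Everything here is illustrative: `HashUtilSketch`, the signatures, and the bitwise `crc32c_extend` helper are assumptions, not the code in `hash_util.hpp`. In the real tree the hash is expected to come from the google/crc32c library (its C++ entry point is `crc32c::Extend` in `crc32c/crc32c.h`); a dependency-free software loop stands in for it here.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// Stand-in for the library call so the example builds without dependencies.
inline std::uint32_t crc32c_extend(std::uint32_t crc, const std::uint8_t* data,
                                   std::size_t n) {
    crc = ~crc;  // recover the internal state from the previous result
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right by one; XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical counterpart of HashUtil, reduced to the two functions involved.
struct HashUtilSketch {
    // New entry point: seeded CRC32C over a raw byte range.
    static std::uint32_t crc32c_hash(const void* data, std::uint32_t bytes,
                                     std::uint32_t seed) {
        return crc32c_extend(seed, static_cast<const std::uint8_t*>(data), bytes);
    }

    // Legacy entry point, kept only so historical data can still be read.
    // The definition is intentionally omitted: the old algorithm is not
    // reproduced in this sketch, and new code should not call it.
    [[deprecated("use crc32c_hash; kept only for historical data")]]
    static std::uint32_t crc_hash(const void* data, std::uint32_t bytes,
                                  std::uint32_t seed);
};

}  // namespace sketch
```

With the attribute in place, any remaining call to `crc_hash` produces a compiler warning, which makes stragglers easy to find while the old function stays available for data that was hashed with it.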
**CRC32C utility refactor and dependency management:**
* Refactored `crc32c.cpp` and `crc32c.h` to use the external `crc32c`
library, removing the previous custom implementation and lookup tables.
Added new utility functions for CRC32C operations.
[[1]](diffhunk://#diff-1a21d70259827997bdfd54da21acd6db2ae0a29465873b53dbf8c7e9c6a7e265L18-R38)
[[2]](diffhunk://#diff-72d5c6ec3fe2da095fe1413472778c1d56027242035bdb83c62339ccfcca6ed6L18-R33)
* Added the `crc32c` third-party dependency in the build configuration
to support the new CRC32C utility.
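For readers unfamiliar with what a CRC32C utility has to provide, here is a minimal table-driven sketch of the two usual operations: computing a checksum and extending one across chunks. It is not the removed Doris code nor the library's implementation; the names `make_table`, `extend`, and `value` are hypothetical, and a byte-wise lookup table is simply one common software technique. Libraries such as google/crc32c additionally use hardware CRC instructions where available, which is where most of the speedup is expected to come from.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

namespace crc32c_sketch {

// Build the 256-entry table for the reflected Castagnoli polynomial.
constexpr std::array<std::uint32_t, 256> make_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t crc = i;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
        }
        table[i] = crc;
    }
    return table;
}

inline constexpr auto kTable = make_table();

// Continue a checksum: `crc` is the result for the bytes seen so far
// (0 when starting fresh), `data` holds the next `n` bytes.
constexpr std::uint32_t extend(std::uint32_t crc, const char* data, std::size_t n) {
    crc = ~crc;
    for (std::size_t i = 0; i < n; ++i) {
        const auto byte = static_cast<std::uint8_t>(data[i]);
        crc = kTable[(crc ^ byte) & 0xFFu] ^ (crc >> 8);
    }
    return ~crc;
}

// Checksum of a whole buffer.
constexpr std::uint32_t value(const char* data, std::size_t n) {
    return extend(0, data, n);
}

// Standard CRC32C check value for the ASCII string "123456789".
static_assert(value("123456789", 9) == 0xE3069283u);
// Extending chunk by chunk matches hashing the whole buffer at once.
static_assert(extend(value("12345", 5), "6789", 4) == value("123456789", 9));

}  // namespace crc32c_sketch
```

The second assertion is the property that lets a wrapper accept a seed or a running checksum: feeding the previous result back in continues the same CRC rather than starting a new one.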
**Build and header updates:**
* Updated includes in `hash_util.hpp` to reference the new CRC32C
utility.

1 parent 3e785ee commit e459279
7 files changed (+28, -273 lines) under `be`: `cmake`, `src/exprs`, `src/util`, `src/vec/columns`, `src/vec/functions`