Commit 779bc05
authored
[Improvement](Iceberg) Optimize LocationPath.of performance for Iceberg table queries (#59217)
### What problem does this PR solve?
### Problem
When querying Iceberg tables with large number of data files,
`LocationPath.of` method consumes significant CPU time. The main
bottlenecks are:
1. Repeated regex parsing in `S3URI.create()` for each file path
2. Multiple `String.split()` calls for scheme extraction
3. Repeated `StorageProperties` lookup from map for each file
### Solution
This PR introduces several optimizations to reduce CPU overhead:
#### 1. Optimize scheme parsing in `LocationPath.parseScheme`
- Replace `String.split("://")` with `indexOf` + `substring` to avoid
array allocation
#### 2. Add fast path for S3-compatible schemes in
`S3PropertyUtils.validateAndNormalizeUri`
- For simple S3-compatible URIs like `oss://bucket/key`,
`s3a://bucket/key`, use direct string replacement instead of full S3URI
regex parsing
- Only fall back to full S3URI parsing for complex HTTP URLs
#### 3. Add path prefix caching in `IcebergScanNode`
- Cache `StorageProperties`, schema, and path prefix mapping on first
file
- For subsequent files with the same prefix, directly transform paths
using string replacement
- New `LocationPath.ofDirect()` method to create LocationPath without
any parsing1 parent 2458657 commit 779bc05
File tree
5 files changed
+260
-19
lines changed- fe/fe-core/src
- main/java/org/apache/doris
- common/util
- datasource
- iceberg/source
- property/storage
- test/java/org/apache/doris
- common/util
- datasource/property/storage
5 files changed
+260
-19
lines changedLines changed: 80 additions & 16 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
99 | | - | |
100 | | - | |
101 | | - | |
102 | | - | |
103 | | - | |
104 | | - | |
105 | | - | |
106 | | - | |
107 | | - | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
111 | | - | |
112 | | - | |
113 | | - | |
114 | | - | |
115 | | - | |
116 | | - | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
117 | 115 | | |
118 | 116 | | |
119 | | - | |
| 117 | + | |
120 | 118 | | |
121 | 119 | | |
122 | 120 | | |
| |||
201 | 199 | | |
202 | 200 | | |
203 | 201 | | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
204 | 268 | | |
205 | 269 | | |
206 | 270 | | |
| |||
Lines changed: 78 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
135 | 135 | | |
136 | 136 | | |
137 | 137 | | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
138 | 150 | | |
139 | 151 | | |
140 | 152 | | |
| |||
511 | 523 | | |
512 | 524 | | |
513 | 525 | | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
514 | 591 | | |
515 | 592 | | |
516 | | - | |
| 593 | + | |
517 | 594 | | |
518 | 595 | | |
519 | 596 | | |
| |||
Lines changed: 50 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
36 | 45 | | |
37 | 46 | | |
38 | 47 | | |
| |||
113 | 122 | | |
114 | 123 | | |
115 | 124 | | |
116 | | - | |
| 125 | + | |
| 126 | + | |
117 | 127 | | |
118 | 128 | | |
119 | 129 | | |
| |||
132 | 142 | | |
133 | 143 | | |
134 | 144 | | |
135 | | - | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
136 | 148 | | |
137 | 149 | | |
138 | 150 | | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
139 | 159 | | |
140 | 160 | | |
141 | 161 | | |
142 | 162 | | |
143 | 163 | | |
144 | 164 | | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
145 | 193 | | |
146 | 194 | | |
147 | 195 | | |
| |||
Lines changed: 40 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
| |||
300 | 301 | | |
301 | 302 | | |
302 | 303 | | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
303 | 343 | | |
Lines changed: 12 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
111 | 123 | | |
0 commit comments