Commit 8ec6539
[KYUUBI #7245] Fix arrow batch converter error
### Why are the changes needed?
Limit the amount of data per Arrow batch to prevent out-of-memory errors and to improve the initial transfer speed.
When `kyuubi.operation.result.format=arrow`, `spark.connect.grpc.arrow.maxBatchSize` is not honored: result rows are not split into batches of the configured size.
Reproduction:
Debug `KyuubiArrowConverters`, or add the following log statement at line 300 of `KyuubiArrowConverters`:
```
logInfo(s"Total limit: ${limit}, rowCount: ${rowCount}, " +
  s"rowCountInLastBatch:${rowCountInLastBatch}," +
  s"estimatedBatchSize: ${estimatedBatchSize}," +
  s"maxEstimatedBatchSize: ${maxEstimatedBatchSize}," +
  s"maxRecordsPerBatch:${maxRecordsPerBatch}")
```
Test data: 1.6 million rows, 30 columns per row. Command executed:
```
bin/beeline \
-u 'jdbc:hive2://10.168.X.X:XX/default;thrift.client.max.message.size=2000000000' \
--hiveconf kyuubi.operation.result.format=arrow \
-n test -p 'testpass' \
--outputformat=csv2 -e "select * from db.table" > /tmp/test.csv
```
Log output:
```
25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount:200000, estimatedBatchSize: 145600000 maxEstimatedBatchSize: 4,maxRecordsPerBatch:10000
25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount:200000, estimatedBatchSize: 145600000
```
Original code:
```
while (rowIter.hasNext && (
  rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
    estimatedBatchSize <= 0 ||
    estimatedBatchSize < maxEstimatedBatchSize ||
    maxRecordsPerBatch <= 0 ||
    rowCountInLastBatch < maxRecordsPerBatch ||
    rowCount < limit ||
    limit < 0))
```
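Every term is joined with `||`, so the loop keeps adding rows as long as any single term is true. When `limit` is not set (i.e. `-1`), `limit < 0` is always true, so neither `maxEstimatedBatchSize` nor `maxRecordsPerBatch` is enforced and all remaining rows of the iterator are packed into a single batch. With a large row count this causes three problems:
(1) Driver/executor OOM.
(2) Array allocation failures (OOM), because the required array length is too large.
(3) Slow data transfer.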
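The effect of the `||` chain can be replayed with a small, self-contained Scala sketch. It feeds rows of a fixed estimated size through two loop predicates: `legacy` copies the original condition above, and `capped` is a hypothetical predicate for the intended behavior (the `limit` is a hard cap, the first row of a batch is always admitted, and a batch closes as soon as either the byte cap or the record cap is reached). `BatchSplitSketch`, `LoopState`, `Caps`, and `batchSizes` are made-up names, this is not the code from this patch, and the 728-byte row size is only the average implied by the logs (145600000 / 200000).

```scala
// Hypothetical sketch: models the batching loop, not the actual Kyuubi code.
object BatchSplitSketch {

  // Loop state the predicate inspects before admitting the next row.
  final case class LoopState(rowCount: Long, rowCountInLastBatch: Long, estimatedBatchSize: Long)

  // Non-positive byte/record caps mean "unlimited"; limit < 0 means "no limit".
  final case class Caps(maxEstimatedBatchSize: Long, maxRecordsPerBatch: Long, limit: Long)

  // Mirrors the original `||` chain: `limit < 0` alone keeps the loop running,
  // so neither the byte cap nor the record cap is ever enforced.
  def legacy(s: LoopState, c: Caps): Boolean =
    s.rowCountInLastBatch == 0 && c.maxEstimatedBatchSize > 0 ||
      s.estimatedBatchSize <= 0 ||
      s.estimatedBatchSize < c.maxEstimatedBatchSize ||
      c.maxRecordsPerBatch <= 0 ||
      s.rowCountInLastBatch < c.maxRecordsPerBatch ||
      s.rowCount < c.limit ||
      c.limit < 0

  // Hypothetical intended semantics: the limit is a hard cap (AND), the first
  // row of a batch is always admitted, and the batch closes as soon as EITHER
  // the byte budget or the record budget is exhausted.
  def capped(s: LoopState, c: Caps): Boolean = {
    val underLimit = c.limit < 0 || s.rowCount < c.limit
    val underBytes =
      c.maxEstimatedBatchSize <= 0 || s.estimatedBatchSize < c.maxEstimatedBatchSize
    val underRecords =
      c.maxRecordsPerBatch <= 0 || s.rowCountInLastBatch < c.maxRecordsPerBatch
    underLimit && (s.rowCountInLastBatch == 0 || (underBytes && underRecords))
  }

  // Feeds `totalRows` rows of `rowBytes` estimated bytes each through `cond`
  // and returns the number of rows that ended up in each batch.
  def batchSizes(
      totalRows: Long,
      rowBytes: Long,
      caps: Caps,
      cond: (LoopState, Caps) => Boolean): Vector[Long] = {
    val batches = Vector.newBuilder[Long]
    var remaining = totalRows
    var rowCount = 0L
    var progress = true
    while (remaining > 0 && progress) {
      var inBatch = 0L
      var bytes = 0L
      // The predicate is evaluated before each row is written, as in the loop above.
      while (remaining > 0 && cond(LoopState(rowCount, inBatch, bytes), caps)) {
        inBatch += 1
        bytes += rowBytes
        rowCount += 1
        remaining -= 1
      }
      if (inBatch > 0) batches += inBatch
      else progress = false // the overall limit has been reached
    }
    batches.result()
  }

  def main(args: Array[String]): Unit = {
    // Caps taken from the logs: 4 MiB bytes, 10000 records, no limit.
    val caps = Caps(maxEstimatedBatchSize = 4194304L, maxRecordsPerBatch = 10000L, limit = -1L)
    println(batchSizes(17286L, 728L, caps, legacy)) // Vector(17286)
    println(batchSizes(17286L, 728L, caps, capped)) // Vector(5762, 5762, 5762)
  }
}
```

With the same caps, setting `limit = 7000` makes `capped` produce `Vector(5762, 1238)`, and with 100-byte rows the record cap is reached first (`Vector(10000, 10000, 5000)` for 25000 rows).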
After updating the code, the log output is as follows:
```
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 5762, rowCountInLastBatch:5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch:10000
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 11524, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 17286, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
```
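In each batch, `estimatedBatchSize` is slightly larger than `maxEstimatedBatchSize`, because the size is checked before each row is written, so the row that crosses the threshold still lands in the current batch. Data is now written in batches as expected.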
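The overshoot can be checked against the logged numbers. The original log and the patched log imply the same average estimated row size, and the figures are consistent with the cap being checked before each row is written, so the batch closes right after the first row that pushes the estimate past 4 MiB:

$$
\begin{aligned}
\frac{145\,600\,000}{200\,000} = \frac{4\,194\,736}{5\,762} &= 728 \ \text{bytes per row} \\
5\,761 \times 728 = 4\,194\,008 &< 4\,194\,304 = 4\ \text{MiB} \quad \Rightarrow \quad \text{row } 5\,762 \text{ is still written} \\
5\,762 \times 728 = 4\,194\,736 &\ge 4\,194\,304 \quad \Rightarrow \quad \text{the batch is closed, } 432 \text{ bytes over the cap}
\end{aligned}
$$

By the same estimate, the single 200000-row batch in the original log was 145600000 bytes, roughly 34.7 times the 4 MiB cap.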
Fix #7245.
### How was this patch tested?
Test data: 1.6 million rows, 30 columns per row. Log output:
```
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 5762, rowCountInLastBatch:5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch:10000
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 11524, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 17286, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
```
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #7246 from echo567/fix-arrow-converter.
Closes #7245
6ef4ef1 [echo567] Merge branch 'master' into fix-arrow-converter
c9d0d18 [echo567] fix(arrow): repairing arrow based on spark
479d7e4 [echo567] fix(spark): fix arrow batch converter error
Authored-by: echo567 <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit acdb6a3)
Signed-off-by: Cheng Pan <[email protected]>
Parent commit: 3667026
2 files changed, 23 additions and 13 deletions, under `externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql` (`execution/arrow` and `kyuubi`):
One file: 20 additions, 11 deletions (around lines 274 to 301).
Other file: 3 additions, 2 deletions (around lines 164 to 172).