feat: Improve fetch partition performance, support skip validation arrow ipc files#1216
feat: Improve fetch partition performance, support skip validation arrow ipc files#1216westhide wants to merge 2 commits intoapache:mainfrom
Conversation
|
Thanks @westhide this PR makes sense, It also brings configurable validation option, which I did not really think about.
|
Q1: Q2: AdditionShould we consider the power down scene? It may cause Arrow ipc file broken if |
I guess we can safely assume that only shuffle files are accessed, not a random files.
what I was having in mind is enabling unsafe (without validation) by default but having a executor configuration switch which could revert this. It might be easy to cover case with arrow flight but a bit harder when ShuffleReader reads local shuffle file
Im not sure this case can be achieved with this validation? how can we be sure that file is written fully? I guess we'd need some kind of checksum for this scenario. |
All right, will move the validation config to Executor. And will try to add a checksum validation with failed Job rerun after finish the wasm udf. |
eeaab58 to
972dbd0
Compare
|
hey @westhide are you still interested to get this PR merged? |
|
moving to draft as its waiting for changes |
|
As far as I can see, we don't have to validate the IPC files:
I think it makes sense to support job recovery either by stage or by completed task rather than per IPC file. |
|
Is there anything left to do apart from the config based setup on the executor for this PR? |
|
yes we can add it, please do @killzoner |
|
maybe as a follow up, #1316 (comment) point 1, 2 would complement this work |
That's clear, thank you |
|
closing this as other PR addressed the issue |
Which issue does this PR close?
Closes #1189.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?