Replies: 2 comments
-
Note that this feature currently mostly helps in cloud environments. I'm working on other improvements for reading from local disk or node-local HDFS-type file systems.
Answer selected by sameerz
-
Reopen if there are further questions.
-
1. `spark.rapids.sql.format.parquet.multiThreadedRead.enabled`
q1: What does "multiple small files within a partition" mean?
q2: Is there a size threshold for deciding whether a file counts as a small file?
q3: Is this thread pool async or sync with respect to the Spark task? (I mean: do the threads in this pool start reading only when a Spark task demands a batch, or can they read files asynchronously with the task's execution?)
q4: How much data will the threads in this pool read before they stop (a batch? a row group? or as much as possible until the buffer is full)?
2. `spark.rapids.sql.format.parquet.multiThreadedRead.numThreads`
q5: Are the thread pool and the number of threads in it per-executor or per-Spark-task?
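For context, both settings mentioned above are ordinary Spark confs, so they can be passed at submit time. A minimal sketch, assuming the RAPIDS Accelerator plugin jar is already on the classpath; the thread count and the application name `your_app.py` are illustrative placeholders, not values recommended by the maintainers:

```shell
# Enable the RAPIDS SQL plugin and the multi-threaded Parquet reader,
# and size its thread pool (value here is only an example).
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.format.parquet.multiThreadedRead.enabled=true \
  --conf spark.rapids.sql.format.parquet.multiThreadedRead.numThreads=20 \
  your_app.py
```

The same keys can equally be set in `spark-defaults.conf` or on the `SparkSession` builder.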