Replies: 3 comments
-
Thanks for the report @alephonea. You mentioned a cuDF example, mind sharing that one as well? Also, if you wouldn't mind doing a …
-
For the cuDF case (https://gist.github.com/alephonea/85c455918e6930e1f65ca55ad8d912de), I'd like to know how you are launching cuDF itself (RMM pool size, if any). For the Spark-RAPIDS case, did you measure a single run, or several runs with the runtimes averaged? Because Spark runs on the JVM, it needs time to JIT-compile code, and that hits the first (cold) iteration pretty hard. Spark and Spark-RAPIDS are designed for scale-out and are not optimized for the use case described here. A larger dataset (we usually run 1 TB+) read from the filesystem with ~16 threads is the type of workload that Spark and Spark-RAPIDS are optimized for.
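For reference, here is a minimal sketch of the kind of launch details being asked about, assuming a hypothetical file path and an arbitrary pool size (neither comes from the report):

```python
import time

import cudf
import rmm

# Pre-allocate an RMM memory pool so cuDF allocations do not hit
# cudaMalloc on every call; the 32 GiB size is an arbitrary example.
rmm.reinitialize(pool_allocator=True, initial_pool_size=32 * 2**30)

# Time several iterations and report each one separately, so the cold
# first run is visible instead of being averaged away.
for i in range(5):
    start = time.perf_counter()
    df = cudf.read_parquet("lineitem.parquet")  # hypothetical path
    print(f"run {i}: {time.perf_counter() - start:.3f}s, rows={len(df)}")
```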
-
What is the HEAD commit hash you built from? PerfIO is only for S3 and is disabled by default. If the input path is indeed a literal … Do you observe a difference in your measurements after rebuilding with this patch?

```diff
diff --git a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/GpuParquetScan.scala b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/GpuParquetScan.scala
index da80757e74..c4f0d70c0f 100644
--- a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/GpuParquetScan.scala
+++ b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/GpuParquetScan.scala
@@ -551,8 +551,7 @@ private case class GpuParquetFileFilterHandler(
   private def readFooterBuffer(
       filePath: Path,
       conf: Configuration): HostMemoryBuffer = {
-    PerfIO.readParquetFooterBuffer(filePath, conf, verifyParquetMagic)
-      .getOrElse(readFooterBufUsingHadoop(filePath, conf))
+    readFooterBufUsingHadoop(filePath, conf)
   }

   private def readFooterBufUsingHadoop(filePath: Path, conf: Configuration): HostMemoryBuffer = {
@@ -1869,10 +1868,7 @@ trait ParquetPartitionReaderBase extends Logging with ScanWithMetrics
     val coalescedRanges = coalesceReads(remoteCopies)

-    val totalBytesCopied = PerfIO.readToHostMemory(
-      conf, out.buffer, filePath.toUri,
-      coalescedRanges.map(r => IntRangeWithOffset(r.offset, r.length, r.outputOffset))
-    ).getOrElse {
+    val totalBytesCopied = {
       withResource(filePath.getFileSystem(conf).open(filePath)) { in =>
         val copyBuffer: Array[Byte] = new Array[Byte](copyBufferSize)
         coalescedRanges.foldLeft(0L) { (acc, blockCopy) =>
```
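For the measurement side of that question, a minimal sketch of per-run timing from PySpark; the table path and the query are illustrative stand-ins, not the reporter's actual workload:

```python
import time

from pyspark.sql import SparkSession

# Reuses (or creates) a session; assumes the plugin jar and a parquet
# `lineitem` table path are supplied elsewhere; names are placeholders.
spark = SparkSession.builder.getOrCreate()
spark.read.parquet("tpch/lineitem.parquet").createOrReplaceTempView("lineitem")

# Per-iteration timings keep the cold (JIT-compiling) first run visible
# instead of folding it into an average.
for i in range(5):
    start = time.perf_counter()
    spark.sql(
        "SELECT l_returnflag, l_linestatus, SUM(l_quantity) "
        "FROM lineitem GROUP BY l_returnflag, l_linestatus"
    ).collect()
    print(f"run {i}: {time.perf_counter() - start:.3f}s")
```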
-
Hello,
I tested spark-rapids against a synthetic TPC-H dataset and noticed that the performance isn't great. Specifically, query time is about 10 times higher than with cuDF used directly on the same data.
Looking at the execution graph and logs, it seems that loading data from parquet files is the issue: compared to the cuDF loader, reading the parquet files and decoding them into GPU buffers takes 10 times longer. The input parquet files are fully cached in the OS page cache.
I've instrumented the code with some log lines and found that decoding into the GPU representation is performed by ai.rapids.cudf.ParquetChunkReader. Calls to this class take about 2 seconds to copy and decode the data needed for TPC-H query 1 (SF=20), compared to about 200 ms for the cuDF loader.
Also, by instrumenting the code I found that reading files is done by PerfIO.readToHostMemory(). All calls to this library together take about 2 seconds to read 1.2 GB of data, which is 10 times slower than expected when reading file data from the OS page cache.
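For context on that expectation, a minimal sketch of measuring raw page-cache read throughput with plain Python I/O; the file path is a placeholder:

```python
import time

# Time a plain buffered read of an (assumed) already-cached file; reads
# from the OS page cache typically sustain several GB/s, so 1.2 GB
# should take on the order of a couple hundred milliseconds.
path = "tpch/lineitem.parquet"  # placeholder path
start = time.perf_counter()
total = 0
with open(path, "rb") as f:
    while chunk := f.read(8 << 20):  # 8 MiB per read
        total += len(chunk)
elapsed = time.perf_counter() - start
print(f"read {total / 1e9:.2f} GB in {elapsed:.3f} s "
      f"({total / 1e9 / elapsed:.2f} GB/s)")
```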
Do you know what could be the reason for such performance effects?
Here's the setup:
Spark version 3.5.6
plugin is built from branch-25.08 using the following command: … Then,
dist/target/rapids-4-spark_2.12-25.08.0-SNAPSHOT-cuda12.jar
is provided to a pyspark session that does the following: … (a sketch of such a session is given after the configuration list)
Configuration:
1x NVIDIA H200
44-core AMD EPYC 9654
178 GB RAM
OpenJDK 17
Scala 2.12
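The actual build command and session code were not captured above. As a stand-in, a minimal sketch of how a pyspark session might attach the jar; all paths and settings are assumptions, not the reporter's actual configuration:

```python
from pyspark.sql import SparkSession

# Hypothetical session setup, not the reporter's actual code: the jar
# path matches the artifact named above; other settings are guesses.
spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.jars",
            "dist/target/rapids-4-spark_2.12-25.08.0-SNAPSHOT-cuda12.jar")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .getOrCreate()
)
print(spark.version)
```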