Comet does not provide native JSON scan, but when `spark.comet.convert.json.enabled` is enabled, data is immediately
converted into Arrow format, allowing native execution to happen after that.
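As a minimal sketch of how this could be enabled at launch (only the option name comes from this document; the spark-shell invocation and `$SPARK_HOME` path are illustrative):

```shell
# Illustrative: enable eager conversion of JSON data to Arrow
# so that downstream operators can execute natively.
$SPARK_HOME/bin/spark-shell \
  --conf spark.comet.convert.json.enabled=true
```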

# Supported Storages

## Local
In progress

## HDFS

The Apache DataFusion Comet native reader seamlessly scans files from remote HDFS for [supported formats](#supported-spark-data-sources).

### Using the experimental native DataFusion reader

Unlike the native Comet reader, the DataFusion reader fully supports nested type processing. This reader is currently experimental.

To build Comet with the native DataFusion reader and remote HDFS support, a JDK must be installed.

For example, to build Comet for `spark-3.4`, provide the JDK path in `JAVA_HOME` and the JRE linker path in `RUSTFLAGS`. The linker path can vary depending on the system; typically the JRE linker is part of the installed JDK.

```shell
export JAVA_HOME="/opt/homebrew/opt/openjdk@11"
make release PROFILES="-Pspark-3.4" COMET_FEATURES=hdfs RUSTFLAGS="-L $JAVA_HOME/libexec/openjdk.jdk/Contents/Home/lib/server"
```

Start Comet with the experimental reader and HDFS support as [described](installation.md/#run-spark-shell-with-comet-enabled)
and add the following additional parameters:

```shell
--conf spark.comet.scan.impl=native_datafusion \
--conf spark.hadoop.fs.defaultFS="hdfs://namenode:9000" \
--conf spark.hadoop.dfs.client.use.datanode.hostname=true \
--conf dfs.client.use.datanode.hostname=true
```

Query a struct type from remote HDFS:

```shell
spark.read.parquet("hdfs://namenode:9000/user/data").show(false)

root
 |-- id: integer (nullable = true)
 |-- first_name: string (nullable = true)
 |-- personal_info: struct (nullable = true)
 |    |-- firstName: string (nullable = true)
 |    |-- lastName: string (nullable = true)
 |    |-- ageInYears: integer (nullable = true)

25/01/30 16:50:43 INFO core/src/lib.rs: Comet native library version 0.6.0 initialized
== Physical Plan ==
* CometColumnarToRow (2)
+- CometNativeScan:  (1)

(1) CometNativeScan:
Output [3]: [id#0, first_name#1, personal_info#4]
Arguments: [id#0, first_name#1, personal_info#4]

(2) CometColumnarToRow [codegen id : 1]
Input [3]: [id#0, first_name#1, personal_info#4]

25/01/30 16:50:44 INFO fs-hdfs-0.1.12/src/hdfs.rs: Connecting to Namenode (hdfs://namenode:9000)
+---+----------+-----------------+
| id|first_name|    personal_info|
+---+----------+-----------------+
|  2|      Jane|{Jane, Smith, 34}|
|  1|      John|  {John, Doe, 28}|
+---+----------+-----------------+
```

Verify that the native scan type is `CometNativeScan`.

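As a quick check from the Spark shell (a sketch; the HDFS path is the example one used above), the physical plan can be printed directly:

```shell
# Print the physical plan; the scan node should appear as CometNativeScan
spark.read.parquet("hdfs://namenode:9000/user/data").explain()
```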
## S3
In progress