feat: Support hdfs with OpenDAL #2244
base: main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@             Coverage Diff              @@
##               main    #2244      +/-   ##
============================================
+ Coverage     56.12%   57.77%   +1.64%
- Complexity      976     1291     +315
============================================
  Files           119      145      +26
  Lines         11743    13360    +1617
  Branches       2251     2378     +127
============================================
+ Hits           6591     7719    +1128
- Misses         4012     4384     +372
- Partials       1140     1257     +117
@comphead do you remember if we looked at OpenDAL originally for HDFS support?
Yeah, the main concern was the limited set of HDFS client settings supported by hdfs-native: https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings
@wForget is there a real use case for [...]? On a separate note, the crate is very actively evolving and in the future might be a more successful candidate [...]
OpenDAL is governed by the Apache Software Foundation, which is nice too. It also supports object_store as a backend.
I think Iceberg-rs relies on it, so that's a big motivation to contribute to it.
The dependency you mentioned corresponds to OpenDAL's services-hdfs-native feature, which is a fully native HDFS client. This PR uses the services-hdfs feature instead, which relies on the hdrs crate, a binding to the JVM-based libhdfs.
No, we currently use Gluten as our Spark native engine, but we also use the JVM-based libhdfs.
Thanks @wForget, I was referring to the hdfs-native crate, as you correctly pointed out. Is there an object store implementation based on hdrs? From the PR, my understanding [...]
It seems to be https://github.com/apache/opendal/tree/main/integrations/object_store
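As a rough illustration of the wiring an HDFS-backed object store needs: OpenDAL's HDFS service is configured with a name-node address (its name_node setting), separately from the path of the file being read, so an incoming hdfs:// URL has to be split into those two parts. The sketch below is stdlib-only and assumes URLs of the form hdfs://host:port/path; split_hdfs_url is a hypothetical helper, not an API of Comet, OpenDAL, or object_store.

```rust
/// Hypothetical helper: split "hdfs://namenode:9000/warehouse/t1" into the
/// name-node address ("hdfs://namenode:9000") and the absolute path
/// ("/warehouse/t1"). Returns None for non-hdfs URLs or an empty authority.
fn split_hdfs_url(url: &str) -> Option<(String, String)> {
    // Only the hdfs:// scheme is handled in this sketch.
    let rest = url.strip_prefix("hdfs://")?;
    // The authority (host:port) ends at the first '/', or at the end of the string.
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/"),
    };
    // "hdfs:///path" would need the default FS from the Hadoop config; not handled here.
    if authority.is_empty() {
        return None;
    }
    Some((format!("hdfs://{authority}"), path.to_string()))
}

fn main() {
    match split_hdfs_url("hdfs://localhost:9000/tmp/data.parquet") {
        Some((name_node, path)) => println!("name node: {name_node}, path: {path}"),
        None => println!("not a usable hdfs:// URL"),
    }
}
```

URLs with an empty authority (hdfs:///path) return None here; resolving them would require reading the default file system from the Hadoop configuration, which this sketch deliberately ignores.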
Thanks, I'm planning to run some tests this weekend using this feature and a local 3-node HDFS cluster.
I'm still on it, @wForget; the local HDFS cluster setup is having some issues.
Thank you for your verification and feedback. |
Which issue does this PR close?
Closes #2243.
Rationale for this change
I also noticed the Apache OpenDAL project, which supports object_store and many file services. Perhaps we can integrate it to access more file services.
What changes are included in this PR?
Add an hdfs-opendal feature to support HDFS via OpenDAL.
How are these changes tested?
Successfully ran CometReadHdfsBenchmark locally. Tip: to build the native library with the hdfs-opendal feature enabled, run: cd native && cargo build --features hdfs-opendal
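The hdfs-opendal feature is a Cargo feature, so code behind it is only compiled when the build passes --features hdfs-opendal. The sketch below shows the general cfg-gating pattern under that assumption; it is not Comet's actual code, and hdfs_backend and its return labels are hypothetical. In a real crate the feature must also be declared in the [features] table of Cargo.toml, typically enabling an optional dependency (for example opendal with its services-hdfs feature); that wiring is assumed here, not taken from the PR.

```rust
// Hypothetical sketch of Cargo-feature gating; not Comet's actual code.
// The feature name "hdfs-opendal" comes from this PR; `hdfs_backend` and
// its return labels are illustrative only.

/// Compiled in only when building with `--features hdfs-opendal`.
#[cfg(feature = "hdfs-opendal")]
fn hdfs_backend() -> &'static str {
    "opendal"
}

/// Fallback compiled when the `hdfs-opendal` feature is disabled.
#[cfg(not(feature = "hdfs-opendal"))]
fn hdfs_backend() -> &'static str {
    "disabled"
}

fn main() {
    // `cfg!` evaluates the same condition as a compile-time boolean.
    let enabled = cfg!(feature = "hdfs-opendal");
    println!("hdfs-opendal enabled: {enabled}, backend: {}", hdfs_backend());
}
```

Using #[cfg] on items (rather than only cfg!) keeps the gated code, and any dependency it pulls in, out of default builds entirely.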