Skip to content

Conversation

wForget
Copy link
Member

@wForget wForget commented Aug 27, 2025

Which issue does this PR close?

Closes #2243.

Rationale for this change

I also noticed the Apache OpenDAL project, which supports object_store and many file services. Perhaps we can integrate it to access more file services.

What changes are included in this PR?

add hdfs-opendal feature to support hdfs with opendal

How are these changes tested?

Successfully run CometReadHdfsBenchmark locally (tips: build native enable hdfs-opendal: cd native && cargo build --features hdfs-opendal)

@codecov-commenter
Copy link

codecov-commenter commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.77%. Comparing base (f09f8af) to head (488d649).
⚠️ Report is 434 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2244      +/-   ##
============================================
+ Coverage     56.12%   57.77%   +1.64%     
- Complexity      976     1291     +315     
============================================
  Files           119      145      +26     
  Lines         11743    13360    +1617     
  Branches       2251     2378     +127     
============================================
+ Hits           6591     7719    +1128     
- Misses         4012     4384     +372     
- Partials       1140     1257     +117     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wForget wForget marked this pull request as ready for review August 27, 2025 09:35
@andygrove
Copy link
Member

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

@comphead
Copy link
Contributor

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

Yeah, the main concern was limited support for HDFS client https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings
which can lead to lack of support for IO retries, authentication, etc
but this PR uses openDAL as an optional feature, which IMO should be fine

@comphead
Copy link
Contributor

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

@andygrove
Copy link
Member

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

OpenDAL is governed by the Apache Software Foundation, which is nice too. It also supports object-store as a backend.

@mbutrovich
Copy link
Contributor

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

I think Iceberg-rs is relying on it, so that's a big motivation to contribute to it.

@wForget
Copy link
Member Author

wForget commented Aug 29, 2025

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

Yeah, the main concern was limited support for HDFS client https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings which can lead to lack of support for IO retries, authentication, etc but this PR uses openDAL as an optional feature, which IMO should be fine

The dependency you mentioned corresponds to the services-hdfs-native feature of OpenDAL, which is a fully native HDFS client. This PR introduces the service-hdfs feature, which uses hdrs crate, a jvm based libhfs.

@wForget
Copy link
Member Author

wForget commented Aug 29, 2025

@wForget is there a real use case for open_dal?

No, currently we use gluten as spark native engine, but we also use jvm-based libhdfs

@comphead
Copy link
Contributor

Thanks @wForget I was referring to native-hdfs crate as you correctly mentioned.
Reg to hdrs it looks interesting and more supported than hdfs crate with the same architecture and based on libhdfs which based on jvm libs.

Is there an object store implementation based on hdrs? from PR ny understanding object_store_opendal is the implemenation?

@wForget
Copy link
Member Author

wForget commented Aug 29, 2025

Is there an object store implementation based on hdrs? from PR ny understanding object_store_opendal is the implemenation?

It seems to be https://github.com/apache/opendal/tree/main/integrations/object_store

@comphead
Copy link
Contributor

Thanks I'm planning to run some tests this weekend using this feature and local HDFS 3 node cluster

@comphead
Copy link
Contributor

comphead commented Sep 2, 2025

I'm still on it @wForget, the local hdfs cluster setup having some issue

@wForget
Copy link
Member Author

wForget commented Sep 2, 2025

I'm still on it @wForget, the local hdfs cluster setup having some issue

Thank you for your verification and feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate Apache OpenDAL to support more file serivces
5 participants