Replacing Apache JClouds with Apache OpenDAL in the tiered-storage module #25093
Replies: 6 comments 5 replies
-
|
I'll review any pull request that gets opened. cc @Xuanwo. As for JNI and native libs: opendal-java works like rocksdbjni, which Flink depends on. We also already rely on JNI ourselves, e.g., netty-xxx-native/Linux. |
-
|
It seems that Hadoop Cloud Storage could be a Java-based alternative. The abstraction it builds on is the Hadoop FileSystem abstraction, which has multiple implementations: not only S3, Azure and GCS, but many others.
The 3.5.0 release will improve how the cloud storage modules are distributed: https://issues.apache.org/jira/browse/HADOOP-19696 |
-
|
good idea |
-
|
Hi everyone, thank you so much for starting this thread! I’m really happy to see this happening. The OpenDAL Java Binding is now functional, though there are still many areas for improvement, such as apache/opendal#7077. Nevertheless, we can already see the great potential of using OpenDAL Java. I ran a simple benchmark comparing the OpenDAL Java Binding with Hadoop Cloud Storage against a local MinIO setup. You can find the details here: https://github.com/Xuanwo/opendal-vs-hadoop. The results are as follows:
The full results are available in this single-page HTML report: report.html |
-
|
Do I need to create a PIP? |
-
|
@Xuanwo @tisonkun When you have some free time, could you check whether my summary is correct? I'm looking forward to your guidance.

Goal
Like jclouds, validate the data object's format version on every range read (buffer refill), preferably by obtaining the metadata in the same GET (range) request so that validation adds no extra remote calls.

Current Status 1: read/range read does not return metadata
The OpenDAL core's RpRead response currently carries only size/range and no Metadata (core/core/src/raw/rps.rs:100). A range read therefore returns only the data stream/bytes and cannot, unlike jclouds' getBlob(range), fetch payload plus metadata/userMetadata in a single GET.

Current Status 2: the Java binding's Metadata does not expose user metadata
The Rust core's Metadata supports user_metadata() (core/core/src/types/metadata.rs:420), but Java's org.apache.opendal.Metadata has no userMetadata field (bindings/java/src/main/java/org/apache/opendal/Metadata.java:29), and the JNI make_metadata function does not map it either (bindings/java/src/lib.rs:143).

Current Status 3: Java ReadOptions is too limited and lacks conditional-read/version-binding capabilities
Rust's ReadOptions has version/if_match/... (core/core/src/types/options.rs:60), but Java's ReadOptions only has offset/length (bindings/java/src/main/java/org/apache/opendal/ReadOptions.java:25). This leaves the alternative approach incomplete: stat once to get the etag/version, then use If-Match/version to bind subsequent range reads to the same object version.

Conclusion
With v0.55.0 and the current Java binding, jclouds' pattern of validating within each range GET (refill), in a single request, cannot be reproduced. Falling back to separate stat + read calls cannot validate either, because the Java binding cannot obtain user metadata. |
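To make the "stat once, then bind range reads with If-Match" alternative concrete, here is a minimal, self-contained Java sketch. It does not use the OpenDAL Java API (which, per the summary above, currently lacks both user metadata and If-Match support). `InMemoryStore`, `ObjectInfo`, `PinnedReader`, `PreconditionFailedException`, and the `format-version` metadata key are all hypothetical names invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class VersionBoundRangeRead {
    // Hypothetical user-metadata key; not the key Pulsar or OpenDAL actually uses.
    public static final String FORMAT_VERSION_KEY = "format-version";

    /** Hypothetical stat result: an etag plus user metadata. */
    public static final class ObjectInfo {
        public final String etag;
        public final Map<String, String> userMetadata;
        ObjectInfo(String etag, Map<String, String> userMetadata) {
            this.etag = etag;
            this.userMetadata = userMetadata;
        }
    }

    /** Models an HTTP 412 response when the If-Match precondition fails. */
    public static class PreconditionFailedException extends RuntimeException {
        PreconditionFailedException(String message) { super(message); }
    }

    /** Hypothetical in-memory stand-in for an object store. */
    public static class InMemoryStore {
        private final Map<String, byte[]> data = new HashMap<>();
        private final Map<String, ObjectInfo> info = new HashMap<>();
        private int generation = 0;

        /** Every write produces a new etag, as an overwrite would on a real store. */
        public void put(String key, byte[] bytes, Map<String, String> userMetadata) {
            generation++;
            data.put(key, bytes.clone());
            info.put(key, new ObjectInfo("etag-" + generation, new HashMap<>(userMetadata)));
        }

        public ObjectInfo stat(String key) {
            ObjectInfo current = info.get(key);
            if (current == null) throw new IllegalArgumentException("no such object: " + key);
            return current;
        }

        /** Range read that only succeeds if the object's etag still equals ifMatch. */
        public byte[] readRange(String key, int offset, int length, String ifMatch) {
            ObjectInfo current = stat(key);
            if (!current.etag.equals(ifMatch)) {
                throw new PreconditionFailedException("etag changed for " + key);
            }
            byte[] bytes = data.get(key);
            int end = Math.min(bytes.length, offset + length);
            return Arrays.copyOfRange(bytes, Math.min(offset, end), end);
        }
    }

    /** Validates the format version once via stat, then pins all refills to that etag. */
    public static class PinnedReader {
        private final InMemoryStore store;
        private final String key;
        private final String etag;

        public PinnedReader(InMemoryStore store, String key, String expectedVersion) {
            ObjectInfo objectInfo = store.stat(key);
            String version = objectInfo.userMetadata.get(FORMAT_VERSION_KEY);
            if (!expectedVersion.equals(version)) {
                throw new IllegalStateException("unexpected format version: " + version);
            }
            this.store = store;
            this.key = key;
            this.etag = objectInfo.etag;
        }

        /** One remote call per refill; the etag guarantees it is the validated object. */
        public byte[] refill(int offset, int length) {
            return store.readRange(key, offset, length, etag);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        Map<String, String> meta = new HashMap<>();
        meta.put(FORMAT_VERSION_KEY, "1");
        store.put("ledger-1", "hello world".getBytes(StandardCharsets.UTF_8), meta);

        PinnedReader reader = new PinnedReader(store, "ledger-1", "1");
        System.out.println(new String(reader.refill(0, 5), StandardCharsets.UTF_8));
    }
}
```

The trade-off in this sketch is one extra stat call when the reader is opened, in exchange for refills that each remain a single request. If the object is overwritten between refills, the next `refill` fails instead of silently mixing data from two object versions.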

-
Apache Pulsar's tiered storage offloading is currently implemented with Apache JClouds, but that library has moved to the Apache Attic and is no longer maintained. Apache OpenDAL, a newer top-level Apache project, is a promising candidate to replace JClouds.
Apache JClouds: https://jclouds.apache.org/
Apache OpenDAL:
https://opendal.apache.org/
https://lists.apache.org/thread/3v9g2nk734m2zplrq1fgozc7xt169bgt
@lhotari @codelipenghui @dao-jun Would this be a good direction for Pulsar's tiered storage?
@tisonkun Do you have any suggestions?