
Commit 6934854 (1 parent: 9bd11ee)

Add test and doc

Also fixes some small problems: 1. Typos and formatting in README.md. 2. A compile error in `rust/build.rs`.

File tree

4 files changed: +80 −17 lines changed


README.md (24 additions, 7 deletions)

````diff
@@ -9,9 +9,11 @@
 `hdfs-native` is an HDFS client written natively in Rust. It supports nearly all major features of an HDFS client, and several key client configuration options listed below.
 
 ## Supported HDFS features
+
 Here is a list of currently supported and unsupported but possible future features.
 
 ### HDFS Operations
+
 - [x] Listing
 - [x] Reading
 - [x] Writing
@@ -24,14 +26,16 @@ Here is a list of currently supported and unsupported but possible future featur
 - [x] Set timestamps
 
 ### HDFS Features
+
 - [x] Name Services
 - [x] Observer reads
 - [x] ViewFS
 - [x] Router based federation
 - [x] Erasure coded reads and writes
-- RS schema only, no support for RS-Legacy or XOR
+  - RS schema only, no support for RS-Legacy or XOR
 
 ### Security Features
+
 - [x] Kerberos authentication (GSSAPI SASL support) (requires libgssapi_krb5, see below)
 - [x] Token authentication (DIGEST-MD5 SASL support)
 - [x] NameNode SASL connection
@@ -40,47 +44,54 @@ Here is a list of currently supported and unsupported but possible future featur
 - [ ] Encryption at rest (KMS support)
 
 ### Kerberos Support
+
 Kerberos (SASL GSSAPI) mechanism is supported through a runtime dynamic link to `libgssapi_krb5`. This must be installed separately, but is likely already installed on your system. If not you can install it by:
 
 #### Debian-based systems
+
 ```bash
 apt-get install libgssapi-krb5-2
 ```
 
 #### RHEL-based systems
+
 ```bash
 yum install krb5-libs
 ```
 
 #### MacOS
+
 ```bash
 brew install krb5
 ```
 
 #### Windows
+
 Download and install the Microsoft Kerberos package from https://web.mit.edu/kerberos/dist/
 
 Copy the `<INSTALL FOLDER>\MIT\Kerberos\bin\gssapi64.dll` file to a folder in %PATH% and change the name to `gssapi_krb5.dll`
 
 ## Supported HDFS Settings
-The client will attempt to read Hadoop configs `core-site.xml` and `hdfs-site.xml` in the directories `$HADOOP_CONF_DIR` or if that doesn't exist, `$HADOOP_HOME/etc/hadoop`. Currently the supported configs that are used are:
+
+The client will attempt to read Hadoop configs `core-site.xml` and `hdfs-site.xml` in the directories `$HADOOP_CONF_DIR` or if that doesn't exist, `$HADOOP_HOME/etc/hadoop`. Passing configs at run time is supported as well via `client::ClientBuilder`. Currently the supported configs that are used are:
+
 - `fs.defaultFS` - Client::default() support
 - `dfs.ha.namenodes` - name service support
 - `dfs.namenode.rpc-address.*` - name service support
 - `dfs.client.failover.resolve-needed.*` - DNS based NameNode discovery
 - `dfs.client.failover.resolver.useFQDN.*` - DNS based NameNode discovery
 - `dfs.client.failover.random.order.*` - Randomize order of NameNodes to try
 - `dfs.client.failover.proxy.provider.*` - Supports the behavior of the following proxy providers. Any other values will default back to the `ConfiguredFailoverProxyProvider` behavior:
-- `org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`
-- `org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider`
-- `org.apache.hadoop.hdfs.server.namenode.ha.RouterObserverReadConfiguredFailoverProxyProvider`
+  - `org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`
+  - `org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider`
+  - `org.apache.hadoop.hdfs.server.namenode.ha.RouterObserverReadConfiguredFailoverProxyProvider`
 - `dfs.client.block.write.replace-datanode-on-failure.enable`
 - `dfs.client.block.write.replace-datanode-on-failure.policy`
 - `dfs.client.block.write.replace-datanode-on-failure.best-effort`
 - `fs.viewfs.mounttable.*.link.*` - ViewFS links
 - `fs.viewfs.mounttable.*.linkFallback` - ViewFS link fallback
 
-All other settings are generally assumed to be the defaults currently. For instance, security is assumed to be enabled and SASL negotiation is always done, but on insecure clusters this will just do SIMPLE authentication. Any setups that require other customized Hadoop client configs may not work correctly.
+All other settings are generally assumed to be the defaults currently. For instance, security is assumed to be enabled and SASL negotiation is always done, but on insecure clusters this will just do SIMPLE authentication. Any setups that require other customized Hadoop client configs may not work correctly.
 
 ## Building
 
@@ -89,19 +100,23 @@ cargo build
 ```
 
 ## Object store implementation
+
 An object_store implementation for HDFS is provided in the [hdfs-native-object-store](https://github.com/datafusion-contrib/hdfs-native-object-store) crate.
 
 ## Running tests
+
 The tests are mostly integration tests that utilize a small Java application in `rust/mindifs/` that runs a custom `MiniDFSCluster`. To run the tests, you need to have Java, Maven, Hadoop binaries, and Kerberos tools available and on your path. Any Java version between 8 and 17 should work.
 
 ```bash
-cargo test -p hdfs-native --features intergation-test
+cargo test -p hdfs-native --features integration-test
 ```
 
 ### Python tests
+
 See the [Python README](./python/README.md)
 
 ## Running benchmarks
+
 Some of the benchmarks compare performance to the JVM based client through libhdfs via the fs-hdfs3 crate. Because of that, some extra setup is required to run the benchmarks:
 
 ```bash
@@ -110,13 +125,15 @@ export CLASSPATH=$(hadoop classpath)
 ```
 
 then you can run the benchmarks with
+
 ```bash
 cargo bench -p hdfs-native --features benchmark
 ```
 
 The `benchmark` feature is required to expose `minidfs` and the internal erasure coding functions to benchmark.
 
 ## Running examples
+
 The examples make use of the `minidfs` module to create a simple HDFS cluster to run the example. This requires including the `integration-test` feature to enable the `minidfs` module. Alternatively, if you want to run the example against an existing HDFS cluster you can exclude the `integration-test` feature and make sure your `HADOOP_CONF_DIR` points to a directory with HDFS configs for talking to your cluster.
 
 ```bash
````

rust/build.rs (3 additions, 1 deletion)

```diff
@@ -3,7 +3,9 @@ use std::io::Result;
 fn main() -> Result<()> {
     #[cfg(feature = "generate-protobuf")]
     {
-        std::env::set_var("PROTOC", protobuf_src::protoc());
+        unsafe {
+            std::env::set_var("PROTOC", protobuf_src::protoc());
+        }
 
         prost_build::compile_protos(
             &[
```

rust/src/client.rs (48 additions, 2 deletions)

```diff
@@ -190,16 +190,41 @@ impl IORuntime {
     }
 }
 
-/// Builds a new [Client] instance. By default, configs will be loaded from the default config directories with the following precedence:
+/// Builds a new [Client] instance. Configs will be loaded with the following precedence:
+///
+/// - If method `ClientBuilder::with_config_dir` is invoked, configs will be loaded from `${config_dir}/{core,hdfs}-site.xml`
 /// - If the `HADOOP_CONF_DIR` environment variable is defined, configs will be loaded from `${HADOOP_CONF_DIR}/{core,hdfs}-site.xml`
 /// - If the `HADOOP_HOME` environment variable is defined, configs will be loaded from `${HADOOP_HOME}/etc/hadoop/{core,hdfs}-site.xml`
-/// - Otherwise no default configs are defined
+/// - Otherwise no configs are defined
+///
+/// Finally, configs set by `with_config` will override the configs loaded above.
 ///
 /// If no URL is defined, the `fs.defaultFS` config must be defined and is used as the URL.
 ///
 /// # Examples
 ///
+/// Create a new client with a given config directory
+///
+/// ```rust,no_run
+/// # use hdfs_native::ClientBuilder;
+/// let client = ClientBuilder::new()
+///     .with_config_dir("/opt/hadoop/etc/hadoop")
+///     .build()
+///     .unwrap();
+/// ```
+///
+/// Create a new client with the environment variable
+///
+/// ```rust,no_run
+/// # use hdfs_native::ClientBuilder;
+/// unsafe { std::env::set_var("HADOOP_CONF_DIR", "/opt/hadoop/etc/hadoop") };
+/// let client = ClientBuilder::new()
+///     .build()
+///     .unwrap();
+/// ```
+///
 /// Create a new client using the fs.defaultFS config
+///
 /// ```rust
 /// # use hdfs_native::ClientBuilder;
 /// let client = ClientBuilder::new()
@@ -209,6 +234,7 @@ impl IORuntime {
 /// ```
 ///
 /// Create a new client connecting to a specific URL:
+///
 /// ```rust
 /// # use hdfs_native::ClientBuilder;
 /// let client = ClientBuilder::new()
@@ -218,6 +244,7 @@ impl IORuntime {
 /// ```
 ///
 /// Create a new client using a dedicated tokio runtime for spawned tasks and IO operations
+///
 /// ```rust
 /// # use hdfs_native::ClientBuilder;
 /// let client = ClientBuilder::new()
@@ -1319,4 +1346,23 @@ mod test {
                 .is_ok()
         );
     }
+
+    #[test]
+    fn test_set_conf_dir() {
+        // A configuration directory set at run time overrides the env one.
+        // Case 1: conf dir does not exist, build fails
+        assert!(
+            ClientBuilder::new()
+                .with_config_dir("target/test/non-exist-dir")
+                .build()
+                .is_err()
+        );
+        // Case 2: conf dir exists, build succeeds
+        assert!(
+            ClientBuilder::new()
+                .with_config_dir("target/test")
+                .build()
+                .is_ok()
+        )
+    }
 }
```
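The updated doc comment describes a precedence chain: an explicitly supplied config dir wins over `$HADOOP_CONF_DIR`, which wins over `$HADOOP_HOME`. A std-only sketch of that chain (a hypothetical helper, not the crate's API):

```rust
use std::env;
use std::path::PathBuf;

// Sketch of the precedence in the new ClientBuilder docs: explicit dir,
// then $HADOOP_CONF_DIR, then $HADOOP_HOME/etc/hadoop.
fn pick_conf_dir(explicit: Option<&str>) -> Option<PathBuf> {
    explicit
        .map(PathBuf::from)
        .or_else(|| env::var("HADOOP_CONF_DIR").ok().map(PathBuf::from))
        .or_else(|| {
            env::var("HADOOP_HOME")
                .ok()
                .map(|home| PathBuf::from(home).join("etc").join("hadoop"))
        })
}

fn main() {
    unsafe { env::set_var("HADOOP_CONF_DIR", "/etc/hadoop-conf") };
    // An explicit dir (as set by with_config_dir) overrides the environment.
    assert_eq!(
        pick_conf_dir(Some("/opt/custom")),
        Some(PathBuf::from("/opt/custom"))
    );
    // Without an explicit dir, the environment variable is used.
    assert_eq!(pick_conf_dir(None), Some(PathBuf::from("/etc/hadoop-conf")));
}
```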

rust/src/common/config.rs (5 additions, 7 deletions)

```diff
@@ -43,23 +43,21 @@ pub struct Configuration {
 }
 
 impl Configuration {
-    pub fn new(
+    pub(crate) fn new(
         conf_dir: Option<String>,
         conf_map: Option<HashMap<String, String>>,
     ) -> Result<Self> {
-        let mut configration = Configuration {
-            map: HashMap::new(),
-        };
+        let mut map = HashMap::new();
 
         if let Some(conf_dir) = Self::parse_conf_dir(conf_dir) {
-            configration.map = Self::parse_conf(conf_dir)?;
+            map = Self::parse_conf(conf_dir)?;
         }
 
         if let Some(conf_map) = conf_map {
-            configration.map.extend(conf_map);
+            map.extend(conf_map);
         }
 
-        Ok(configration)
+        Ok(Configuration { map })
     }
 
     /// Get a value from the config, returning None if the key wasn't defined.
```
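The refactored `Configuration::new` builds a base map from the config dir, then extends it with caller-supplied entries, so run-time values win on key collisions (`HashMap::extend` replaces existing keys). A std-only sketch of that merge (illustrative names and values, not the crate's code):

```rust
use std::collections::HashMap;

// Base entries parsed from XML files, then caller-supplied entries on top.
fn merge_configs(
    from_files: HashMap<String, String>,
    from_caller: Option<HashMap<String, String>>,
) -> HashMap<String, String> {
    let mut map = from_files;
    if let Some(conf_map) = from_caller {
        map.extend(conf_map); // extend overwrites colliding keys
    }
    map
}

fn main() {
    let files = HashMap::from([
        ("fs.defaultFS".to_string(), "hdfs://from-xml:9000".to_string()),
        ("dfs.ha.namenodes.ns".to_string(), "nn1,nn2".to_string()),
    ]);
    let caller = HashMap::from([(
        "fs.defaultFS".to_string(),
        "hdfs://from-builder:9000".to_string(),
    )]);

    let merged = merge_configs(files, Some(caller));
    // The caller-supplied value wins; untouched keys survive.
    assert_eq!(merged["fs.defaultFS"], "hdfs://from-builder:9000");
    assert_eq!(merged["dfs.ha.namenodes.ns"], "nn1,nn2");
}
```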
