Commit 54925a5

Restructure script docker run example
1 parent addb5c0 commit 54925a5

1 file changed: +27 -11 lines changed

README.md

Lines changed: 27 additions & 11 deletions
@@ -2,43 +2,59 @@
22

33
[![Build Status](https://travis-ci.org/RADAR-base/Restructure-HDFS-topic.svg?branch=master)](https://travis-ci.org/RADAR-base/Restructure-HDFS-topic)
44

5-
Data streamed to HDFS using the [RADAR HDFS sink connector](https://github.com/RADAR-CNS/RADAR-HDFS-Sink-Connector) is streamed to files based on sensor only. This package can transform that output to a local directory structure as follows: `userId/topic/date_hour.csv`. The date and hour is extracted from the `time` field of each record, and is formatted in UTC time.
5+
Data streamed to HDFS using the [RADAR HDFS sink connector](https://github.com/RADAR-CNS/RADAR-HDFS-Sink-Connector) is written to files organized by sensor only. This package can transform that output to a local directory structure as follows: `userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record and formatted in UTC. This package is included in the [RADAR-Docker](https://github.com/RADAR-CNS/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/hdfs_restructure.sh` script.
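
As a rough sketch of how a record maps onto this layout, the shell function below builds the relative output path from a user ID, a topic, and a record time. It assumes that `time` is a Unix timestamp in seconds (possibly fractional), that GNU `date` is available, and that `date_hour` is rendered as `YYYYMMDD_HH`. The function name `record_path` and the exact file-name format are illustrative assumptions, not taken from the application.

```shell
# Hypothetical helper: print userId/topic/date_hour.csv for one record.
# Assumes GNU date (for -d @SECONDS) and a YYYYMMDD_HH date format.
record_path() {
  user_id=$1
  topic=$2
  # Drop any fractional part of the timestamp; date -d @N expects seconds.
  seconds=${3%.*}
  # -u formats the timestamp in UTC, matching the description above.
  date_hour=$(date -u -d "@${seconds}" +%Y%m%d_%H)
  printf '%s/%s/%s.csv\n' "$user_id" "$topic" "$date_hour"
}

record_path user1 myTopic 1500000000.25
```

Records from the same user, topic, and UTC hour end up with the same path, which is why they land in the same file.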
66

7-
## Usage
7+
## Docker usage
8+
9+
This package is available as the Docker image [`radarbase/radar-hdfs-restructure`](https://hub.docker.com/r/radarbase/radar-hdfs-restructure). The entrypoint of the image is this application, so in all of the commands listed under command line usage, replace `radar-hdfs-restructure` with, for example:
10+
```shell
11+
docker run --rm -t --network hadoop -v "$PWD/output:/output" radarbase/radar-hdfs-restructure:0.4.0 -u hdfs://hdfs -o /output /myTopic
12+
```
13+
This assumes that your Docker cluster is running in the `hadoop` network and that the output directory should be `./output`.
814

9-
This package is included in the [RADAR-Docker](https://github.com/RADAR-CNS/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/hdfs_restructure.sh` script.
1015

11-
## Advanced usage
16+
## Local build
1217

13-
Build jar from source with
18+
This package requires at least Java JDK 8. Build the distribution with
1419

1520
```shell
1621
./gradlew build
1722
```
18-
and find the output JAR file as `build/libs/restructurehdfs-0.3.3-all.jar`. Then run with:
23+
24+
and install the package into `/usr/local` with, for example:
25+
```shell
26+
sudo mkdir -p /usr/local
27+
sudo tar -xzf build/distributions/radar-hdfs-restructure-0.4.0.tar.gz -C /usr/local --strip-components=1
28+
```
29+
30+
Now the `radar-hdfs-restructure` command should be available.
31+
32+
## Command line usage
33+
34+
When the application is installed, it can be used as follows:
1935

2036
```shell
21-
java -jar restructurehdfs-0.3.3-all.jar --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
37+
radar-hdfs-restructure --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
2238
```
2339
or use the short form:
2440
```shell
25-
java -jar restructurehdfs-0.3.3-all.jar -u <webhdfs_url> -o <output_folder> <input_path_1> [<input_path_2> ...]
41+
radar-hdfs-restructure -u <webhdfs_url> -o <output_folder> <input_path_1> [<input_path_2> ...]
2642
```
2743

2844
To display the usage and all available options, use the help option:
2945
```shell
30-
java -jar restructurehdfs-0.3.3-all.jar --help
46+
radar-hdfs-restructure --help
3147
```
3248
Note that the options preceded by `*` in the above output are required to run the app. There can be multiple input paths from which to read the files, for example `/topicAndroidNew/topic1 /topicAndroidNew/topic2 ...`. At least one input path is required.
3349

3450
By default, this will output the data in CSV format. If JSON format is preferred, use the following instead:
3551
```shell
36-
java -jar restructurehdfs-0.3.3-all.jar --format json --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
52+
radar-hdfs-restructure --format json --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
3753
```
3854

3955
Another option is to output the data in compressed form. All files will get the `gz` suffix, and can be decompressed with a GZIP decoder. Note that for a very small number of records, this may actually increase the file size.
4056
```shell
41-
java -jar restructurehdfs-0.3.3-all.jar --compression gzip --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
57+
radar-hdfs-restructure --compression gzip --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
4258
```
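
For reference, a compressed output file can be read back with the standard `gzip` tools. The sketch below uses a stand-in file under a temporary directory; the path `user1/myTopic/20170714_02.csv.gz` and its contents are hypothetical and only illustrate the `gz` suffix.

```shell
# Create a small stand-in for a compressed output file (hypothetical path).
workdir=$(mktemp -d)
mkdir -p "$workdir/user1/myTopic"
printf 'time,value\n1500000000.25,42\n' > "$workdir/user1/myTopic/20170714_02.csv"
gzip "$workdir/user1/myTopic/20170714_02.csv"   # produces 20170714_02.csv.gz

# Print the decompressed contents without modifying the .gz file.
gzip -dc "$workdir/user1/myTopic/20170714_02.csv.gz"

# Or decompress in place, replacing the .gz file with the plain CSV.
gunzip "$workdir/user1/myTopic/20170714_02.csv.gz"
```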
4359

4460
By default, file records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This is set to false by default because of an issue with Biovotion data. Please see [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it.
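
To illustrate what removing duplicate records means for a CSV file, the sketch below drops exact duplicate lines with `awk` while keeping the original order. It is a stand-in technique, not the application's deduplication logic, and it says nothing about the Biovotion issue. The function name `dedup_lines` and the sample data are hypothetical.

```shell
# Hypothetical illustration: drop exact duplicate lines, keeping the first
# occurrence of each line and the original order.
# This is NOT how the --deduplicate option is implemented.
dedup_lines() {
  awk '!seen[$0]++' "$@"
}

# The repeated record "1,a" is printed only once.
printf 'time,value\n1,a\n2,b\n1,a\n' | dedup_lines
```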
