Commit 84d3d2b

Merge pull request #29 from RADAR-base/release-0.4.0
Release 0.4.0
2 parents d936879 + 980ba2b commit 84d3d2b

34 files changed, +724 -307 lines changed

.dockerignore

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
.git/
2+
build/
3+
src/test/
4+
out/
5+
.gradle/
6+
.idea/

Dockerfile

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
# Licensed under the Apache License, Version 2.0 (the "License");
2+
# you may not use this file except in compliance with the License.
3+
# You may obtain a copy of the License at
4+
#
5+
# http://www.apache.org/licenses/LICENSE-2.0
6+
#
7+
# Unless required by applicable law or agreed to in writing, software
8+
# distributed under the License is distributed on an "AS IS" BASIS,
9+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
10+
# See the License for the specific language governing permissions and
11+
# limitations under the License.
12+
13+
FROM openjdk:8-alpine AS builder
14+
15+
RUN mkdir /code
16+
WORKDIR /code
17+
18+
ENV GRADLE_OPTS -Dorg.gradle.daemon=false
19+
20+
COPY ./gradle /code/gradle
21+
COPY ./gradlew /code/
22+
RUN ./gradlew --version
23+
24+
COPY ./build.gradle ./gradle.properties ./settings.gradle /code/
25+
26+
RUN ./gradlew downloadDependencies copyDependencies startScripts
27+
28+
COPY ./src /code/src
29+
30+
RUN ./gradlew jar
31+
32+
FROM openjdk:8-jre-alpine
33+
34+
MAINTAINER Joris Borgdorff <[email protected]>
35+
36+
LABEL description="RADAR-base HDFS data restructuring"
37+
38+
COPY --from=builder /code/build/third-party/* /usr/lib/
39+
COPY --from=builder /code/build/scripts/* /usr/bin/
40+
COPY --from=builder /code/build/libs/* /usr/lib/
41+
42+
ENTRYPOINT ["radar-hdfs-restructure"]

README.md

Lines changed: 39 additions & 11 deletions
@@ -2,33 +2,61 @@
22

33
[![Build Status](https://travis-ci.org/RADAR-base/Restructure-HDFS-topic.svg?branch=master)](https://travis-ci.org/RADAR-base/Restructure-HDFS-topic)
44

5-
Data streamed to HDFS using the [RADAR HDFS sink connector](https://github.com/RADAR-CNS/RADAR-HDFS-Sink-Connector) is streamed to files based on sensor only. This package can transform that output to a local directory structure as follows: `userId/topic/date_hour.csv`. The date and hour is extracted from the `time` field of each record, and is formatted in UTC time.
5+
Data streamed to HDFS using the [RADAR HDFS sink connector](https://github.com/RADAR-CNS/RADAR-HDFS-Sink-Connector) is streamed to files based on sensor only. This package can transform that output to a local directory structure as follows: `userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record and formatted in UTC. This package is included in the [RADAR-Docker](https://github.com/RADAR-CNS/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/hdfs_restructure.sh` script.
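
To illustrate the `userId/topic/date_hour.csv` layout, here is a minimal sketch of how such a path could be derived from a record's `time` field. It is not taken from the project source: the class name `UtcPathSketch`, the method `relativePath`, the file-name pattern `yyyyMMdd_HH`, and the assumption that `time` holds seconds since the Unix epoch are all illustrative.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UtcPathSketch {
    // Assumed file-name pattern; the actual application may use a different one.
    private static final DateTimeFormatter DATE_HOUR =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

    /**
     * Builds userId/topic/date_hour.csv for a record.
     * Assumes timeSeconds is the record time in seconds since the Unix epoch.
     */
    public static String relativePath(String userId, String topic, double timeSeconds) {
        Instant time = Instant.ofEpochMilli((long) (timeSeconds * 1000d));
        // Formatting with a fixed UTC zone keeps the bucket independent of the host time zone.
        return userId + "/" + topic + "/" + DATE_HOUR.format(time) + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(relativePath("user1", "android_phone_acceleration", 1530000000.0));
    }
}
```

Pinning the formatter to UTC means that two machines in different time zones place the same record in the same hourly file.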
66

7-
## Usage
7+
## Docker usage
8+
9+
This package is available as the Docker image [`radarbase/radar-hdfs-restructure`](https://hub.docker.com/r/radarbase/radar-hdfs-restructure). The entrypoint of the image is this application, so in all of the commands listed under command line usage, replace `radar-hdfs-restructure` with, for example:
10+
```shell
11+
docker run --rm -t --network hadoop -v "$PWD/output:/output" radarbase/radar-hdfs-restructure:0.4.0 -u hdfs://hdfs -o /output /myTopic
12+
```
13+
This assumes that your Docker cluster is running in the `hadoop` network and that your output directory is `./output`.
814

9-
This package is included in the [RADAR-Docker](https://github.com/RADAR-CNS/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/hdfs_restructure.sh` script.
1015

11-
## Advanced usage
16+
## Local build
1217

13-
Build jar from source with
18+
This package requires at least Java JDK 8. Build the distribution with
1419

1520
```shell
1621
./gradlew build
1722
```
18-
and find the output JAR file as `build/libs/restructurehdfs-0.3.1-all.jar`. Then run with:
1923

24+
and install the package into `/usr/local` with for example
2025
```shell
21-
java -jar restructurehdfs-0.3.1-all.jar <webhdfs_url> <hdfs_topic_path> <output_folder>
26+
sudo mkdir -p /usr/local
27+
sudo tar -xzf build/distributions/radar-hdfs-restructure-0.4.0.tar.gz -C /usr/local --strip-components=1
2228
```
2329

24-
By default, this will output the data in CSV format. If JSON format is preferred, use the following instead:
30+
Now the `radar-hdfs-restructure` command should be available.
31+
32+
## Command line usage
33+
34+
When the application is installed, it can be used as follows:
35+
36+
```shell
37+
radar-hdfs-restructure --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
2538
```
26-
java -Dorg.radarcns.format=json -jar restructurehdfs-0.3.1-all.jar <webhdfs_url> <hdfs_topic_path> <output_folder>
39+
or, using the short option names:
40+
```shell
41+
radar-hdfs-restructure -u <webhdfs_url> -o <output_folder> <input_path_1> [<input_path_2> ...]
42+
```
43+
44+
To display the usage and all available options, use the help option:
45+
```shell
46+
radar-hdfs-restructure --help
47+
```
48+
Note that the options preceded by `*` in the above output are required to run the app. There can be multiple input paths from which to read the files, for example `/topicAndroidNew/topic1 /topicAndroidNew/topic2 ...`. At least one input path is required.
49+
50+
By default, this will output the data in CSV format. If JSON format is preferred, use the following instead:
51+
```shell
52+
radar-hdfs-restructure --format json --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
2753
```
2854

2955
Another option is to output the data in compressed form. All files will get the `gz` suffix, and can be decompressed with a GZIP decoder. Note that for a very small number of records, this may actually increase the file size.
3056
```
31-
java -Dorg.radarcns.compress=gzip -jar restructurehdfs-0.3.1-all.jar <webhdfs_url> <hdfs_topic_path> <output_folder>
57+
radar-hdfs-restructure --compression gzip --hdfs-uri <webhdfs_url> --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
3258
```
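
The remark about very small files can be checked with a short sketch. It uses only the JDK's `java.util.zip.GZIPOutputStream`, not the application's own writer; the class name `GzipOverheadSketch` and the sample CSV content are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipOverheadSketch {
    /** Returns the GZIP-compressed form of the given bytes. */
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // A header plus a single record: the fixed GZIP header and trailer dominate.
        byte[] small = "time,x\n1.0,0.5\n".getBytes(StandardCharsets.UTF_8);

        // Many similar records: the repetition compresses well.
        StringBuilder many = new StringBuilder("time,x\n");
        for (int i = 0; i < 1000; i++) {
            many.append("1.0,0.5\n");
        }
        byte[] large = many.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("small: " + small.length + " -> " + gzip(small).length + " bytes");
        System.out.println("large: " + large.length + " -> " + gzip(large).length + " bytes");
    }
}
```

The GZIP container adds a fixed header and trailer of at least 18 bytes, which is why a file holding only a handful of records can grow when compressed.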
3359

34-
Finally, by default, files records are not deduplicated after writing. To enable this behaviour, specify the option `-Dorg.radarcns.deduplicate=true`. This set to false by default because of an issue with Biovotion data. Please see - [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it.
60+
By default, file records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This is set to false by default because of an issue with Biovotion data. Please see [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it.
61+
62+
Finally, while processing, files are staged to a temporary directory and moved to the output directory afterwards. This reduces the chance of data corruption, but it may result in slower performance. Disable staging using the `--no-stage` option.
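
The staging idea can be sketched in a few lines of Java. This is a general write-then-move pattern under stated assumptions, not the application's actual implementation; the class `StagingSketch`, the method `writeStaged`, and the `stage-` file prefix are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StagingSketch {
    /**
     * Writes content to a temporary file in stagingDir and only then moves it
     * to target, so a crash while writing never leaves a partial target file.
     */
    public static Path writeStaged(Path stagingDir, Path target, String content) {
        try {
            Files.createDirectories(stagingDir);
            Files.createDirectories(target.toAbsolutePath().getParent());

            // Step 1: write the full content to a staging file.
            Path tmp = Files.createTempFile(stagingDir, "stage-", ".tmp");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));

            // Step 2: move the finished file into place, replacing any older version.
            return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The extra write and move are the likely source of the slower performance mentioned above. When the staging and output directories share a file system, the move is typically a cheap rename.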

build.gradle

Lines changed: 44 additions & 34 deletions
@@ -2,60 +2,70 @@ apply plugin: 'java'
22
apply plugin: 'application'
33

44
group 'org.radarcns.restructurehdfs'
5-
version '0.3.2'
6-
mainClassName = 'org.radarcns.RestructureAvroRecords'
7-
8-
run {
9-
args = ['webhdfs://localhost:50070', '/topicAndroidNew/android_phone_sensor_acceleration', "${projectDir}/data"]
10-
}
5+
version '0.4.0'
6+
mainClassName = 'org.radarcns.hdfs.RestructureAvroRecords'
117

128
sourceCompatibility = '1.8'
139
targetCompatibility = '1.8'
1410

15-
ext.avroVersion = '1.8.2'
16-
ext.jacksonVersion = '2.8.9'
17-
ext.hadoopVersion = '2.7.3'
11+
ext {
12+
avroVersion = '1.8.2'
13+
jacksonVersion = '2.9.6'
14+
hadoopVersion = '2.7.6'
15+
jCommanderVersion = '1.72'
16+
}
1817

1918
repositories {
2019
jcenter()
2120
}
2221

2322
dependencies {
24-
compile group: 'org.apache.hadoop', name: 'hadoop-common', version: hadoopVersion
25-
compile group: 'org.apache.avro', name: 'avro', version: avroVersion
26-
compile group: 'org.apache.avro', name: 'avro-mapred', version: avroVersion
27-
compile group: 'com.fasterxml.jackson.core' , name: 'jackson-databind', version: jacksonVersion
28-
compile group: 'com.fasterxml.jackson.dataformat' , name: 'jackson-dataformat-csv', version: jacksonVersion
23+
implementation group: 'org.apache.avro', name: 'avro', version: avroVersion
24+
implementation group: 'com.fasterxml.jackson.core' , name: 'jackson-databind', version: jacksonVersion
25+
implementation group: 'com.fasterxml.jackson.dataformat' , name: 'jackson-dataformat-csv', version: jacksonVersion
26+
implementation group: 'com.beust', name: 'jcommander', version: jCommanderVersion
2927

30-
runtime group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: hadoopVersion
28+
implementation group: 'org.apache.avro', name: 'avro-mapred', version: avroVersion
29+
implementation group: 'org.apache.hadoop', name: 'hadoop-common', version: hadoopVersion
3130

32-
testCompile group: 'junit', name: 'junit', version: '4.12'
31+
runtimeOnly group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: hadoopVersion
32+
33+
testImplementation group: 'junit', name: 'junit', version: '4.12'
3334
}
3435

35-
//create a single Jar with all dependencies
36-
task fatJar(type: Jar) {
37-
dependsOn configurations.runtime
36+
jar {
3837
manifest {
39-
attributes 'Implementation-Title': 'radar-restructure-hdfs',
40-
'Implementation-Version': version,
41-
'Main-Class': mainClassName
38+
attributes 'Implementation-Title': 'RADAR-base HDFS data restructuring',
39+
'Implementation-Version': version
4240
}
43-
baseName = project.name + '-all'
44-
from {
45-
configurations.runtime.collect {
46-
it.isDirectory() ? it : zipTree(it)
41+
}
42+
43+
distributions {
44+
main {
45+
contents {
46+
into ("share/${project.name}") {
47+
from 'README.md', 'LICENSE'
48+
}
4749
}
48-
} {
49-
exclude 'META-INF/*'
5050
}
51-
with jar
5251
}
5352

54-
artifacts {
55-
archives fatJar
53+
tasks.withType(Tar){
54+
compression = Compression.GZIP
55+
extension = 'tar.gz'
56+
}
57+
58+
task downloadDependencies {
59+
description "Pre-downloads dependencies"
60+
configurations.compileClasspath.files
61+
configurations.runtimeClasspath.files
62+
}
63+
64+
task copyDependencies(type: Copy) {
65+
from configurations.runtimeClasspath.files
66+
into "${buildDir}/third-party/"
5667
}
5768

58-
task wrapper(type: Wrapper) {
59-
gradleVersion = '4.1'
60-
distributionType 'all'
69+
wrapper {
70+
gradleVersion '4.8'
6171
}

gradle.properties

Whitespace-only changes.

gradle/wrapper/gradle-wrapper.jar

-295 Bytes
Binary file not shown.

gradle/wrapper/gradle-wrapper.properties

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
distributionBase=GRADLE_USER_HOME
22
distributionPath=wrapper/dists
3+
distributionUrl=https\://services.gradle.org/distributions/gradle-4.8-bin.zip
34
zipStoreBase=GRADLE_USER_HOME
45
zipStorePath=wrapper/dists
5-
distributionUrl=https\://services.gradle.org/distributions/gradle-4.1-all.zip

settings.gradle

Lines changed: 1 addition & 2 deletions
@@ -1,2 +1 @@
1-
rootProject.name = 'restructurehdfs'
2-
1+
rootProject.name = 'radar-hdfs-restructure'

src/main/java/org/radarcns/Frequency.java renamed to src/main/java/org/radarcns/hdfs/Frequency.java

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
1414
* limitations under the License.
1515
*/
1616

17-
package org.radarcns;
17+
package org.radarcns.hdfs;
1818

1919
import org.apache.commons.collections.MapIterator;
2020
import org.apache.commons.collections.keyvalue.MultiKey;

src/main/java/org/radarcns/OffsetRange.java renamed to src/main/java/org/radarcns/hdfs/OffsetRange.java

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
1414
* limitations under the License.
1515
*/
1616

17-
package org.radarcns;
17+
package org.radarcns.hdfs;
1818

1919
import com.fasterxml.jackson.annotation.JsonCreator;
2020
import com.fasterxml.jackson.annotation.JsonProperty;
