if your docker cluster is running in the `hadoop` network and your output directory should be `./output`.
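
For reference, a sketch of such a Docker invocation. The image name, tag, and option names are assumptions and should be adapted to your deployment; the `hadoop` network and `./output` mount follow the note above.

```shell
# Sketch only: image name/tag and option names are assumptions; check the
# project's published image and `radar-hdfs-restructure --help` for specifics.
docker run --rm --network hadoop \
    -v "$PWD/output:/output" \
    radarbase/radar-hdfs-restructure:0.5.7 \
    radar-hdfs-restructure --output-directory /output /topicAndroidNew/topic1
```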
## Command line usage
When the application is installed, display the usage and all available options with:

```shell
radar-hdfs-restructure --help
```
Note that the options preceded by the `*` in the above output are required to run the app. Multiple input paths can be given to read files from, e.g. `/topicAndroidNew/topic1 /topicAndroidNew/topic2 ...`; at least one input path is required.
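
As a sketch, a typical invocation could then look as follows; the option names are assumptions and should be checked against the `--help` output:

```shell
# Option names are assumptions; verify them, and any required (`*`) options,
# against the `radar-hdfs-restructure --help` output.
radar-hdfs-restructure --output-directory ./output \
    /topicAndroidNew/topic1 /topicAndroidNew/topic2
```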
Each argument, along with many more settings, can be supplied in a config file. The default name of the config file is `restructure.yml`; refer to the `restructure.yml` in the current directory for all available options. An alternative file can be specified with the `-F` flag.
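
For example, assuming all required settings are provided in the file (the path below is just a placeholder):

```shell
radar-hdfs-restructure -F /etc/radar/restructure.yml
```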
### File Format
By default, the data is output in CSV format. If JSON format is preferred, select the JSON output format instead; a sketch of such an invocation is shown below.
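
The exact flag name is an assumption; verify it against the `--help` output.

```shell
# `--format` is an assumed option name; check `radar-hdfs-restructure --help`.
radar-hdfs-restructure --format json --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
```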
By default, records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This is disabled by default because of an issue with Biovotion data; please see [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it. Deduplication can also be enabled or disabled per topic using the config file. If lines should be deduplicated using only a subset of fields, e.g. if only `sourceId` and `time` define a unique record and only the last record with duplicate values should be kept, specify `topics: <topicName>: deduplicateFields: [sourceId, time]`.
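
Written out in the config file, that per-topic setting would look roughly like this (the topic name is a placeholder):

```yaml
topics:
  <topicName>:           # placeholder topic name
    deduplicateFields:
      - sourceId
      - time
```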
### Compression
Another option is to output the data in compressed form. All files will get the `gz` suffix, and can be decompressed with a GZIP decoder. Note that for a very small number of records, this may actually increase the file size.
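
A sketch of enabling compression on the command line; the flag name and value are assumptions, so verify them against the `--help` output or the config file options.

```shell
# `--compression gzip` is an assumed option; check `radar-hdfs-restructure --help`.
radar-hdfs-restructure --compression gzip --output-directory <output_folder> <input_path_1>
```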
### Storage
When using local storage, to set the output user ID and group ID, specify the `-p local-uid=123` and `-p local-gid=12` properties.
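
For example, combined with the options from the usage sketch above:

```shell
# -p properties as documented above; remaining options follow the earlier sketch.
radar-hdfs-restructure -p local-uid=123 -p local-gid=12 \
    --output-directory ./output /topicAndroidNew/topic1
```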
### Service
To run the output generator as a service that will regularly poll the HDFS directory, add the `--service` flag and optionally the `--interval` flag to adjust the polling interval or use the corresponding configuration file parameters.
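
A sketch of running it as a service; the interval value and its unit are assumptions, so check the `--help` output or `restructure.yml`.

```shell
# --service and --interval as described above; the interval unit is an assumption.
radar-hdfs-restructure --service --interval 300 \
    --output-directory ./output /topicAndroidNew/topic1
```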
## Local build
This package requires at least Java JDK 8. Build the distribution with
```shell
./gradlew build
```
and install the package into `/usr/local` with for example
```shell
sudo mkdir -p /usr/local
sudo tar -xzf build/distributions/radar-hdfs-restructure-0.5.7.tar.gz -C /usr/local --strip-components=1
```
Now the `radar-hdfs-restructure` command should be available.
## Extending the connector
To implement alternative storage paths, storage drivers or storage formats, put your custom JAR in
`$APP_DIR/lib/radar-hdfs-plugins`. To load them, use the following options:

|Parameter|Base class|Description|Default|
|---|---|---|---|
|`storage: factory: ...`|`org.radarbase.hdfs.storage.StorageDriver`|Storage driver to use for storing data.|LocalStorageDriver|
|`format: factory: ...`|`org.radarbase.hdfs.format.FormatFactory`|Factory for output formats.|FormatFactory|
|`compression: factory: ...`|`org.radarbase.hdfs.compression.CompressionFactory`|Factory class to use for data compression.|CompressionFactory|
The respective `<type>: properties: {}` configuration parameters can be used to provide custom configuration of the factory. This configuration will be passed to the `Plugin#init(Map<String, String>)` method.
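
For example, a custom storage driver could be configured roughly like this; the class name is hypothetical and the exact property keys depend on your plugin:

```yaml
storage:
  factory: com.example.MyStorageDriver   # hypothetical plugin class placed in $APP_DIR/lib/radar-hdfs-plugins
  properties:
    myKey: myValue                       # passed to Plugin#init(Map<String, String>)
```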