Data streamed by a Kafka connector is converted to a RADAR-base oriented output directory, organized by project, user and collection date.
It supports data written by the [RADAR S3 sink connector](https://github.com/RADAR-base/RADAR-S3-Connector), which streams data to files based on topic name only. This package transforms that output into a local directory structure as follows: `projectId/userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record and are formatted in UTC time. This package is included in the [RADAR-Docker](https://github.com/RADAR-base/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/bin/hdfs-restructure` script.
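For example, with a hypothetical project `project-a`, participant `user-1` and topic `android_phone_acceleration`, output files would end up in paths like the following (the exact `date_hour` formatting is illustrative):

```
project-a/user-1/android_phone_acceleration/20240101_10.csv
project-a/user-1/android_phone_acceleration/20240101_11.csv
```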
## Upgrade instructions
Since version 2.0.0, HDFS is no longer supported; only AWS S3, Azure Blob Storage and local file system compatible storage are. If HDFS is still needed, please implement an HDFS source storage factory with constructor `org.radarbase.output.source.HdfsSourceStorageFactory(resourceConfig: ResourceConfig, tempPath: Path)` and method `createSourceStorage(): SourceStorage`. This implementation may be added as a separate JAR in the `lib/radar-output-plugins/` directory of the installed distribution.
When upgrading to version 1.2.0, please follow these instructions:
- When using local target storage, ensure that:
This package is available as the docker image [`radarbase/radar-output-restructure`](https://hub.docker.com/r/radarbase/radar-output-restructure). The entrypoint of the image is the current application, so in all the commands listed in the usage, replace `radar-output-restructure` with, for example:
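A minimal sketch of that substitution, with `<args>` standing in for whatever arguments the usage section describes:

```shell
# The image's entrypoint is the application itself,
# so only its arguments need to be passed.
docker run --rm radarbase/radar-output-restructure <args>
```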
By default, records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This is set to false by default because of an issue with Biovotion data; please see [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it. Deduplication can also be enabled or disabled per topic using the config file. If lines should be deduplicated using a subset of fields, e.g. if only `sourceId` and `time` define a unique record and only the last record with duplicate values should be kept, then specify `topics: <topicName>: deduplication: distinctFields: [key.sourceId, value.time]`.
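Written out as a nested config file fragment (the topic name is hypothetical), that per-topic setting looks like:

```yaml
topics:
  android_phone_acceleration:  # hypothetical topic name
    deduplication:
      distinctFields: [key.sourceId, value.time]
```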
Another option is to output the data in compressed form. All files will get the `gz` suffix, and can be decompressed with a GZIP decoder. Note that for a very small number of records, this may actually increase the file size. Zip compression is also available.
This package assumes a running Redis service. See the example `restructure.yml`.
### Source and target
The `source` and `target` properties contain resource descriptions. The source can have two types, `azure` and `s3`:
```yaml
source:
  type: s3 # azure or s3
  s3:
    endpoint: http://localhost:9000 # using AWS S3 endpoint is also possible.
    bucket: radar
    accessToken: minioadmin
    secretKey: minioadmin
  # only needed if source type is azure
  azure:
    # azure options
```
The target is similar, and in addition supports the local file system (`local`).
```yaml
target:
  type: s3 # s3, local or azure
  s3:
    endpoint: http://localhost:9000
    bucket: out
```
### Service
To run the output generator as a service that regularly polls the source directory, add the `--service` flag and optionally the `--interval` flag to adjust the polling interval, or use the corresponding configuration file parameters.
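A sketch of such an invocation (the interval unit and the way the configuration file is passed are assumptions, not verified against the CLI):

```shell
# run as a long-running service, polling the source at a fixed interval
# (interval unit and config-file argument are assumptions)
radar-output-restructure --service --interval 300 restructure.yml
```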
## Local build
This package requires at least Java JDK 8. Build the distribution with
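The build command itself was lost in this excerpt; given the `build/distributions` tarball path used below, it is presumably the standard Gradle wrapper invocation:

```shell
# produces build/distributions/radar-output-restructure-<version>.tar.gz
./gradlew distTar
```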
and install the package into `/usr/local` with, for example:
```shell
sudo mkdir -p /usr/local
sudo tar -xzf build/distributions/radar-output-restructure-2.0.0.tar.gz -C /usr/local --strip-components=1
```
Now the `radar-output-restructure` command should be available.