Commit 3d79613

Merge pull request #558 from RADAR-base/release-3.0.0
Release 3.0.0
2 parents 459d7e2 + d70e608 commit 3d79613

106 files changed: +1981 -1056 lines changed

.editorconfig

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+root = true

.github/workflows/codeql.yml

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL"
+
+on:
+push:
+branches: [ "main", "dev" ]
+pull_request:
+branches: [ "main", "dev" ]
+schedule:
+- cron: '24 21 * * 0'
+
+jobs:
+analyze:
+name: Analyze
+# Runner size impacts CodeQL analysis time. To learn more, please see:
+# - https://gh.io/recommended-hardware-resources-for-running-codeql
+# - https://gh.io/supported-runners-and-hardware-resources
+# - https://gh.io/using-larger-runners
+# Consider using larger runners for possible analysis time improvements.
+runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+timeout-minutes: ${{ (matrix.language == 'swift' && 120) || 360 }}
+permissions:
+# required for all workflows
+security-events: write
+
+# only required for workflows in private repositories
+actions: read
+contents: read
+
+strategy:
+fail-fast: false
+matrix:
+language: [ 'java-kotlin' ]
+# CodeQL supports [ 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift' ]
+# Use only 'java-kotlin' to analyze code written in Java, Kotlin or both
+# Use only 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
+# Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
+
+steps:
+- name: Checkout repository
+uses: actions/checkout@v4
+
+- uses: actions/setup-java@v4
+with:
+distribution: 'temurin' # See 'Supported distributions' for available options
+java-version: '17'
+
+# Initializes the CodeQL tools for scanning.
+- name: Initialize CodeQL
+uses: github/codeql-action/init@v3
+with:
+languages: ${{ matrix.language }}
+# If you wish to specify custom queries, you can do so here or in a config file.
+# By default, queries listed here will override any specified in a config file.
+# Prefix the list here with "+" to use these queries and those in the config file.
+
+# For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+# queries: security-extended,security-and-quality
+
+
+# Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
+# If this step fails, then you should remove it and run the build manually (see below)
+- name: Autobuild
+uses: github/codeql-action/autobuild@v3
+
+# ℹ️ Command-line programs to run using the OS shell.
+# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+
+# If the Autobuild fails above, remove it and uncomment the following three lines.
+# modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
+
+# - run: |
+# echo "Run, Build Application using script"
+# ./location_of_script_within_repo/buildscript.sh
+
+- name: Perform CodeQL Analysis
+uses: github/codeql-action/analyze@v3
+with:
+category: "/language:${{matrix.language}}"

.github/workflows/main.yml

Lines changed: 2 additions & 2 deletions
@@ -95,8 +95,8 @@ jobs:
 # Use runtime labels from docker_meta as well as fixed labels
 labels: |
 ${{ steps.docker_meta.outputs.labels }}
-maintainer=Joris Borgdorff <joris@thehyve.nl>
-org.opencontainers.image.authors=Joris Borgdorff <joris@thehyve.nl>
+maintainer=Bastiaan de Graaf <bastiaan@thehyve.nl>
+org.opencontainers.image.authors=Bastiaan de Graaf <bastiaan@thehyve.nl>
 org.opencontainers.image.vendor=RADAR-base
 org.opencontainers.image.licenses=Apache-2.0

.github/workflows/publish_snapshots.yml

Lines changed: 5 additions & 4 deletions
@@ -17,10 +17,6 @@ jobs:
 # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
 - uses: actions/checkout@v3

-- name: Has SNAPSHOT version
-id: is-snapshot
-run: grep 'version = ".*-SNAPSHOT"' build.gradle.kts
-
 - uses: actions/setup-java@v3
 with:
 distribution: temurin
@@ -29,6 +25,11 @@ jobs:
 - name: Setup Gradle
 uses: gradle/gradle-build-action@v2

+- name: Has SNAPSHOT version
+id: is-snapshot
+run: |
+./gradlew properties | grep 'version: .*-SNAPSHOT'
+
 - name: Install gpg secret key
 run: |
 cat <(echo -e "${{ secrets.OSSRH_GPG_SECRET_KEY }}") | gpg --batch --import

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions
@@ -91,8 +91,8 @@ jobs:
 # Use runtime labels from docker_meta as well as fixed labels
 labels: |
 ${{ steps.docker_meta.outputs.labels }}
-maintainer=Joris Borgdorff <joris@thehyve.nl>
-org.opencontainers.image.authors=Joris Borgdorff <joris@thehyve.nl>
+maintainer=Bastiaan de Graaf <bastiaan@thehyve.nl>
+org.opencontainers.image.authors=Bastiaan de Graaf <bastiaan@thehyve.nl>
 org.opencontainers.image.vendor=RADAR-base
 org.opencontainers.image.licenses=Apache-2.0

.github/workflows/snyk.yaml

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,7 @@ on:
 pull_request:
 branches:
 - main
+- dev

 jobs:
 security:
@@ -29,3 +30,6 @@ jobs:
 --configuration-matching='^runtimeClasspath$'
 --org=radar-base
 --policy-path=$PWD/.snyk
+--all-projects
+--severity-threshold=high
+--fail-on=upgradable

Dockerfile

Lines changed: 5 additions & 4 deletions
@@ -10,20 +10,21 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-FROM --platform=$BUILDPLATFORM gradle:7.5-jdk17 AS builder
+FROM --platform=$BUILDPLATFORM gradle:8.4-jdk17 AS builder

 RUN mkdir /code
 WORKDIR /code
 ENV GRADLE_USER_HOME=/code/.gradlecache \
-GRADLE_OPTS=-Djdk.lang.Process.launchMechanism=vfork
+GRADLE_OPTS="-Djdk.lang.Process.launchMechanism=vfork -Dorg.gradle.vfs.watch=false"

 COPY ./build.gradle.kts ./gradle.properties ./settings.gradle.kts /code/
+COPY ./buildSrc /code/buildSrc

-RUN gradle downloadDependencies copyDependencies startScripts --no-watch-fs
+RUN gradle downloadDependencies copyDependencies startScripts

 COPY ./src /code/src

-RUN gradle jar --no-watch-fs
+RUN gradle jar

 FROM eclipse-temurin:17-jre

README.md

Lines changed: 10 additions & 2 deletions
@@ -1,7 +1,7 @@
 # Restructure Kafka connector output files

 Data streamed by a Kafka Connector will be converted to a RADAR-base oriented output directory, by organizing it by project, user and collection date.
-It supports data written by [RADAR S3 sink connector](https://github.com/RADAR-base/RADAR-S3-Connector) is streamed to files based on topic name only. This package transforms that output to a local directory structure as follows: `projectId/userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record, and is formatted in UTC time. This package is included in the [RADAR-Docker](https://github.com/RADAR-base/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/bin/hdfs-restructure` script.
+It supports data written by [RADAR S3 sink connector](https://github.com/RADAR-base/RADAR-S3-Connector) is streamed to files based on topic name only. This package transforms that output to a local directory structure as follows: `projectId/userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record, and is formatted in UTC time.

 ## Upgrade instructions

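Note on the README hunk above: the `projectId/userId/topic/date_hour.csv` layout can be illustrated with a short sketch. This is not the project's own code. The function name `output_path`, the assumption that `time` is given in seconds since the Unix epoch, and the exact `%Y%m%d_%H` file-name pattern are all hypothetical choices made for illustration only.

```python
from datetime import datetime, timezone


def output_path(project_id: str, user_id: str, topic: str, time_seconds: float) -> str:
    """Build a projectId/userId/topic/date_hour.csv style path.

    Assumption: `time_seconds` is the record's `time` field expressed as
    seconds since the Unix epoch. The date/hour pattern is illustrative.
    """
    # Interpret the timestamp in UTC, as the README states.
    moment = datetime.fromtimestamp(time_seconds, tz=timezone.utc)
    # All records from the same UTC hour map to the same file.
    return f"{project_id}/{user_id}/{topic}/{moment:%Y%m%d_%H}.csv"


if __name__ == "__main__":
    print(output_path("radar-test", "user1", "android_phone_acceleration", 1704164400.0))
```

Two records whose `time` values fall within the same UTC hour resolve to the same path, which is what groups them into one output file.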
@@ -90,7 +90,7 @@ By default, this will output the data in CSV format. If JSON format is preferred
 radar-output-restructure --format json --output-directory <output_folder> <input_path_1> [<input_path_2> ...]
 ```

-By default, files records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This set to false by default because of an issue with Biovotion data. Please see - [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it. Deduplication can also be enabled or disabled per topic using the config file. If lines should be deduplicated using a subset of fields, e.g. only `sourceId` and `time` define a unique record and only the last record with duplicate values should be kept, then specify `topics: <topicName>: deduplication: distinctFields: [key.sourceId, value.time]`.
+By default, files records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This set to false by default because of an issue with Biovotion data. Please see - [issue #16](https://github.com/RADAR-base/radar-output-restructure/issues/16) before enabling it. Deduplication can also be enabled or disabled per topic using the config file. If lines should be deduplicated using a subset of fields, e.g. only `sourceId` and `time` define a unique record and only the last record with duplicate values should be kept, then specify `topics: <topicName>: deduplication: distinctFields: [key.sourceId, value.time]`.

 ### Compression

@@ -118,8 +118,16 @@ source:
 # only actually needed if source type is hdfs
 azure:
 # azure options
+index:
+# Interval to fully synchronize the index with the source storage
+fullSyncInterval: 3600
+# Interval to sync empty directories with.
+# They are also synced during a full sync.
+emptyDirectorySyncInterval: 900
 ```

+The index makes a scan of the source before any operations. Further list operations are done on the index only. This is especially relevant for S3 storage where list operations are priced.
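Note on the added README paragraph: the idea of scanning the source once and answering later list requests from an index can be sketched as below. This is a simplified illustration, not the implementation in this commit. The class `StorageIndex`, its `list_source` callback and the injectable `clock` are hypothetical names, and only a `fullSyncInterval`-style refresh is modelled; `emptyDirectorySyncInterval` is left out.

```python
import time
from typing import Callable, Iterable, List, Optional, Set


class StorageIndex:
    """Caches the result of an expensive listing and refreshes it periodically."""

    def __init__(
        self,
        list_source: Callable[[], Iterable[str]],
        full_sync_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._list_source = list_source      # the priced remote list operation
        self._interval = full_sync_interval  # seconds between full scans
        self._clock = clock
        self._paths: Set[str] = set()
        self._last_sync: Optional[float] = None

    def _sync_if_due(self) -> None:
        now = self._clock()
        # Scan the source on first use and again once the interval has elapsed.
        if self._last_sync is None or now - self._last_sync >= self._interval:
            self._paths = set(self._list_source())
            self._last_sync = now

    def list(self, prefix: str = "") -> List[str]:
        """Answer a list request from the index instead of the source."""
        self._sync_if_due()
        return sorted(p for p in self._paths if p.startswith(prefix))
```

With an interval of 3600 seconds, repeated `list` calls within the hour trigger a single scan of the source, which is the cost saving the paragraph refers to for S3.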
+
 The target is similar, and in addition supports the local file system (`local`).

 ```yaml
