```yaml
      # Use only 'java-kotlin' to analyze code written in Java, Kotlin or both
      # Use only 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
      # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - uses: actions/setup-java@v4
        with:
          distribution: 'temurin' # See 'Supported distributions' for available options
          java-version: '17'

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.

          # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
          # queries: security-extended,security-and-quality

      # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
      # If this step fails, then you should remove it and run the build manually (see below).
      - name: Autobuild
        uses: github/codeql-action/autobuild@v3

      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

      # If the Autobuild fails above, remove it and uncomment the following three lines,
      # then modify them (or add more) to build your code as your project requires.
```
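Where Autobuild cannot build the project, a minimal sketch of the manual replacement step might look like the following; the Gradle wrapper command is an assumption about the project's build tool, so substitute the real build commands:

```yaml
      # Hypothetical manual build step replacing Autobuild above.
      # The Gradle wrapper invocation is an assumption; use your
      # project's actual build command (e.g. Maven, make).
      - name: Build project manually
        run: ./gradlew assemble --no-daemon
```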
Changes to `README.md` (10 additions, 2 deletions):
````diff
@@ -1,7 +1,7 @@
 # Restructure Kafka connector output files
 
 Data streamed by a Kafka Connector will be converted to a RADAR-base oriented output directory, by organizing it by project, user and collection date.
-It supports data written by [RADAR S3 sink connector](https://github.com/RADAR-base/RADAR-S3-Connector) is streamed to files based on topic name only. This package transforms that output to a local directory structure as follows: `projectId/userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record, and is formatted in UTC time. This package is included in the [RADAR-Docker](https://github.com/RADAR-base/RADAR-Docker) repository, in the `dcompose/radar-cp-hadoop-stack/bin/hdfs-restructure` script.
+It supports data written by [RADAR S3 sink connector](https://github.com/RADAR-base/RADAR-S3-Connector) is streamed to files based on topic name only. This package transforms that output to a local directory structure as follows: `projectId/userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record, and is formatted in UTC time.
 
 ## Upgrade instructions
````
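As an illustration of the restructuring described in the README text above, a sketch of the before/after layout; all project, user and topic names here are hypothetical, and the connector's exact file naming may differ:

```
# Sink connector output, organized by topic only (illustrative names):
android_phone_acceleration/android_phone_acceleration+0+000000.csv

# After restructuring, organized as projectId/userId/topic/date_hour.csv (UTC):
radar-test/sub-01/android_phone_acceleration/20240101_13.csv
```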
````diff
@@ -90,7 +90,7 @@ By default, this will output the data in CSV format. If JSON format is preferred
-By default, files records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This set to false by default because of an issue with Biovotion data. Please see - [issue #16](https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16) before enabling it. Deduplication can also be enabled or disabled per topic using the config file. If lines should be deduplicated using a subset of fields, e.g. only `sourceId` and `time` define a unique record and only the last record with duplicate values should be kept, then specify `topics: <topicName>: deduplication: distinctFields: [key.sourceId, value.time]`.
+By default, files records are not deduplicated after writing. To enable this behaviour, specify the option `--deduplicate` or `-d`. This set to false by default because of an issue with Biovotion data. Please see - [issue #16](https://github.com/RADAR-base/radar-output-restructure/issues/16) before enabling it. Deduplication can also be enabled or disabled per topic using the config file. If lines should be deduplicated using a subset of fields, e.g. only `sourceId` and `time` define a unique record and only the last record with duplicate values should be kept, then specify `topics: <topicName>: deduplication: distinctFields: [key.sourceId, value.time]`.
 
 ### Compression
````
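The flattened `topics: <topicName>: deduplication: distinctFields: ...` notation quoted above corresponds to a nested YAML configuration. A minimal sketch, assuming a hypothetical topic name and a per-topic enable flag as the text describes:

```yaml
topics:
  android_phone_acceleration:   # hypothetical topic name
    deduplication:
      enable: true              # assumption: per-topic enable key
      distinctFields: [key.sourceId, value.time]
```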
````diff
@@ -118,8 +118,16 @@ source:
   # only actually needed if source type is hdfs
   azure:
     # azure options
+  index:
+    # Interval to fully synchronize the index with the source storage
+    fullSyncInterval: 3600
+    # Interval to sync empty directories with.
+    # They are also synced during a full sync.
+    emptyDirectorySyncInterval: 900
 ```
 
+The index makes a scan of the source before any operations. Further list operations are done on the index only. This is especially relevant for S3 storage where list operations are priced.
+
 The target is similar, and in addition supports the local file system (`local`).
````