34 changes: 34 additions & 0 deletions README.md
@@ -1,5 +1,22 @@
# Restructure Kafka connector output files

<!-- TOC -->
* [Restructure Kafka connector output files](#restructure-kafka-connector-output-files)
* [Upgrade instructions](#upgrade-instructions)
* [Docker usage](#docker-usage)
* [Command line usage](#command-line-usage)
* [File Format](#file-format)
* [Compression](#compression)
* [Redis](#redis)
* [Source and target](#source-and-target)
* [Path format](#path-format)
* [Cleaner](#cleaner)
* [Service](#service)
* [Sentry monitoring](#sentry-monitoring)
* [Local build](#local-build)
* [Extending the connector](#extending-the-connector)
<!-- TOC -->

Data streamed by a Kafka connector is converted to a RADAR-base oriented output directory, organized by project, user and collection date.
Data written by the [RADAR S3 sink connector](https://github.com/RADAR-base/RADAR-S3-Connector) is streamed to files based on topic name only. This package transforms that output to a local directory structure as follows: `projectId/userId/topic/date_hour.csv`. The date and hour are extracted from the `time` field of each record and are formatted in UTC.
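
The mapping from a record to its output path can be illustrated with a minimal sketch. This is not the package's actual implementation: the class and method names are hypothetical, the `time` field is assumed to hold seconds since the Unix epoch, and the `yyyyMMdd_HH'00'` pattern is an assumed rendering of `date_hour`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class OutputPathSketch {
    // Assumed pattern for the `date_hour` part; the README only names the two components.
    private static final DateTimeFormatter DATE_HOUR =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH'00'").withZone(ZoneOffset.UTC);

    // Builds projectId/userId/topic/date_hour.csv from a record's `time` value,
    // assumed here to be seconds since the Unix epoch.
    static String relativePath(String projectId, String userId, String topic, double timeSeconds) {
        Instant instant = Instant.ofEpochMilli((long) (timeSeconds * 1000.0));
        return projectId + "/" + userId + "/" + topic + "/" + DATE_HOUR.format(instant) + ".csv";
    }

    public static void main(String[] args) {
        // 1514808000 is 2018-01-01T12:00:00Z.
        System.out.println(relativePath("projectA", "user1", "some_topic", 1514808000.0));
    }
}
```

Because the formatter is pinned to UTC, records collected in the same UTC hour end up in the same file regardless of the time zone of the machine that runs the restructuring.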

@@ -225,6 +242,23 @@ The cleaner can also be enabled with the `--cleaner` command-line flag. To run t

To run the output generator as a service that regularly polls the source directory, add the `--service` flag and, optionally, the `--interval` flag to adjust the polling interval, or use the corresponding configuration file parameters.

## Sentry monitoring

To enable Sentry monitoring:

1. Set a `SENTRY_DSN` environment variable that points to the desired Sentry DSN.
2. (Optional) Set the `SENTRY_LOG_LEVEL` environment variable to control the minimum log level of events sent to Sentry.
The default log level for Sentry is `ERROR`. Possible values are `TRACE`, `DEBUG`, `INFO`, `WARN`, and `ERROR`.

For further configuration of Sentry via environment variables, see the [Sentry Java configuration documentation](https://docs.sentry.io/platforms/java/configuration/#configuration-via-the-runtime-environment). For instance:

```
SENTRY_LOG_LEVEL: 'ERROR'
SENTRY_DSN: 'https://000000000000.ingest.de.sentry.io/000000000000'
SENTRY_ATTACHSTACKTRACE: true
SENTRY_STACKTRACE_APP_PACKAGES: io.confluent.connect,org.radarbase.connect.rest
```
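
How these two variables are interpreted can be sketched in a few lines of Java. This is an illustration only, not the Log4j or Sentry lookup code: the class and method names are hypothetical, and treating a blank `SENTRY_LOG_LEVEL` like an unset one is a simplifying assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SentryEnvSketch {
    // Log levels listed in the README as valid for SENTRY_LOG_LEVEL.
    private static final Set<String> LEVELS = Set.of("TRACE", "DEBUG", "INFO", "WARN", "ERROR");

    // Sentry is considered enabled only when SENTRY_DSN is set and not empty.
    static boolean sentryEnabled(Map<String, String> env) {
        String dsn = env.get("SENTRY_DSN");
        return dsn != null && !dsn.isBlank();
    }

    // Returns the configured level, falling back to ERROR when the variable is absent.
    // Assumption: a blank value is treated the same as an unset one.
    static String sentryLogLevel(Map<String, String> env) {
        String raw = env.get("SENTRY_LOG_LEVEL");
        if (raw == null || raw.isBlank()) {
            return "ERROR";
        }
        String level = raw.trim().toUpperCase(Locale.ROOT);
        if (!LEVELS.contains(level)) {
            throw new IllegalArgumentException("Unsupported SENTRY_LOG_LEVEL: " + raw);
        }
        return level;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("Sentry enabled: " + sentryEnabled(env));
        System.out.println("Sentry log level: " + sentryLogLevel(env));
    }
}
```

Passing the environment as a `Map` keeps the helpers easy to exercise with fixed values instead of the real process environment.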

## Local build

This package requires at least Java JDK 8. Build the distribution with
10 changes: 4 additions & 6 deletions build.gradle.kts
@@ -19,21 +19,19 @@ radarRootProject {
}

radarKotlin {
kotlinVersion.set(Versions.kotlin)
javaVersion.set(Versions.java)
log4j2Version.set(Versions.log4j2)
slf4jVersion.set(Versions.slf4j)
junitVersion.set(Versions.junit)
sentryEnabled.set(true)
}

radarPublishing {
val githubRepoName = "RADAR-base/radar-output-restructure"
githubUrl.set("https://github.com/$githubRepoName.git")
developers {
developer {
id.set("bdegraaf1234")
name.set("Bastiaan de Graaf")
email.set("bastiaan@thehyve.nl")
id.set("pvannierop")
name.set("Pim Van Nierop")
email.set("pim@thehyve.nl")
organization.set("The Hyve")
}
}
3 changes: 1 addition & 2 deletions buildSrc/src/main/kotlin/Versions.kt
@@ -3,10 +3,9 @@ object Versions {
const val project = "3.0.2"

const val java = 17
const val kotlin = "1.9.22"
const val dockerCompose = "0.17.5"

const val radarCommons = "1.1.2"
const val radarCommons = "1.1.3"
const val radarSchemas = "0.8.11"
const val jackson = "2.16.1"
const val slf4j = "2.0.9"
47 changes: 36 additions & 11 deletions src/main/resources/log4j2.xml
@@ -1,13 +1,38 @@
<Configuration status="WARN">
<Appenders>
<Console name="STDOUT" target="SYSTEM_OUT">
<PatternLayout pattern="[%d] %-5level - %msg (%F:%L)%n"/>
<?xml version="1.0" encoding="UTF-8" ?>
<!--
~ /*
~ * Copyright 2024 The Hyve
~ *
~ * Licensed under the Apache License, Version 2.0 (the "License");
~ * you may not use this file except in compliance with the License.
~ * You may obtain a copy of the License at
~ *
~ * http://www.apache.org/licenses/LICENSE-2.0
~ *
~ * Unless required by applicable law or agreed to in writing, software
~ * distributed under the License is distributed on an "AS IS" BASIS,
~ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ * See the License for the specific language governing permissions and
~ * limitations under the License.
~ */
-->
<configuration status="INFO">
<appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout
pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"
/>
</Console>
</Appenders>
<!-- For Sentry to work the DSN must be set via SENTRY_DSN environment variable
When SENTRY_DSN is empty string, the Sentry SDK is disabled -->
<Sentry name="Sentry" debug="false"/>
</appenders>

<Loggers>
<Root level="${env:LOG4J_LOG_LEVEL:-INFO}">
<AppenderRef ref="STDOUT"/>
</Root>
</Loggers>
</Configuration>
<loggers>
<root level="${env:LOG4J_LOG_LEVEL:-INFO}">
<appender-ref ref="Console" />
<!-- Note that the Sentry logging threshold is at ERROR level by default -->
<appender-ref ref="Sentry" level="${env:SENTRY_LOG_LEVEL:-ERROR}" />
</root>
</loggers>
</configuration>