
Commit 923c4d4

add datadog.parse_record_headers property (#49)
* add datadog.parse_record_headers property
* add README.md suggestions
* add README.md suggestions
* revert recordToJSON
1 parent 3ece8b1 commit 923c4d4

4 files changed: +138 -47 lines

README.md

Lines changed: 28 additions & 28 deletions
@@ -1,6 +1,6 @@
 # Datadog Kafka Connect Logs
 
-`datadog-kafka-connect-logs` is a [Kafka Connector](http://kafka.apache.org/documentation.html#connect) for sending
+`datadog-kafka-connect-logs` is a [Kafka Connector](http://kafka.apache.org/documentation.html#connect) for sending
 records from Kafka as logs to the [Datadog Logs Intake API](https://docs.datadoghq.com/api/v1/logs/).
 
 It is a plugin meant to be installed on a [Kafka Connect Cluster](https://docs.confluent.io/current/connect/) running
@@ -12,7 +12,7 @@ besides a [Kafka Broker](https://www.confluent.io/what-is-apache-kafka/).
 2. Java 8 and above.
 3. Confluent Platform 4.0.x and above (optional).
 
-To install the plugin, one must have a working instance of Kafka Connect connected to a Kafka Broker. See also
+To install the plugin, one must have a working instance of Kafka Connect connected to a Kafka Broker. See also
 [Confluent's](https://www.confluent.io/product/confluent-platform/) documentation for easily setting this up.
 
 ## Installation and Setup
@@ -24,25 +24,24 @@ See [Confluent's documentation](https://docs.confluent.io/current/connect/managi
 ### Download from Github
 
 Download the latest version from the GitHub [releases page](https://github.com/DataDog/datadog-kafka-connect-logs/releases).
-Also see [Confluent's documentation](https://docs.confluent.io/current/connect/managing/community.html) on installing
+Also see [Confluent's documentation](https://docs.confluent.io/current/connect/managing/community.html) on installing
 community connectors.
 
 ### Build from Source
 
 1. Clone the repo from https://github.com/DataDog/datadog-kafka-connect-logs
 2. Verify that Java8 JRE or JDK is installed.
-3. Run `mvn clean compile package`. This will build the jar in the `/target` directory. The name will be
-`datadog-kafka-connect-logs-[VERSION].jar`.
+3. Run `mvn clean compile package`. This builds the jar in the `/target` directory. The file name has the format `datadog-kafka-connect-logs-[VERSION].jar`.
 4. The zip file for use on [Confluent Hub](https://www.confluent.io/hub/) can be found in `target/components/packages`.
 
 ## Quick Start
 
 1. To install the plugin, place the plugin's jar file (see [previous section](#installation-and-setup) on how to download or build it)
-in or under the location specified in `plugin.path` . If you use Confluent Platform, simply run
-`confluent-hub install target/components/packages/<connector-zip-file>`.
+in or under the location specified in `plugin.path` . If you use Confluent Platform, run
+`confluent-hub install target/components/packages/<connector-zip-file>`.
 2. Restart your Kafka Connect instance.
-3. Run the following command to manually create connector tasks. Adjust `topics` to configure the Kafka topic to be
-ingested and set your Datadog `api_key`.
+3. Run the following command to manually create connector tasks. Adjust `topics` to configure the Kafka topic to be
+ingested and set your Datadog `api_key`.
 
 ```
 curl localhost:8083/connectors -X POST -H "Content-Type: application/json" -d '{
@@ -56,8 +55,8 @@ ingested and set your Datadog `api_key`.
 }'
 ```
 
-4. You can verify that data is ingested to the Datadog platform by searching for `source:kafka-connect` in the Log
-Explorer tab
+4. You can verify that data is ingested to the Datadog platform by searching for `source:kafka-connect` in the Log
+Explorer tab
 5. Use the following commands to check status, and manage connectors and tasks:
 
 ```
@@ -95,18 +94,19 @@ A REST call can be executed against one of the cluster instances, and the config
 | `topics` | Comma separated list of Kafka topics for Datadog to consume. `prod-topic1,prod-topic2,prod-topic3`||
 | `datadog.api_key` | The API key of your Datadog platform.||
 #### General Optional Parameters
-| Name | Description | Default Value |
-|-------- |------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|
-| `datadog.site` | The site of the Datadog intake to send logs to (for example 'datadoghq.eu' to send data to the EU site) | `datadoghq.com` |
+| Name | Description | Default Value |
+|-------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|
+| `datadog.site` | The site of the Datadog intake to send logs to (for example 'datadoghq.eu' to send data to the EU site) | `datadoghq.com` |
 | `datadog.url` | Custom Datadog URL endpoint where your logs will be sent. `datadog.url` takes precedence over `datadog.site`. Example: `http-intake.logs.datadoghq.com:443` ||
-| `datadog.tags` | Tags associated with your logs in a comma separated tag:value format. ||
-| `datadog.service` | The name of the application or service generating the log events. ||
-| `datadog.hostname` | The name of the originating host of the log. ||
-| `datadog.proxy.url` | Proxy endpoint when logs are not directly forwarded to Datadog. ||
-| `datadog.proxy.port` | Proxy port when logs are not directly forwarded to Datadog. ||
-| `datadog.retry.max` | The number of retries before the output plugin stops. | `5` ||
-| `datadog.retry.backoff_ms` | The time in milliseconds to wait following an error before a retry attempt is made. | `3000` ||
-| `datadog.add_published_date` | Valid settings are true or false. When set to `true`, The timestamp is retrieved from the Kafka record and passed to Datadog as `published_date` ||
+| `datadog.tags` | Tags associated with your logs in a comma separated tag:value format. ||
+| `datadog.service` | The name of the application or service generating the log events. ||
+| `datadog.hostname` | The name of the originating host of the log. ||
+| `datadog.proxy.url` | Proxy endpoint when logs are not directly forwarded to Datadog. ||
+| `datadog.proxy.port` | Proxy port when logs are not directly forwarded to Datadog. ||
+| `datadog.retry.max` | The number of retries before the output plugin stops. | `5` ||
+| `datadog.retry.backoff_ms` | The time in milliseconds to wait following an error before a retry attempt is made. | `3000` ||
+| `datadog.add_published_date` | Valid settings are true or false. When set to `true`, The timestamp is retrieved from the Kafka record and passed to Datadog as `published_date` ||
+| `datadog.parse_record_headers` | Valid settings are true or false. When set to `true`, Kafka Record Headers are parsed and passed to DataDog as a `kafkaheaders` object |`false`|
 
 ### Troubleshooting performance
 
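For context, the new `datadog.parse_record_headers` option slots into the same connector configuration shown in the Quick Start above. A hypothetical request enabling it might look like the following sketch (the connector name, topic, and placeholder API key are illustrative and not part of this commit):

```
curl localhost:8083/connectors -X POST -H "Content-Type: application/json" -d '{
  "name": "datadog-logs-connector",
  "config": {
    "connector.class": "com.datadoghq.connect.logs.DatadogLogsSinkConnector",
    "topics": "prod-topic1",
    "datadog.api_key": "<YOUR_API_KEY>",
    "datadog.parse_record_headers": "true"
  }
}'
```

With the flag left at its `false` default, no `kafkaheaders` object is added and existing configurations behave as before.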
@@ -126,7 +126,7 @@ To improve performance of the connector, you can try the following options:
 
 ## Single Message Transforms
 
-Kafka Connect supports Single Message Transforms that let you change the structure or content of a message. To
+Kafka Connect supports Single Message Transforms that let you change the structure or content of a message. To
 experiment with this feature, try adding these lines to your sink connector configuration:
 
 ```properties
@@ -135,7 +135,7 @@ transforms.addExtraField.type=org.apache.kafka.connect.transforms.InsertField$Va
 transforms.addExtraField.static.field=extraField
 transforms.addExtraField.static.value=extraValue
 ```
-Now if you restart the sink connector and send some more test messages, each new record should have a `extraField` field
+If you restart the sink connector and send some more test messages, each new record should have a `extraField` field
 with value `value`. For more in-depth video, see [confluent's documentation](https://docs.confluent.io/current/connect/transforms/index.html).
 
 ## Testing
@@ -146,14 +146,14 @@ To run the supplied unit tests, run `mvn test` from the root of the project.
 
 ### System Tests
 
-We use Confluent Platform for a batteries-included Kafka environment for local testing. Follow the guide
+Use Confluent Platform for a batteries-included Kafka environment for local testing. Follow the guide
 [here](https://docs.confluent.io/current/quickstart/ce-quickstart.html) to install the Confluent Platform.
 
-Then, install the [Confluent Kafka Datagen Connector](https://github.com/confluentinc/kafka-connect-datagen) to create
-sample data of arbitrary types. Install this Datadog Logs Connector by running
+Then, install the [Confluent Kafka Datagen Connector](https://github.com/confluentinc/kafka-connect-datagen) to create
+sample data of arbitrary types. Install this Datadog Logs Connector by running
 `confluent-hub install target/components/packages/<connector-zip-file>`.
 
-In the `/test` directory there are some `.json` configuration files to make it easy to create Connectors. There are
+In the `/test` directory, there are some `.json` configuration files to make it easy to create Connectors. There are
 configurations for both the Datagen Connector with various datatypes, as well as the Datadog Logs Connector. To the latter,
 you will need to add a valid Datadog API Key for once you upload the `.json` to Confluent Platform.

src/main/java/com/datadoghq/connect/logs/sink/DatadogLogsApiWriter.java

Lines changed: 44 additions & 11 deletions
@@ -6,25 +6,42 @@ This product includes software developed at Datadog (https://www.datadoghq.com/)
 package com.datadoghq.connect.logs.sink;
 
 import com.datadoghq.connect.logs.util.Project;
-import com.google.gson.*;
+import com.google.gson.Gson;
+import com.google.gson.JsonArray;
+import com.google.gson.JsonElement;
+import com.google.gson.JsonObject;
+import com.google.gson.JsonPrimitive;
+import org.apache.kafka.connect.header.Header;
 import org.apache.kafka.connect.json.JsonConverter;
 import org.apache.kafka.connect.sink.SinkRecord;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import java.io.*;
+import javax.ws.rs.core.Response;
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
 import java.net.HttpURLConnection;
 import java.net.InetSocketAddress;
 import java.net.Proxy;
 import java.net.URL;
 import java.nio.charset.StandardCharsets;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Supplier;
 import java.util.zip.GZIPOutputStream;
-import javax.ws.rs.core.Response;
+
+import static java.util.stream.Collectors.toMap;
+import static java.util.stream.StreamSupport.stream;
 
 public class DatadogLogsApiWriter {
-    private final DatadogLogsSinkConnectorConfig config;
     private static final Logger log = LoggerFactory.getLogger(DatadogLogsApiWriter.class);
+    private final DatadogLogsSinkConnectorConfig config;
     private final Map<String, List<SinkRecord>> batches;
     private final JsonConverter jsonConverter;
 
@@ -33,7 +50,7 @@ public DatadogLogsApiWriter(DatadogLogsSinkConnectorConfig config) {
         this.batches = new HashMap<>();
         this.jsonConverter = new JsonConverter();
 
-        Map<String,String> jsonConverterConfig = new HashMap<String,String>();
+        Map<String, String> jsonConverterConfig = new HashMap<>();
         jsonConverterConfig.put("schemas.enable", "false");
         jsonConverterConfig.put("decimal.format", "NUMERIC");
 
@@ -42,13 +59,14 @@ public DatadogLogsApiWriter(DatadogLogsSinkConnectorConfig config) {
 
     /**
      * Writes records to the Datadog Logs API.
+     *
      * @param records to be written from the Source Broker to the Datadog Logs API.
      * @throws IOException may be thrown if the connection to the API fails.
      */
     public void write(Collection<SinkRecord> records) throws IOException {
         for (SinkRecord record : records) {
             if (!batches.containsKey(record.topic())) {
-                batches.put(record.topic(), new ArrayList<> (Collections.singletonList(record)));
+                batches.put(record.topic(), new ArrayList<>(Collections.singletonList(record)));
             } else {
                 batches.get(record.topic()).add(record);
             }
@@ -64,7 +82,7 @@ public void write(Collection<SinkRecord> records) throws IOException {
 
     private void flushBatches() throws IOException {
         // send any outstanding batches
-        for(Map.Entry<String,List<SinkRecord>> entry: batches.entrySet()) {
+        for (Map.Entry<String, List<SinkRecord>> entry : batches.entrySet()) {
             sendBatch(entry.getKey());
         }
 
@@ -97,20 +115,31 @@ private JsonArray formatBatch(String topic) {
             }
 
             JsonElement recordJSON = recordToJSON(record);
-            JsonObject message = populateMetadata(topic, recordJSON, record.timestamp());
+            JsonObject message = populateMetadata(topic, recordJSON, record.timestamp(), () -> kafkaHeadersToJsonElement(record));
             batchRecords.add(message);
         }
 
         return batchRecords;
     }
 
+    private JsonElement kafkaHeadersToJsonElement(SinkRecord sinkRecord) {
+        Map<String, Object> headerMap = stream(sinkRecord.headers().spliterator(), false)
+                .collect(toMap(Header::key, Header::value));
+
+        Gson gson = new Gson();
+
+        String jsonString = gson.toJson(headerMap);
+
+        return gson.fromJson(jsonString, JsonElement.class);
+    }
+
     private JsonElement recordToJSON(SinkRecord record) {
         byte[] rawJSONPayload = jsonConverter.fromConnectData(record.topic(), record.valueSchema(), record.value());
         String jsonPayload = new String(rawJSONPayload, StandardCharsets.UTF_8);
         return new Gson().fromJson(jsonPayload, JsonElement.class);
     }
 
-    private JsonObject populateMetadata(String topic, JsonElement message, Long timestamp) {
+    private JsonObject populateMetadata(String topic, JsonElement message, Long timestamp, Supplier<JsonElement> kafkaHeaders) {
         JsonObject content = new JsonObject();
         String tags = "topic:" + topic;
         content.add("message", message);
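A minimal, self-contained sketch of what the new `kafkaHeadersToJsonElement` conversion does. The record, topic, and header values below are hypothetical; only the stream/Gson pipeline mirrors the method added above:

```java
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.header.Header;
import org.apache.kafka.connect.sink.SinkRecord;

import java.util.Map;

import static java.util.stream.Collectors.toMap;
import static java.util.stream.StreamSupport.stream;

public class KafkaHeadersSketch {
    public static void main(String[] args) {
        // Hypothetical record, as a sink task would receive it.
        SinkRecord record = new SinkRecord("prod-topic1", 0, null, null,
                Schema.STRING_SCHEMA, "a log line", 42L);
        record.headers().addString("env", "prod");
        record.headers().addString("trace_id", "abc123");

        // Collect the record headers into a map keyed by header name...
        Map<String, Object> headerMap = stream(record.headers().spliterator(), false)
                .collect(toMap(Header::key, Header::value));

        // ...and round-trip the map through Gson to get the JsonElement that is
        // attached as "kafkaheaders" when the option is enabled.
        Gson gson = new Gson();
        JsonElement kafkaHeaders = gson.fromJson(gson.toJson(headerMap), JsonElement.class);

        System.out.println(kafkaHeaders); // e.g. {"env":"prod","trace_id":"abc123"}
    }
}
```

One consequence of using `toMap` without a merge function is that a record carrying two headers with the same key would make this conversion throw.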
@@ -119,6 +148,10 @@ private JsonObject populateMetadata(String topic, JsonElement message, Long time
             content.add("published_date", new JsonPrimitive(timestamp));
         }
 
+        if (config.parseRecordHeaders) {
+            content.add("kafkaheaders", kafkaHeaders.get());
+        }
+
         if (config.ddTags != null) {
             tags += "," + config.ddTags;
         }
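The headers are handed to `populateMetadata` as a `Supplier<JsonElement>` rather than a pre-built value, so the conversion only runs when `datadog.parse_record_headers` is enabled. A small standalone illustration of that pattern (the flag and supplier here are stand-ins, not the connector's actual config object):

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonPrimitive;

import java.util.function.Supplier;

public class LazyHeadersSketch {
    public static void main(String[] args) {
        boolean parseRecordHeaders = false; // stand-in for config.parseRecordHeaders

        // The lambda body is only evaluated if get() is called.
        Supplier<JsonObject> kafkaHeaders = () -> {
            System.out.println("converting headers...");
            JsonObject headers = new JsonObject();
            headers.add("env", new JsonPrimitive("prod"));
            return headers;
        };

        JsonObject content = new JsonObject();
        if (parseRecordHeaders) {
            content.add("kafkaheaders", kafkaHeaders.get());
        }

        // Prints {} and never logs "converting headers..." because the flag is off,
        // so disabled connectors pay no cost for header parsing.
        System.out.println(content);
    }
}
```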
@@ -161,7 +194,7 @@ private void sendRequest(JsonArray content, URL url) throws IOException {
         if (Response.Status.Family.familyOf(status) != Response.Status.Family.SUCCESSFUL) {
             InputStream stream = con.getErrorStream();
             String error = "";
-            if (stream != null ) {
+            if (stream != null) {
                 error = getOutput(stream);
             }
             con.disconnect();

src/main/java/com/datadoghq/connect/logs/sink/DatadogLogsSinkConnectorConfig.java

Lines changed: 11 additions & 2 deletions
@@ -34,6 +34,7 @@ public class DatadogLogsSinkConnectorConfig extends AbstractConfig {
     private static final String DEFAULT_DD_SITE = "datadoghq.com";
     public static final String DEFAULT_DD_URL = String.format(DD_URL_FORMAT_FROM_SITE, DEFAULT_DD_SITE);
     public static final String ADD_PUBLISHED_DATE = "datadog.add_published_date";
+    public static final String PARSE_RECORD_HEADERS = "datadog.parse_record_headers";
 
     // Respect limit documented at https://docs.datadoghq.com/api/?lang=bash#logs
     public final Integer ddMaxBatchLength;
@@ -53,6 +54,7 @@ public class DatadogLogsSinkConnectorConfig extends AbstractConfig {
     public final Integer retryMax;
     public final Integer retryBackoffMs;
     public final boolean addPublishedDate;
+    public final boolean parseRecordHeaders;
 
     public static final ConfigDef CONFIG_DEF = baseConfigDef();
 
@@ -75,6 +77,7 @@ public DatadogLogsSinkConnectorConfig(Boolean useSSL, Integer ddMaxBatchLength,
         this.ddSite = getString(DD_SITE);
         this.ddMaxBatchLength = ddMaxBatchLength;
         this.addPublishedDate = getBoolean(ADD_PUBLISHED_DATE);
+        this.parseRecordHeaders = getBoolean(PARSE_RECORD_HEADERS);
         validateConfig();
     }
 
@@ -175,7 +178,13 @@ private static void addMetadataConfigs(ConfigDef configDef) {
                 false,
                 null,
                 Importance.MEDIUM,
-                "Valid settings are true or false. When set to `true`, The timestamp is retrieved from the Kafka record and passed to Datadog as `published_date`");
+                "Valid settings are true or false. When set to `true`, The timestamp is retrieved from the Kafka record and passed to Datadog as `published_date`"
+        ).define(PARSE_RECORD_HEADERS,
+                Type.BOOLEAN,
+                false,
+                null,
+                Importance.MEDIUM,
+                "Valid settings are true or false. When set to `true`, Kafka Record Headers will be parsed and passed to DataDog as `kafkaheaders` object");
     }
 
     private static void addProxyConfigs(ConfigDef configDef) {
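To show how a definition like this behaves, here is a hedged, standalone sketch of a boolean `ConfigDef` entry being parsed from connector properties. The class name and property map are hypothetical; only the key, type, and `false` default mirror the patch:

```java
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;

import java.util.Collections;
import java.util.Map;

public class ParseHeadersConfigSketch {
    public static void main(String[] args) {
        // Minimal ConfigDef with just the new key, defaulting to false as in the patch.
        ConfigDef def = new ConfigDef()
                .define("datadog.parse_record_headers",
                        ConfigDef.Type.BOOLEAN,
                        false,
                        ConfigDef.Importance.MEDIUM,
                        "When true, Kafka record headers are forwarded as a `kafkaheaders` object.");

        // Hypothetical connector properties; Kafka Connect passes strings, and
        // ConfigDef coerces "true"/"false" into booleans.
        Map<String, String> props = Collections.singletonMap("datadog.parse_record_headers", "true");
        AbstractConfig parsed = new AbstractConfig(def, props);

        System.out.println(parsed.getBoolean("datadog.parse_record_headers")); // true
    }
}
```

When the property is omitted, `getBoolean` returns the declared default, so existing connector configurations keep their current behavior.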
@@ -250,4 +259,4 @@ private String getTags(String key) {
 
         return null;
     }
-}
+}
