
Commit 6dc399c

Merge pull request #235 from sysdiglabs/add-kafka-integration
Add kafka integration
2 parents 5e076a3 + 5b10690 commit 6dc399c

File tree: 16 files changed, +4677 −7 lines

apps/images/kafka.png

3.16 KB

apps/kafka.yaml

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,12 +4,12 @@ kind: App
 name: "kafka"
 keywords:
   - Message-broker
-  - Coming soon
+  - Available
 availableVersions:
-  - '2.4'
-shortDescription: "Apache Kafka is an open-source stream-processing software platform"
+  - '2.7'
+shortDescription: "Apache Kafka is an open-source stream-processing software platform."
 description: |
-  Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
-icon: https://upload.wikimedia.org/wikipedia/commons/0/05/Apache_kafka.svg
+  Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
+icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/kafka.png
 website: https://kafka.apache.org/
-available: false
+available: true
```

resources/kafka/ALERTS.md

Lines changed: 39 additions & 0 deletions
# Alerts

## No Leader
There is no ActiveController or 'leader' in the Kafka cluster.

## Too Many Leaders
There is more than one ActiveController or 'leader' in the Kafka cluster.

## Offline Partitions
There are one or more Offline Partitions. These partitions don't have an active leader and are hence not writable or readable.

## Under Replicated Partitions
There are one or more Under Replicated Partitions.

## Under In-Sync Replicated Partitions
There are one or more Under In-Sync Replicated Partitions. These partitions will be unavailable to producers who use 'acks=all'.

## ConsumerGroup Lag Not Decreasing
The ConsumerGroup lag is not decreasing. The Consumers might be down, failing to process the messages and continuously retrying, or their consumption rate is lower than the production rate of messages.

## ConsumerGroup Without Members
The ConsumerGroup doesn't have any members.

## Producer High ThrottleTime By Client-Id
The Producer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.

## Producer High ThrottleTime By User
The Producer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.

## Producer High ThrottleTime By User And Client-Id
The Producer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.

## Consumer High ThrottleTime By Client-Id
The Consumer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.

## Consumer High ThrottleTime By User
The Consumer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.

## Consumer High ThrottleTime By User And Client-Id
The Consumer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.

resources/kafka/INSTALL.md

Lines changed: 96 additions & 0 deletions
# Prerequisites

# Installation of the JMX-Exporter as a sidecar
The JMX-Exporter can be easily installed in two steps.

First, deploy the ConfigMap which contains the Kafka JMX configurations. The following example is for a Kafka cluster which exposes the JMX port 9010:
```
helm repo add promcat-charts https://sysdiglabs.github.io/integrations-charts
helm repo update
helm -n kafka install kafka-jmx-exporter promcat-charts/jmx-exporter --set jmx_port=9010 --set integrationType=kafka --set onlyCreateJMXConfigMap=true
```

Then, generate a patch file and apply it to your workload (your Kafka Deployment/StatefulSet/DaemonSet). The following example is for a Kafka cluster which exposes the JMX port 9010 and is deployed as a StatefulSet called 'kafka-cp-kafka':
```
helm template kafka-jmx-exporter promcat-charts/jmx-exporter --set jmx_port=9010 --set integrationType=kafka --set onlyCreateSidecarPatch=true > sidecar-patch.yaml
kubectl -n kafka patch sts kafka-cp-kafka --patch-file sidecar-patch.yaml
```

# Create Secrets for Authentication for the Kafka-Exporter
Your Kafka cluster's external endpoints might be secured by using authentication for the clients that want to connect to it (TLS, SASL+SCRAM, SASL+Kerberos).
If you are going to make the Kafka-Exporter (which will be deployed in the next tab) use these secured external endpoints, then you'll need to create Kubernetes Secrets in the following step.
If you prefer using an internal, non-secured (plaintext) endpoint for the Kafka-Exporter to connect to the Kafka cluster, then skip this step.

If using TLS, you'll need to create a Secret which contains the CA, the client certificate and the client key. The names of these files must be "ca.crt", "tls.crt" and "tls.key". The name of the Secret can be any name that you want. Example:
```
kubectl create secret generic kafka-exporter-certs --from-file=./tls.key --from-file=./tls.crt --from-file=./ca.crt --dry-run=true -o yaml | kubectl apply -f -
```

If using SASL+SCRAM, you'll need to create a Secret which contains the "username" and "password". Example:
```
echo -n 'admin' > username
echo -n '1f2d1e2e67df' > password
kubectl create secret generic kafka-exporter-sasl-scram --from-file=username --from-file=password --dry-run=true -o yaml | kubectl apply -f -
```

If using SASL+Kerberos, you'll need to create a Secret which contains the "kerberos.conf". If the 'Kerberos Auth Type' is 'keytabAuth', it should also contain the "kerberos.keytab". Example:
```
kubectl create secret generic kafka-exporter-sasl-kerberos --from-file=./kerberos.conf --from-file=./kerberos.keytab --dry-run=true -o yaml | kubectl apply -f -
```

# Installation of the Kafka-Exporter
The Kafka-Exporter can be easily installed with one Helm command. The flags will change depending on the authentication used in Kafka. You can find more info about the flags in the [Kafka Exporter chart values.yaml](https://github.com/sysdiglabs/integrations-charts/blob/main/charts/kafka-exporter/values.yaml).

Example of Kafka-Exporter without auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092
```

Example of Kafka-Exporter with TLS auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092 \
  --set tls.enabled=true \
  --set tls.insecureSkipVerify=false \
  --set tls.serverName="kafkaServerName" \
  --set tls.secretName="kafka-exporter-certs"
```

Example of Kafka-Exporter with SASL+SCRAM auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092 \
  --set sasl.enabled=true \
  --set sasl.handshake=true \
  --set sasl.scram.enabled=true \
  --set sasl.scram.mechanism="plain" \
  --set sasl.scram.secretName="kafka-exporter-sasl-scram"
```

Example of Kafka-Exporter with SASL+Kerberos auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092 \
  --set sasl.enabled=true \
  --set sasl.handshake=true \
  --set sasl.kerberos.enabled=true \
  --set sasl.kerberos.serviceName="kerberos-service" \
  --set sasl.kerberos.realm="kerberos-realm" \
  --set sasl.kerberos.kerberosAuthType="keytabAuth" \
  --set sasl.kerberos.secretName="kafka-exporter-sasl-kerberos"
```

Below you can find the ConfigMap with the JMX configurations for Kafka, a patch for the JMX-Exporter as a sidecar, a Deployment with the Kafka-Exporter without auth, and the Sysdig Agent ConfigMap with the Prometheus job to scrape both exporters.
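For reference, here is a minimal sketch of what a Prometheus job scraping the two exporters could look like. The job name and the annotation-based pod selection are assumptions; the actual Sysdig Agent ConfigMap shipped with this integration may differ.
```yaml
# Illustrative sketch only, not the integration's actual ConfigMap.
scrape_configs:
  - job_name: 'kafka-exporters'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that opt in via the prometheus.io/scrape annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Scrape the port declared in the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```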

resources/kafka/README.md

Lines changed: 47 additions & 0 deletions
# Kafka
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

# Prometheus and exporters
Since Kafka isn't instrumented for Prometheus, exporters are needed. Here we're using the [jmx_exporter](https://github.com/prometheus/jmx_exporter) and the [kafka_exporter](https://github.com/danielqsj/kafka_exporter).
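As an illustration of how the jmx_exporter turns Kafka broker MBeans into the Prometheus metric names listed in the next section, here is a minimal, hypothetical excerpt of a jmx_exporter rules file; the MBean pattern and metric name are assumptions, and the actual rules ConfigMap generated by the promcat-charts/jmx-exporter chart may differ.
```yaml
# Hypothetical jmx_exporter rules excerpt (illustrative only).
# Maps the broker MBean kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
# to a Prometheus counter named kafka_server_messages_in.
rules:
  - pattern: kafka.server<type=BrokerTopicMetrics, name=MessagesInPerSec><>Count
    name: kafka_server_messages_in
    type: COUNTER
```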

# Metrics

- kafka_brokers
- kafka_consumergroup_current_offset
- kafka_consumergroup_lag
- kafka_consumergroup_members
- kafka_controller_active_controller
- kafka_controller_offline_partitions
- kafka_log_size
- kafka_network_consumer_request_time_milliseconds
- kafka_network_fetch_follower_time_milliseconds
- kafka_network_producer_request_time_milliseconds
- kafka_server_bytes_in
- kafka_server_bytes_out
- kafka_server_consumer_client_byterate
- kafka_server_consumer_client_throttle_time
- kafka_server_consumer_user_byterate
- kafka_server_consumer_user_client_byterate
- kafka_server_consumer_user_client_throttle_time
- kafka_server_consumer_user_throttle_time
- kafka_server_messages_in
- kafka_server_partition_leader_count
- kafka_server_producer_client_byterate
- kafka_server_producer_client_throttle_time
- kafka_server_producer_user_byterate
- kafka_server_producer_user_client_byterate
- kafka_server_producer_user_client_throttle_time
- kafka_server_producer_user_throttle_time
- kafka_server_under_isr_partitions
- kafka_server_under_replicated_partitions
- kafka_server_zookeeper_auth_failures
- kafka_server_zookeeper_disconnections
- kafka_server_zookeeper_expired_sessions
- kafka_server_zookeeper_read_only_connections
- kafka_server_zookeeper_sasl_authentications
- kafka_server_zookeeper_sync_connections
- kafka_topic_partition_current_offset
- kafka_topic_partition_oldest_offset

# Attributions
Configuration files, dashboards and alerts are maintained by the [Sysdig team](https://sysdig.com/).

resources/kafka/alerts.yaml

Lines changed: 119 additions & 0 deletions
```yaml
apiVersion: v1
kind: Alert
app: kafka
version: 1.0.0
appVersion:
- '2.7'
descriptionFile: ALERTS.md
configurations:
- kind: Prometheus
  data: |-
    groups:
    - name: Kafka
      rules:
      - alert: '[Kafka] No Leader'
        expr: |
          sum(kafka_controller_active_controller) < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          description: There is no ActiveController or 'leader' in the Kafka cluster.
      - alert: '[Kafka] Too Many Leaders'
        expr: |
          sum(kafka_controller_active_controller) > 1
        for: 10m
        labels:
          severity: critical
        annotations:
          description: There is more than one ActiveController or 'leader' in the Kafka cluster.
      - alert: '[Kafka] Offline Partitions'
        expr: |
          sum(kafka_controller_offline_partitions) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          description: There are one or more Offline Partitions. These partitions don't have an active leader and are hence not writable or readable.
      - alert: '[Kafka] Under Replicated Partitions'
        expr: |
          sum(kafka_server_under_replicated_partitions) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          description: There are one or more Under Replicated Partitions.
      - alert: '[Kafka] Under In-Sync Replicated Partitions'
        expr: |
          sum(kafka_server_under_isr_partitions) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          description: There are one or more Under In-Sync Replicated Partitions. These partitions will be unavailable to producers who use 'acks=all'.
      - alert: '[Kafka] ConsumerGroup Lag Not Decreasing'
        expr: |
          (sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup, topic)(kafka_consumergroup_lag) > 0)
          and
          (sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup, topic)(delta(kafka_consumergroup_lag[2m])) >= 0)
        for: 15m
        labels:
          severity: warning
        annotations:
          description: The ConsumerGroup lag is not decreasing. The Consumers might be down, failing to process the messages and continuously retrying, or their consumption rate is lower than the production rate of messages.
      - alert: '[Kafka] ConsumerGroup Without Members'
        expr: |
          sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup)(kafka_consumergroup_members) == 0
        for: 10m
        labels:
          severity: info
        annotations:
          description: The ConsumerGroup doesn't have any members.
      - alert: '[Kafka] Producer High ThrottleTime By Client-Id'
        expr: |
          max by(kube_cluster_name, kube_namespace_name, kube_workload_name, client_id)(kafka_server_producer_client_throttle_time) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: The Producer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.
      - alert: '[Kafka] Producer High ThrottleTime By User'
        expr: |
          max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user)(kafka_server_producer_user_throttle_time) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: The Producer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.
      - alert: '[Kafka] Producer High ThrottleTime By User And Client-Id'
        expr: |
          max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user, client_id)(kafka_server_producer_user_client_throttle_time) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: The Producer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
      - alert: '[Kafka] Consumer High ThrottleTime By Client-Id'
        expr: |
          max by(kube_cluster_name, kube_namespace_name, kube_workload_name, client_id)(kafka_server_consumer_client_throttle_time) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: The Consumer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.
      - alert: '[Kafka] Consumer High ThrottleTime By User'
        expr: |
          max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user)(kafka_server_consumer_user_throttle_time) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: The Consumer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.
      - alert: '[Kafka] Consumer High ThrottleTime By User And Client-Id'
        expr: |
          max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user, client_id)(kafka_server_consumer_user_client_throttle_time) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          description: The Consumer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
```

resources/kafka/dashboards.yaml

Lines changed: 19 additions & 0 deletions
```yaml
apiVersion: v1
kind: Dashboard
app: kafka
version: 1.0.0
appVersion:
- '2.7'
configurations:
- name: kafka
  kind: Sysdig
  image: kafka/images/kafka.png
  description: |
    This dashboard offers information on:
    * Brokers
    * Network
    * Topics
    * ConsumerGroups
    * Quotas
    * Zookeeper
  file: include/Kafka.json
```

resources/kafka/description.yaml

Lines changed: 7 additions & 0 deletions
```yaml
apiVersion: v1
kind: Description
app: kafka
version: 1.0.0
appVersion:
- '2.7'
descriptionFile: README.md
```

resources/kafka/images/kafka.png

3.22 MB
