
Commit d0416aa

docs(self-hosted): visualize kafka lags through UI (#14475)
Not all self-hosted users have experience managing Kafka, since most of them are developers, not SREs. With this, they can hopefully visualize (and monitor) their Kafka lag more easily.
1 parent 05ebaad commit d0416aa


1 file changed (+65, -9 lines)

develop-docs/self-hosted/troubleshooting/kafka.mdx

@@ -16,33 +16,89 @@ This happens where Kafka and the consumers get out of sync. Possible reasons are
2. Having a sustained event spike that causes very long processing times, causing Kafka to drop messages as they go past the retention time
3. Date/time out of sync issues due to a restart or suspend/resume cycle

### Visualize

You can visualize the Kafka consumers and their offsets by adding another container, such as [Kafka UI](https://github.com/provectus/kafka-ui) or [Redpanda Console](https://github.com/redpanda-data/console), to your Docker Compose setup.

Kafka UI:
```yaml
# Add this service under the `services:` key of your docker-compose.yml.
kafka-ui:
  image: provectuslabs/kafka-ui:latest
  restart: on-failure
  environment:
    KAFKA_CLUSTERS_0_NAME: "local"
    KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: "kafka:9092"
    DYNAMIC_CONFIG_ENABLED: "true"
  ports:
    # Serves the web UI on port 8080 of the Docker host.
    - "8080:8080"
  depends_on:
    - kafka
```

Or, you can use Redpanda Console:
```yaml
# Add this service under the `services:` key of your docker-compose.yml.
redpanda-console:
  image: docker.redpanda.com/redpandadata/console:latest
  restart: on-failure
  entrypoint: /bin/sh
  # Write CONSOLE_CONFIG_FILE to /tmp/config.yml, then start the console.
  # `$$` escapes `$` for Compose, so the container's shell expands the variable.
  command: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console"
  environment:
    CONFIG_FILEPATH: "/tmp/config.yml"
    CONSOLE_CONFIG_FILE: |
      kafka:
        brokers: ["kafka:9092"]
        sasl:
          enabled: false
        schemaRegistry:
          enabled: false
        kafkaConnect:
          enabled: false
  ports:
    # Kafka UI above uses the same host port; change one of them if you run both.
    - "8080:8080"
  depends_on:
    - kafka
```

Ideally, every consumer group has zero lag. If a consumer group has a lot of lag, investigate whether it's caused by a disconnected consumer (for example, a Sentry/Snuba container that has lost its connection to Kafka) or by a consumer that's stuck processing a particular message. For a disconnected consumer, you can either restart the container or reset the Kafka offset to "earliest". Otherwise, you can reset the Kafka offset to "latest".
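
If you prefer the CLI to a web UI, the same lag appears in the `LAG` column of `kafka-consumer-groups --describe`. The sketch below totals that column per consumer group and flags groups with partitions that have no assigned consumer (`CONSUMER-ID` shown as `-`), which hints at a disconnected consumer. It assumes the default column order of that command's output; the function name `summarize_lag` is hypothetical and not part of self-hosted.

```shell
#!/bin/sh
# Hypothetical helper: total the LAG column per consumer group from the output of
#   kafka-consumer-groups --describe
# Assumes the default column order:
#   GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
summarize_lag() {
  awk '
    # Data rows have numeric PARTITION and LAG columns. Header lines, blank lines,
    # "has no active members" notices, and rows whose LAG is "-" (no committed
    # offset yet) are skipped.
    $3 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
      lag[$1] += $6
      # CONSUMER-ID is "-" when no consumer is assigned to the partition.
      if ($7 == "-") unassigned[$1] = 1
    }
    END {
      for (g in lag) {
        status = (g in unassigned) ? "no-consumer" : "ok"
        printf "%s %d %s\n", g, lag[g], status
      }
    }
  ' | sort
}
```

For example, `docker compose run --rm -T kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --all-groups | summarize_lag` prints one `GROUP TOTAL-LAG STATUS` line per group. The `-T` flag disables pseudo-TTY allocation so the output pipes cleanly into `awk`.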

### Recovery

<Alert level="warning" title="Warning">
Resetting consumer offsets with these solutions may cause data loss of up to your Kafka event retention period (24 hours by default).
</Alert>

#### Proper solution

The _proper_ solution is as follows ([reported](https://github.com/getsentry/self-hosted/issues/478#issuecomment-666254392) by [@rmisyurev](https://github.com/rmisyurev)). This example uses the `snuba-consumers` consumer group and the `events` topic; your group and topic names may differ.

1. Stop the Sentry/Snuba containers that use the consumer group, since Kafka only resets the offsets of a group that has no active members. You can find these containers by inspecting the `docker-compose.yml` file:
```shell
docker compose stop snuba-errors-consumer snuba-outcomes-consumer snuba-outcomes-billing-consumer
```
2. List the consumer groups:
```shell
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
```
3. Describe the group to see the current offset, log-end offset, and lag of each partition:
```shell
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --describe
```
4. (Optional) Preview how the offsets will change by adding `--dry-run`:
```shell
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --dry-run
```
5. Reset the offsets to the latest position by adding `--execute`:
```shell
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --execute
```
6. Start the previously stopped Sentry/Snuba containers:
```shell
docker compose start snuba-errors-consumer snuba-outcomes-consumer snuba-outcomes-billing-consumer
```
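
Step 1 depends on knowing which services consume from the group. As a rough aid, the hypothetical `find_services_using` function below prints every Compose service whose definition mentions a given string. It assumes the services sit under a top-level `services:` key, indented by two spaces, and that the consumer group name appears literally in each service's definition (for example, in its command); if that is not the case in your file, inspect it by hand instead.

```shell
#!/bin/sh
# Hypothetical helper: print the Compose services whose definition mentions PATTERN.
# Assumes services sit directly under a top-level `services:` key, indented by
# two spaces, and that the consumer group name appears literally in the service.
find_services_using() {
  pattern=$1
  file=${2:-docker-compose.yml}
  awk -v pat="$pattern" '
    /^services:/ { in_services = 1; next }
    # Any other unindented, non-comment line starts a different top-level key.
    /^[^ #]/ { in_services = 0 }
    # A key indented by exactly two spaces starts a new service definition.
    in_services && /^  [A-Za-z0-9._-]+:/ {
      svc = $1
      sub(/:$/, "", svc)
      next
    }
    # Print each service once if any line of its definition contains PATTERN.
    in_services && svc != "" && index($0, pat) && !(svc in seen) {
      seen[svc] = 1
      print svc
    }
  ' "$file"
}
```

Check that the result is not empty before passing it to `docker compose stop`, since `docker compose stop` without service names stops every service in the project.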

<Alert level="info" title="Tips">
* You can replace <code>snuba-consumers</code> with other consumer groups or <code>events</code> with other topics when needed.
* To reprocess every message still retained in the topic, reset the offset with <code>--to-earliest</code> instead of <code>--to-latest</code>.
* If you run Kafka UI or Redpanda Console, you can reset the offsets through the web UI instead of the CLI.
</Alert>
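
The stop, reset, and start steps can also be bundled into one shell function. This is a hypothetical helper, not part of self-hosted: it assumes you run it from the self-hosted directory, so `docker compose` finds the project, and that you pass the services you identified in step 1. It reuses the same `kafka-consumer-groups` flags as the steps above.

```shell
#!/bin/sh
# Hypothetical helper wrapping steps 1, 4/5, and 6 for one group and one topic.
# Usage: reset_consumer_offsets GROUP TOPIC --to-latest|--to-earliest --dry-run|--execute SERVICE...
reset_consumer_offsets() {
  if [ "$#" -lt 5 ]; then
    echo "usage: reset_consumer_offsets GROUP TOPIC --to-latest|--to-earliest --dry-run|--execute SERVICE..." >&2
    return 1
  fi
  group=$1 topic=$2 target=$3 mode=$4
  shift 4 # the remaining arguments are the Compose services that use GROUP

  case "$target" in --to-latest|--to-earliest) ;; *) echo "unsupported target: $target" >&2; return 1 ;; esac
  case "$mode" in --dry-run|--execute) ;; *) echo "unsupported mode: $mode" >&2; return 1 ;; esac

  # Kafka only resets offsets of a group with no active members, so stop its consumers first.
  docker compose stop "$@" || return

  docker compose run --rm kafka kafka-consumer-groups \
    --bootstrap-server kafka:9092 \
    --group "$group" --topic "$topic" \
    --reset-offsets "$target" "$mode"
  status=$?

  # Restart the consumers even if the reset failed, then report the reset's exit status.
  docker compose start "$@"
  return "$status"
}
```

For example, run `reset_consumer_offsets snuba-consumers events --to-latest --dry-run snuba-errors-consumer` first, and repeat with `--execute` once the preview looks right. Requiring at least one service name avoids calling `docker compose stop` with no arguments, which would stop every service.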

#### Another option
