v-zhuravlev (Contributor) commented Oct 21, 2025

v-zhuravlev requested a review from a team as a code owner October 21, 2025 21:59
Dasomeone (Member) commented

Apologies for the delay on this review; I will try to get to it tomorrow :)

Dasomeone (Member) left a comment

Nice, thank you @v-zhuravlev. Apologies for the delay on this one; I kept getting distracted with other things, but all good from my end here :D

aalhour left a comment

Thank you for improving this. I have notes on two alerts:

  • KafkaOfflinePartitionCount
  • KafkaUnderMinISRPartitionCount

Happy to chat about them in the comments.

expr: |||
  sum by (%s) (%s) > 0
||| % [
  std.join(',', this.config.groupLabels),

If I am not mistaken, offline partitions in Kafka usually stem from specific instances (brokers). Do you also want to group by the instance labels here? The group labels don't include them.

I see that the alert KafkaUnderReplicatedPartitionCount nicely includes the instance labels on line 168:

std.join(',', this.config.groupLabels + this.config.instanceLabels),
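
To make the difference concrete, here is a minimal, self-contained Jsonnet sketch of the two groupings. It is only an illustration: the label values in `config`, the metric name in `offlinePartitions`, and the `sumBy` helper are hypothetical stand-ins, since the real `this.config` values and the signal's `asRuleExpression()` output are not shown in this thread.

```jsonnet
// Hypothetical stand-in for this.config; the real label values are not shown here.
local config = {
  groupLabels: ['job', 'kafka_cluster'],
  instanceLabels: ['instance'],
};

// Hypothetical placeholder for the expression returned by the signal's asRuleExpression().
local offlinePartitions = 'kafka_offline_partitions';

// Hypothetical helper that renders the same expression template as the diff.
local sumBy(labels, expr) = 'sum by (%s) (%s) > 0' % [std.join(',', labels), expr];

{
  // Grouping as in the diff: one result series per group (for example, per cluster).
  clusterLevel: sumBy(config.groupLabels, offlinePartitions),
  // Grouping as suggested: one result series per instance within each group.
  instanceLevel: sumBy(config.groupLabels + config.instanceLabels, offlinePartitions),
}
```

With the second form, a firing alert would carry the instance labels, which should make it easier to see which broker reports the offline partitions.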

  std.join(',', this.config.groupLabels),
  this.signals.brokerReplicaManager.underMinISRPartitions.asRuleExpression(),
],
'for': '2m',

I don't have data to prove this offhand, but it seems to me that 2m might be too short to account for fluctuations that normal operations can cause. Do you agree? I think 5m, in line with the other alerts, is appropriate while still being short enough for a critical alert to fire.
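
As a rough sketch of what this suggestion could look like, the rule below uses a 5m hold period. The values in `config`, the metric name in `underMinISR`, and the expression template are hypothetical placeholders; only the alert name, the grouping by `groupLabels`, the `'for'` field, and the critical severity come from this thread.

```jsonnet
// Hypothetical stand-in for this.config; the real label values are not shown here.
local config = { groupLabels: ['job', 'kafka_cluster'] };

// Hypothetical placeholder for underMinISRPartitions.asRuleExpression().
local underMinISR = 'kafka_under_min_isr_partitions';

{
  alert: 'KafkaUnderMinISRPartitionCount',
  // Assumed to follow the same template as the offline-partitions alert.
  expr: 'sum by (%s) (%s) > 0' % [std.join(',', config.groupLabels), underMinISR],
  // 'for' is quoted because it is a reserved word in Jsonnet.
  // 5m instead of 2m: the condition must hold for five minutes before the alert fires.
  'for': '5m',
  labels: { severity: 'critical' },
}
```

The trade-off is the usual one for the `for` clause: a longer hold period filters out short-lived dips during normal operations, at the cost of a later notification for a real incident.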
