@holger-waschke
Contributor

@holger-waschke holger-waschke commented Nov 3, 2025

Problem

The hash used to identify existing Jira issues is not unique across multiple Jira projects.

Root Cause

Currently, the hash is generated from the group key returned by the ExtractGroupKey function. That key combines both the route matchers and the group labels:

func (ag *aggrGroup) GroupKey() string {
    return fmt.Sprintf("%s:%s", ag.routeKey, ag.labels)
}

Example result:

"{}/{env=\"prod\"}:{alertname=\"HighErrorRate\", cluster=\"bb\", service=\"api\"}"

This makes the hash unique only within a single Jira project.
In larger environments, where the same alert may be mirrored or transferred across multiple Jira projects, you can't identify the same issue by its hash.
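
To make the root cause concrete, here is a minimal sketch of how a route-dependent group key leads to route-dependent labels. It assumes the label is the SHA-256 hex digest of the group key wrapped in ALERT{...}, which fits the 64-character label seen in this thread but is not confirmed here. The helper `issueLabel` and the second route key `{}/{team="ops"}` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// groupKey mirrors the GroupKey format quoted above: "<routeKey>:<groupLabels>".
func groupKey(routeKey, groupLabels string) string {
	return fmt.Sprintf("%s:%s", routeKey, groupLabels)
}

// issueLabel is a hypothetical helper. It hashes a key with SHA-256 (an
// assumption) and wraps the hex digest in the ALERT{...} label format.
func issueLabel(key string) string {
	return fmt.Sprintf("ALERT{%x}", sha256.Sum256([]byte(key)))
}

func main() {
	labels := `{alertname="HighErrorRate", cluster="bb", service="api"}`

	// Same group labels, reached through two different routes.
	viaProd := issueLabel(groupKey(`{}/{env="prod"}`, labels))
	viaOps := issueLabel(groupKey(`{}/{team="ops"}`, labels))

	fmt.Println("same label:", viaProd == viaOps)
}
```

Because the route key is part of the hashed input, the two labels differ even though the group labels are identical.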

Change Summary

This update modifies the hash calculation to use only the group labels via the notify.GroupLabels function.
This ensures that the resulting hash uniquely identifies an issue based solely on alert labels, independent of the Jira project.
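
As a rough illustration of the proposed behavior, the sketch below hashes only a canonical rendering of the group labels. The names `canonical` and `hashGroupLabels` are hypothetical stand-ins, not the functions in this PR, and SHA-256 is an assumption.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// canonical renders labels sorted by name, so equal label sets always
// produce the same string regardless of map iteration order.
func canonical(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)

	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", n, labels[n]))
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

// hashGroupLabels is a hypothetical helper: the hash depends on the group
// labels only, so the route an alert arrived through does not affect it.
func hashGroupLabels(labels map[string]string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(canonical(labels))))
}

func main() {
	labels := map[string]string{
		"alertname": "HighErrorRate",
		"cluster":   "bb",
		"service":   "api",
	}
	fmt.Println(canonical(labels))
	fmt.Println(hashGroupLabels(labels))
}
```

Any two aggregation groups with the same group labels get the same hash under this scheme, whichever route created them.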

Impact

Jira issue lookup remains scoped to a single project, as defined by the JQL query:

jql.WriteString(fmt.Sprintf(`project=%q and labels=%q order by status ASC,resolutiondate DESC`, project, alertLabel))

This change does not cause cross-project updates or searches.
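
For reference, a self-contained sketch of the single-project lookup query. The format string is the one quoted above; the wrapper function `buildJQL` and the placeholder label `ALERT{abc}` are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// buildJQL is a hypothetical wrapper around the format string quoted above.
// The search is scoped to exactly one project.
func buildJQL(project, alertLabel string) string {
	var jql strings.Builder
	fmt.Fprintf(&jql, `project=%q and labels=%q order by status ASC,resolutiondate DESC`, project, alertLabel)
	return jql.String()
}

func main() {
	fmt.Println(buildJQL("ASPMIG", "ALERT{abc}"))
	// project="ASPMIG" and labels="ALERT{abc}" order by status ASC,resolutiondate DESC
}
```

Since the project is always part of the query, an identical label in another project is never matched.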

It lays the groundwork for a future enhancement to allow configurable multi-project issue lookups.

Future Considerations

A potential next step could be introducing a parameter that defines which Jira projects should be included in the update/search scope.
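
Purely as a sketch of what such a parameter might translate to, the query could use the JQL `in` operator over a list of projects. Nothing like this exists in the PR. The function name `buildMultiProjectJQL` and the project keys are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// buildMultiProjectJQL is hypothetical: it widens the lookup from a single
// project to a configured list of projects using JQL's "in" operator.
func buildMultiProjectJQL(projects []string, alertLabel string) string {
	quoted := make([]string, len(projects))
	for i, p := range projects {
		quoted[i] = fmt.Sprintf("%q", p)
	}
	return fmt.Sprintf(
		`project in (%s) and labels=%q order by status ASC,resolutiondate DESC`,
		strings.Join(quoted, ", "), alertLabel,
	)
}

func main() {
	fmt.Println(buildMultiProjectJQL([]string{"MAIN", "TEAM"}, "ALERT{abc}"))
}
```

A cross-project search like this only finds matching issues if the label is identical in every project, which is what the route-independent hash is meant to provide.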

@holger-waschke holger-waschke force-pushed the hwa/hash_on_group_labels branch from 36a4f47 to fbd25c2 November 3, 2025 10:14
@grobinson-grafana
Collaborator

This ensures that the resulting hash uniquely identifies an issue based solely on alert labels, independent of the Jira project.

I'm not sure this is true when you use continue: true to match two (or more) routes with the same group_by.

@holger-waschke
Contributor Author

holger-waschke commented Nov 3, 2025

This ensures that the resulting hash uniquely identifies an issue based solely on alert labels, independent of the Jira project.

I'm not sure this is true when you use continue: true to match two (or more) routes with the same group_by.

Good point — but that shouldn’t be an issue, since the JQL search is always scoped to a single Jira project when looking for existing issues.
See this line

jql = "statusCategory != Done AND project=\"ASPMIG\" AND labels=\"ALERT{7690c2651d53cccc1dc8f192ad09713c12ddf536f83b121ded08cba29dbb7190}\" ORDER BY status ASC, resolutiondate DESC"

If two routes with different label matchers target the same Jira project and produce the same hash, that means they share the same group labels — so reusing the existing issue is the expected and desired behavior.
If two routes target different Jira projects but happen to generate the same hash, separate issues will still be created in each project.
This aligns with the logic of the previous Jira integration that this implementation was migrated from, so I don't see any downside.
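
The two cases above can be sketched with a small in-memory stand-in for Jira, where the lookup key is the pair (project, label). The `tracker` type and the project keys are hypothetical, and the real integration talks to the Jira API instead.

```go
package main

import "fmt"

// issueKey scopes a lookup to one project, like the JQL search does.
type issueKey struct {
	project string
	label   string
}

// tracker is a hypothetical in-memory stand-in for Jira. The map value
// counts how many notifications were applied to each issue.
type tracker struct {
	issues map[issueKey]int
}

func newTracker() *tracker {
	return &tracker{issues: map[issueKey]int{}}
}

// notify reuses the issue found for (project, label) or creates a new one.
// It reports whether a new issue was created.
func (t *tracker) notify(project, label string) (created bool) {
	k := issueKey{project: project, label: label}
	_, found := t.issues[k]
	t.issues[k]++
	return !found
}

func main() {
	t := newTracker()
	fmt.Println(t.notify("MAIN", "ALERT{abc}")) // true: first notification creates an issue
	fmt.Println(t.notify("MAIN", "ALERT{abc}")) // false: same project and label, issue reused
	fmt.Println(t.notify("TEAM", "ALERT{abc}")) // true: other project, separate issue
}
```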

@grobinson-grafana
Collaborator

grobinson-grafana commented Nov 3, 2025

Good point — but that shouldn’t be an issue, since the JQL search is always scoped to a single Jira project when looking for existing issues. See this line

The hash is derived from the GroupKey I understand? What happens to the Jira issue when there are two groups with the same GroupKey, but different alerts? Won't they overwrite each other on the same issue? That's why the RouteKey is important iirc.

@holger-waschke
Contributor Author

Good point — but that shouldn’t be an issue, since the JQL search is always scoped to a single Jira project when looking for existing issues. See this line

The hash is derived from the GroupKey I understand? What happens to the Jira issue when there are two groups with the same GroupKey, but different alerts? Won't they overwrite each other on the same issue? That's why the RouteKey is important iirc.

Just to clarify — Alertmanager still uses the groupKey to deliver alerts to its integrations. Once the Jira integration receives the alert, it only uses the hash to search for existing issues.
If an existing issue with the same hash is found within the same Jira project (meaning the group labels match), then by definition it represents the same issue, and the update logic should apply. In that case, the Jira integration should not create a new issue.

@grobinson-grafana
Collaborator

Just to clarify — Alertmanager still uses the groupKey to deliver alerts to its integrations. Once the Jira integration receives the alert, it only uses the hash to search for existing issues. If an existing issue with the same hash is found within the same Jira project (meaning the group labels match), then by definition it represents the same issue, and the update logic should apply. In that case, the Jira integration should not create a new issue.

Sorry I'm getting very confused. In this PR, you replaced the GroupKey with GroupLabels? But GroupLabels are not globally unique, so I believe with this change you will have hash collisions.

@holger-waschke
Contributor Author

Sorry I'm getting very confused. In this PR, you replaced the GroupKey with GroupLabels?

There are two different layers here. One is the global Alertmanager dispatch pipeline to its integrations. This is unchanged: it still uses the GroupKey to create the aggrGroup (route matchers + grouping labels).

The second layer is the Jira integration itself. Here I changed the GroupKey to GroupLabels for calculating its hash, but the hash is only used to search for existing issues over the Jira API, so that existing issues can be reused for deduplication. (This API search is scoped to the Jira project configured in the route config anyway.) This means that, right now, no issues are updated outside the Jira project configured in the route config.

IMHO the hash shouldn't differ just because the alerts arrived via different routes. If the group labels are the same, it is logically the same issue.

@grobinson-grafana
Collaborator

grobinson-grafana commented Nov 3, 2025

There are two different layers here. One is the global Alertmanager dispatch pipeline to its integrations. This is unchanged: it still uses the GroupKey to create the aggrGroup (route matchers + grouping labels).

Yep agree.

Here I changed the GroupKey to GroupLabels for calculating its hash, but the hash is only used to search for existing issues over the Jira API, so that existing issues can be reused for deduplication.

Here I think is the potential issue.

Suppose you have the following configuration:

route:
  routes:
    - matchers:
        - severity=warning
      continue: true
      receiver: jira
      group_by:
        - service
    - matchers:
        - team=ops
      continue: true
      receiver: jira
      group_by:
        - service

You can have different aggregation groups with the same group_by labels (service). For example:

Group 1

{severity="warning", service="api", alertname="foo"}

Group 2

{team="ops",service="api",alertname="bar"}

However because they have the same service, they will both map to the same Jira issue, and overwrite each other? Surely these two groups should map to different Jira issues?
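
A small sketch of the collision described above: projecting each group's labels onto the `group_by` list leaves the same group labels for both groups. The helper `groupLabels` is hypothetical and only mimics how grouping picks labels. It is not Alertmanager's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupLabels keeps only the labels named in groupBy and renders them in
// sorted order. Labels missing from the alert are skipped.
func groupLabels(alert map[string]string, groupBy []string) string {
	names := append([]string(nil), groupBy...)
	sort.Strings(names)

	var parts []string
	for _, n := range names {
		if v, ok := alert[n]; ok {
			parts = append(parts, fmt.Sprintf("%s=%q", n, v))
		}
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	groupBy := []string{"service"}
	group1 := map[string]string{"severity": "warning", "service": "api", "alertname": "foo"}
	group2 := map[string]string{"team": "ops", "service": "api", "alertname": "bar"}

	fmt.Println(groupLabels(group1, groupBy)) // {service="api"}
	fmt.Println(groupLabels(group2, groupBy)) // {service="api"}
}
```

Any hash computed from these group labels alone is therefore identical for both groups, even though they contain different alerts.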

@holger-waschke
Contributor Author

holger-waschke commented Nov 3, 2025

However because they have the same service, they will both map to the same Jira issue, and overwrite each other? Surely these two groups should map to different Jira issues?

At first glance, it might seem like they should map to different issues, but this behavior is actually intentional.

Deduplication between alerts and Jira issues is a desired feature. You don’t want a new Jira issue every time the repeat_interval triggers; instead, you want related alerts to update or reopen the same Jira issue.

That’s why the choice of group_by labels is critical. In real-world configurations, you’ll almost always have multiple grouping labels to ensure alerts are grouped meaningfully.

Taking your example:

route:
  routes:
    - matchers:
        - severity=warning
      continue: true
      receiver: jira
      group_by:
        - alertname
        - environment
        - hostname

    - matchers:
        - team=ops
      continue: true
      receiver: jira
      group_by:
        - alertname
        - environment
        - hostname

When an alert fires through the first route, it generates a Jira issue with a summary like:

[FIRING:1] High CPU Usage production myhost01.com

The issue hash is derived from the group_by labels.
If another alert fires through the second route with the same group values, it will produce the same hash and therefore update or reopen the existing Jira issue — not create a new one.

This is the desired and logical behavior, since both alerts refer to the same underlying problem.
If any of the group_by labels differ, a new hash (and thus a new Jira issue) will be created.

Also note: each Jira receiver is tied to one Jira project and one issue type, which helps maintain consistency across alert routes.

But maybe it's a good idea to add these points to the documentation.

@grobinson-grafana
Collaborator

grobinson-grafana commented Nov 4, 2025

Deduplication between alerts and Jira issues is a desired feature. You don’t want a new Jira issue every time the repeat_interval triggers; instead, you want related alerts to update or reopen the same Jira issue.

Agree!

This is the desired and logical behavior, since both alerts refer to the same underlying problem.

This is where I am concerned.

Alertmanager doesn't work like that and doesn't force that assumption onto users. It doesn't assume that two different alert groups, from two different routes, are the same problem, which is why all Alertmanager integrations respect GroupKey as the logical separator between notifications.

I don't think this change can be the default because 1.) now Jira does something different from every other integration and 2.) it becomes inconsistent when used together with another integration in the same receiver.

I would much prefer allowing templating of the Jira issue labels so users can opt-in to this behavior if they want it?

@holger-waschke
Contributor Author

I would much prefer allowing templating of the Jira issue labels so users can opt-in to this behavior if they want it?

Got you. Can you give a bit more detail on what your desired solution would look like?

@grobinson-grafana
Collaborator

There is something I didn't understand about the original issue:

Example result:
"{}/{env="prod"}:{alertname="HighErrorRate", cluster="bb", service="api"}"
This makes the hash unique only within a single Jira project.

Can you help me understand what makes this true? As far as I can see there is no information in this GroupKey that identifies a Jira project, so what makes it unique to the project?

@holger-waschke
Contributor Author

There is something I didn't understand about the original issue:

Example result:
"{}/{env="prod"}:{alertname="HighErrorRate", cluster="bb", service="api"}"
This makes the hash unique only within a single Jira project.

Can you help me understand what makes this true? As far as I can see there is no information in this GroupKey that identifies a Jira project, so what makes it unique to the project?

My point here is that when the GroupKey is used to calculate the hash, you end up with two different hashes for the same alert in two different Jira projects.

For example:

route:
  routes:
    - matchers:
        - severity=warning
        - environment=production
      continue: true
      receiver: jira_main
      group_by:
        - alertname
        - environment
        - hostname

    - matchers:
        - team=ops
      continue: true
      receiver: jira_team
      group_by:
        - alertname
        - environment
        - hostname

So in our use case, there is one main Jira project with 24/7 issue handling. This is where ALL issues are created, no matter what.
In addition, some teams with their own Jira project like to receive the issues directly in their dedicated Jira project.

Before this PR: the issue is created in both Jira projects, but the hash differs. You can still build your own JQL to filter for identical issues, but you cannot use the hash to search for identical issues independently of the project.

After this PR: two separate issues are still created, one in each project, but they share a common hash as a Jira label.

In a future PR this could be extended further by making it configurable which Jira projects are used for deduplication.

@grobinson-grafana
Collaborator

grobinson-grafana commented Nov 4, 2025

Ah OK I think I am following. In this example there are two distinct routes creating distinct issues in two different jira projects, and because there are two distinct routes, any aggregation groups created from these routes have distinct GroupKeys.

My point here is that when the GroupKey is used to calculate the hash, you end up with two different hashes for the same alert in two different Jira projects.

So from Alertmanager's PoV, this is very much intentional as these are two completely distinct routes with distinct matchers. This is how Alertmanager is supposed to work as notifications are delivered based on the route and the resulting aggregation group (i.e. matchers and group_by), not individual alerts. I think that's why you find it so confusing for the same alert matching different routes to produce different hashes.

It sounds to me like allowing an optional custom template that replaces the ALERT{%s} label would be a sensible way forward for users who want to work around this.
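
One way such an opt-in template could look, sketched with Go's text/template. Everything here is hypothetical: the `labelData` fields, the `sha256` template function, and both template strings are assumptions made for illustration, not an existing Alertmanager option.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"text/template"
)

// labelData is a hypothetical set of fields exposed to the label template.
type labelData struct {
	GroupKey    string
	GroupLabels string
}

// funcs adds a sha256 helper (hypothetical name) to the template.
var funcs = template.FuncMap{
	"sha256": func(s string) string {
		return fmt.Sprintf("%x", sha256.Sum256([]byte(s)))
	},
}

const (
	// defaultLabel keeps the current behavior: hash of the full group key.
	defaultLabel = `{{ printf "ALERT{%s}" (sha256 .GroupKey) }}`
	// optInLabel is what a user could configure to hash group labels only.
	optInLabel = `{{ printf "ALERT{%s}" (sha256 .GroupLabels) }}`
)

// renderLabel parses and executes a label template against the given data.
func renderLabel(tmplText string, data labelData) (string, error) {
	t, err := template.New("label").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	labels := `{alertname="HighErrorRate", cluster="bb", service="api"}`
	data := labelData{GroupKey: `{}/{env="prod"}:` + labels, GroupLabels: labels}

	for _, tmpl := range []string{defaultLabel, optInLabel} {
		label, err := renderLabel(tmpl, data)
		if err != nil {
			fmt.Println("render error:", err)
			return
		}
		fmt.Println(label)
	}
}
```

With this shape the default stays tied to the GroupKey, and only users who set the second template get route-independent labels.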

@holger-waschke
Contributor Author

Ah OK I think I am following. In this example there are two distinct routes creating distinct issues in two different jira projects, and because there are two distinct routes, any aggregation groups created from these routes have distinct GroupKeys.

My point here is that when the GroupKey is used to calculate the hash, you end up with two different hashes for the same alert in two different Jira projects.

So from Alertmanager's PoV, this is very much intentional as these are two completely distinct routes with distinct matchers. This is how Alertmanager is supposed to work as notifications are delivered based on the route and the resulting aggregation group (i.e. matchers and group_by), not individual alerts. I think that's why you find it so confusing for the same alert matching different routes to produce different hashes.

It sounds to me like allowing an optional custom template that replaces the ALERT{%s} label would be a sensible way forward for users who want to work around this.

Yes, my point is do you really force this logic down to each individual integration? Would this make sense? The dispatch logic, i.e. how AM sends its individual alerts to its integrations, is untouched; see this earlier comment:

There are two different layers here. One is the global Alertmanager dispatch pipeline to its integrations. This is unchanged: it still uses the GroupKey to create the aggrGroup (route matchers + grouping labels).

The hash I changed is only used within the Jira integration to generate the JQL that identifies existing issues (ALERT{%s}).
This more closely matches the Jira logic, and also the old standalone Jira webhook project that this integration was migrated from.

Implementing this with a custom template will make things more complicated for end users in the end.

I'm not quite sure this is the way to go. Deduplicating Jira issues based on their logical alert grouping would make the most sense for most users, IMHO.

@grobinson-grafana
Collaborator

Yes, my point is do you really force this logic down to each individual integration?

Yes as a default behavior.

@holger-waschke
Contributor Author

Yes, my point is do you really force this logic down to each individual integration?

Yes as a default behavior.

It will still be the default behavior for most integrations.

Those who have good reasons to do so may opt to use the shared function HashGroupLabels. From my POV this gives the integrations more flexibility without leaving standards behind.

If that's not possible, I can have a look at the templated solution.

@grobinson-grafana
Collaborator

So as I said before, I don't think it's reasonable to have a default behavior where two different aggregation groups from two different matchers re-use the same Jira issue. It's both an unexpected and a breaking change and no other Jira users have demanded it. There is also no way for users to use the old behaviour in this PR either right? For all of those reasons I think it has to be an opt-in behavior.
