
@kruchkov-alexandr

Slack: Update existing messages instead of creating new ones

TL;DR

Before
[screenshot: a new message posted for each status change]

After
[screenshot: a single message, edited in place]

Adds an update_message config option to the Slack notifier. When enabled, the notifier updates the existing message in place instead of posting a new one for each alert status change.

Current behavior creates multiple messages per alert group:

#alerts channel:
[10:00] 🔥 Alert: HighCPU - FIRING
[10:05] 🔥 Alert: HighCPU - FIRING  
[10:10] ✅ Alert: HighCPU - RESOLVED

3 messages → clutters channel

With this PR:

#alerts channel:
[10:00, edited 10:10] ✅ Alert: HighCPU - RESOLVED

1 message → clean channel

How

Implementation:

  • New MetadataStore tracks message_ts and channel_id per alert group (in-memory; sketched below)
  • Slack notifier checks store before sending
  • Auto-switches between chat.postMessage (new) and chat.update (existing)
  • Stores channel_id from first response (required by Slack API for updates)

Flow:

Alert → Check MetadataStore → Found? → chat.update
                           → Not Found? → chat.postMessage → Store ts & channel_id
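
A minimal sketch of this flow in Go, assuming hypothetical names (messageRef, MetadataStore, endpointFor) rather than the PR's actual identifiers:

// A hedged sketch of the flow above, not the PR's actual code;
// messageRef, MetadataStore, and endpointFor are illustrative names.
package slack

import "sync"

// messageRef holds what Slack's chat.update call requires.
type messageRef struct {
	MessageTS string // "ts" returned by the first chat.postMessage response
	ChannelID string // channel ID from the same response
}

// MetadataStore maps an alert group key to its first Slack message (in-memory).
type MetadataStore struct {
	mu   sync.RWMutex
	refs map[string]messageRef
}

func NewMetadataStore() *MetadataStore {
	return &MetadataStore{refs: make(map[string]messageRef)}
}

func (s *MetadataStore) Get(groupKey string) (messageRef, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ref, ok := s.refs[groupKey]
	return ref, ok
}

func (s *MetadataStore) Set(groupKey string, ref messageRef) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs[groupKey] = ref
}

// endpointFor picks chat.update when a message already exists for the
// group, chat.postMessage otherwise.
func endpointFor(store *MetadataStore, groupKey string) (string, messageRef, bool) {
	if ref, ok := store.Get(groupKey); ok {
		return "https://slack.com/api/chat.update", ref, true
	}
	return "https://slack.com/api/chat.postMessage", messageRef{}, false
}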

Configuration

receivers:
  - name: 'slack-team'
    slack_configs:
      - send_resolved: true              # Required!
        update_message: true              # New option (default: false)
        api_url: 'https://slack.com/api/chat.postMessage'
        http_config:
          authorization:
            credentials_file: '/etc/alertmanager/slack-token'
            # OR
            # credentials: 'xoxb-your-bot-token'
        channel: '#alerts'
        title: '{{ .GroupLabels.alertname }} - {{ .Status | toUpper }}'
        text: |
          {{ if eq .Status "firing" }}🔥{{ else }}✅{{ end }} {{ .Alerts | len }} alert(s)
          {{ range .Alerts }}• {{ .Annotations.summary }}{{ end }}

Requirements:

  • Bot token (not webhook URL) with chat:write scope
  • send_resolved: true must be set
  • update_message: true must be set
  • Bot invited to target channel

Testing

# Build
make build

# Run
./alertmanager --log.level=debug --config.file=examples/slack-update-messages.yml

# Fire alert
curl -X POST http://localhost:9093/api/v2/alerts -H 'Content-Type: application/json' -d '[{
  "labels": {"alertname": "TestAlert", "severity": "warning"},
  "annotations": {"summary": "Test alert"}
}]'

# Wait 10-15s, check Slack → NEW message appears

# Resolve alert
curl -X POST http://localhost:9093/api/v2/alerts -H 'Content-Type: application/json' -d '[{
  "labels": {"alertname": "TestAlert", "severity": "warning"},
  "annotations": {"summary": "Test alert"},
  "endsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
}]'

# Wait 10-15s, check Slack → SAME message updates (not new one!)

Expected logs:

# First notification:
msg="no existing message found - will create NEW"
msg="saved Slack message_ts for future updates" message_ts="..." channel_id="C01234567"

# Second notification:
msg="FOUND existing Slack message - will UPDATE" message_ts="..."
msg="using chat.update endpoint for message update"

Limitations

Current implementation:

  • Works with Slack Apps (bot tokens) only; incoming webhooks are not supported
  • In-memory storage (lost on restart) - acceptable for v1, persistence can be added later
  • No HA sync yet - each instance has own cache
  • Protobuf not regenerated - using separate store instead

By design:

  • Requires bot token (webhook URLs don't support updates - Slack API limitation)
  • Channel ID required for updates (extracted from first response)

Backward Compatibility

✅ Opt-in feature, defaults to false
✅ No changes to existing configs
✅ No breaking changes

@kruchkov-alexandr force-pushed the feature/slack-update-existing-messages branch from 38e21dc to c4764e4 on November 5, 2025 at 12:46
@@ -0,0 +1,82 @@
// Copyright 2024 Prometheus Team
Member

Suggested change
- // Copyright 2024 Prometheus Team
+ // Copyright The Prometheus Authors

Signed-off-by: Kruchkov Alexandr <[email protected]>
@grobinson-grafana
Collaborator

I'm not intending to block this change, but I do want to mention these limitations are the reasons @gotjosh and I didn't implement this in the past (although discussed):

In-memory storage (lost on restart) - acceptable for v1, persistence can be added later
No HA sync yet - each instance has own cache

We felt these limitations were too severe for the feature to be considered stable, because they make its behavior too unpredictable.

However, it sounds like you may also have a plan about how to address those limitations going forward?

@kruchkov-alexandr
Author

> I'm not intending to block this change, but I do want to mention these limitations are the reasons @gotjosh and I didn't implement this in the past (although discussed):
>
> In-memory storage (lost on restart) - acceptable for v1, persistence can be added later
> No HA sync yet - each instance has own cache
>
> We felt these limitations were too severe for the feature to be considered stable, because they make its behavior too unpredictable.
>
> However, it sounds like you may also have a plan about how to address those limitations going forward?

Thank you for the review! You're absolutely right about these limitations.

I actually have a solution ready for both issues. But I want to ask what you'd prefer:

Option A: Ship this as v1, iterate later
Keep the current in-memory approach. Yes, it has limitations (no persistence, no HA sync), but it works well for single-instance setups. Then I'll do a follow-up PR with the full solution.

Option B: Go all the way in this PR
I can integrate metadata directly into nflog - turns out there's already a metadata field in nflog.proto that's perfect for this! Just need to:

  • Regenerate the .pb.go files (proper serialization)
  • Wire it through DedupStage -> context -> Slack -> SetNotifiesStage -> nflog
  • Get persistence and HA sync for free via existing nflog infrastructure

I've tested Option B locally - metadata survives restarts and updates work correctly after restart. The changes are pretty clean (a rough sketch of the wiring follows below).

What's your preference? I'm happy to go either way - just want to align with how you prefer to merge features.
Thanks.
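
A rough sketch of how Option B's wiring could look, assuming hypothetical helpers (WithNotificationMetadata, NotificationMetadata) rather than anything already in this PR or in nflog today:

// Hypothetical sketch of Option B: DedupStage loads the nflog entry's
// metadata into the context, the Slack notifier reads and mutates it,
// and SetNotifiesStage persists it back via the existing nflog machinery.
package notify

import "context"

type notificationMetadataKey struct{}

// WithNotificationMetadata attaches per-group metadata loaded from nflog.
func WithNotificationMetadata(ctx context.Context, md map[string]string) context.Context {
	return context.WithValue(ctx, notificationMetadataKey{}, md)
}

// NotificationMetadata returns the metadata map, if present; a notifier
// can store message_ts/channel_id here for later persistence.
func NotificationMetadata(ctx context.Context) (map[string]string, bool) {
	md, ok := ctx.Value(notificationMetadataKey{}).(map[string]string)
	return md, ok
}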

@Spaceman1701
Contributor

Hi! Also not looking to block this, but just a drive-by comment: We have an internal patch that adds a generic key/value store to the nflog as well. We've been running our production cluster with that patch for ~2 years now. We even use it for this exact purpose!

One thing we found is that wiring it through all the notifiers is a little ugly. We ended up changing the signature of Notifier.Notify a little bit.

If there's interest, we'd be happy to upstream that. Our implementation is pretty much compatible with this PR - in the proto it's a string -> string | int64 | double.
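
As an illustration only (the internal patch itself isn't shown in this thread), such a signature change might look roughly like the following, with MetadataValue standing in for the proto's string | int64 | double union:

// Hedged sketch, not the actual internal patch: today's
// Notify(ctx, alerts...) shape plus a mutable per-group metadata argument.
package notify

import (
	"context"

	"github.com/prometheus/alertmanager/types"
)

// MetadataValue stands in for the proto union string | int64 | double.
type MetadataValue interface{}

// Metadata is the per-group key/value store persisted in the nflog.
type Metadata map[string]MetadataValue

// Notifier extends the existing interface with a metadata argument that
// notifiers can read and update before it is written back to the nflog.
type Notifier interface {
	Notify(ctx context.Context, md Metadata, alerts ...*types.Alert) (bool, error)
}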
