-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add new behavior to avoid races on config reload #4705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -491,7 +491,7 @@ func run() int { | |
| silencer.Mutes(labels) | ||
| }) | ||
|
|
||
| disp = dispatch.NewDispatcher(alerts, routes, pipeline, marker, timeoutFunc, *dispatchMaintenanceInterval, nil, logger, dispMetrics) | ||
| newDisp := dispatch.NewDispatcher(alerts, routes, pipeline, marker, timeoutFunc, *dispatchMaintenanceInterval, nil, logger, dispMetrics) | ||
| routes.Walk(func(r *dispatch.Route) { | ||
| if r.RouteOpts.RepeatInterval > *retention { | ||
| configLogger.Warn( | ||
|
|
@@ -518,8 +518,18 @@ func run() int { | |
| } | ||
| }) | ||
|
|
||
| go disp.Run() | ||
| // first, start the inhibitor so the inhibition cache can populate | ||
| // wait for this to load alerts before starting the dispatcher so | ||
| // we don't accidentially notify for an alert that will be inhibited | ||
| go inhibitor.Run() | ||
| inhibitor.WaitForLoading() | ||
|
|
||
| // next, start the dispatcher and wait for it to load before swapping the disp pointer. | ||
| // This ensures that the API doesn't see the new dispatcher before it finishes populating | ||
| // the aggrGroups | ||
| go newDisp.Run() | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here... |
||
| newDisp.WaitForLoading() | ||
| disp = newDisp | ||
|
|
||
| return nil | ||
| }) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,6 +76,18 @@ type Alerts interface { | |
| // resolved and successfully notified about. | ||
| // They are not guaranteed to be in chronological order. | ||
| Subscribe(name string) AlertIterator | ||
|
|
||
| // SlurpAndSubcribe returns a list of all active alerts which are available | ||
| // in the provider before the call to SlurpAndSubcribe and an iterator | ||
| // of all alerts available after the call to SlurpAndSubcribe. | ||
| // SlurpAndSubcribe can be used by clients which need to build in memory state | ||
| // to know when they've processed the 'initial' batch of alerts in a provider | ||
| // after they reload their subscription. | ||
| // Implementation of SlurpAndSubcribe is optional - providers may choose to | ||
| // return an empty list for the first return value and the result of Subscribe | ||
| // for the second return value. | ||
| SlurpAndSubscribe(name string) ([]*types.Alert, AlertIterator) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok the name is funny, but maybe SnapshotAndSubscribe?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternative names
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, happy to change the name. I think the origin of this name was "slurp up all the existing alerts" 😆
I wanted to avoid every possible consumer needing to make a code change after this update. However, we already made a breaking change to I know there are a few projects out there which are linking alertmanager and depending on these interfaces... |
||
|
|
||
| // GetPending returns an iterator over all alerts that have | ||
| // pending notifications. | ||
| GetPending() AlertIterator | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if this is better, or we should instead load everything not in a goroutine, then call run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, that actually would simplify things a little. The reason I did it this way was to allow this all to happen concurrently (and without waiting) if we wanted. However, in practice it's all sequential.
Maybe I could change this so that both the
InhibitorandDispatcherhaveLoad()methods? I thinkRunshould still callLoad(), but maybe we can make this work such thatLoadis a no-op after being called once.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, and it ended up pretty awkward. It might work, but not without a bigger refactoring. Basically, it becomes necessary to move the iterator into a struct field that's protected by the struct's global mutex. This is very awkward because we'd have to lock and unlock the mutex around reads of the iterator's channel.
The only easy fix would be splitting
SlurpAndSubscribemethod into two, but that's not desirable because the entire benefit comes from being able to atomic separate alerts into "before the call" and "after the call" groups.