Skip to content

Commit 0f59b35

Browse files
Add feature flag to enable logging for UTF-8 migration (#9174)
This commit moves the logging for UTF-8 migration behind a feature flag, so it can be disabled once UTF-8 strict mode is enabled. The motivation for this is that this logging produces a huge amount of log volume, and doesn't need to be enabled all the time.
1 parent 88cefdd commit 0f59b35

File tree

8 files changed

+39
-11
lines changed

8 files changed

+39
-11
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
* [CHANGE] Ruler: Removed `-ruler.drain-notification-queue-on-shutdown` option, which is now enabled by default. #9115
3232
* [CHANGE] Querier: allow wrapping errors with context errors only when the former actually correspond to `context.Canceled` and `context.DeadlineExceeded`. #9175
3333
* [FEATURE] Alertmanager: Added `-alertmanager.log-parsing-label-matchers` to control logging when parsing label matchers. This flag is intended to be used with `-alertmanager.utf8-strict-mode-enabled` to validate UTF-8 strict mode is working as intended. The default value is `false`. #9173
34+
* [FEATURE] Alertmanager: Added `-alertmanager.utf8-migration-logging-enabled` to enable logging of tenant configurations that are incompatible with UTF-8 strict mode. The default value is `false`. #9174
3435
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.query-engine=mimir`. #8422 #8430 #8454 #8455 #8360 #8490 #8508 #8577 #8660 #8671 #8677 #8747 #8850 #8872 #8838 #8911 #8909 #8923 #8924 #8925 #8932 #8933 #8934 #8962 #8986 #8993 #8995 #9008 #9017 #9018 #9019 #9120 #9121 #9136 #9139 #9145 #9191 #9194
3536
* [FEATURE] Experimental Kafka-based ingest storage. #6888 #6894 #6929 #6940 #6951 #6974 #6982 #7029 #7030 #7091 #7142 #7147 #7148 #7153 #7160 #7193 #7349 #7376 #7388 #7391 #7393 #7394 #7402 #7404 #7423 #7424 #7437 #7486 #7503 #7508 #7540 #7621 #7682 #7685 #7694 #7695 #7696 #7697 #7701 #7733 #7734 #7741 #7752 #7838 #7851 #7871 #7877 #7880 #7882 #7887 #7891 #7925 #7955 #7967 #8031 #8063 #8077 #8088 #8135 #8176 #8184 #8194 #8216 #8217 #8222 #8233 #8503 #8542 #8579 #8657 #8686 #8688 #8703 #8706 #8708 #8738 #8750 #8778 #8808 #8809 #8841 #8842 #8845 #8853 #8886 #8988
3637
* What it is:

cmd/mimir/config-descriptor.json

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14846,6 +14846,17 @@
1484614846
"fieldFlag": "alertmanager.log-parsing-label-matchers",
1484714847
"fieldType": "boolean",
1484814848
"fieldCategory": "experimental"
14849+
},
14850+
{
14851+
"kind": "field",
14852+
"name": "utf8_migration_logging",
14853+
"required": false,
14854+
"desc": "Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.",
14855+
"fieldValue": null,
14856+
"fieldDefaultValue": false,
14857+
"fieldFlag": "alertmanager.utf8-migration-logging-enabled",
14858+
"fieldType": "boolean",
14859+
"fieldCategory": "experimental"
1484914860
}
1485014861
],
1485114862
"fieldValue": null,

cmd/mimir/help-all.txt.tmpl

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -295,6 +295,8 @@ Usage of ./cmd/mimir/mimir:
295295
Directory to store Alertmanager state and temporarily configuration files. The content of this directory is not required to be persisted between restarts unless Alertmanager replication has been disabled. (default "./data-alertmanager/")
296296
-alertmanager.storage.retention duration
297297
How long should we store stateful data (notification logs and silences). For notification log entries, refers to how long should we keep entries before they expire and are deleted. For silences, refers to how long should tenants view silences after they expire and are deleted. (default 120h0m0s)
298+
-alertmanager.utf8-migration-logging-enabled
299+
[experimental] Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.
298300
-alertmanager.utf8-strict-mode-enabled
299301
[experimental] Enable UTF-8 strict mode. Allows UTF-8 characters in the matchers for routes and inhibition rules, in silences, and in the labels for alerts. It is recommended that all tenants run the `migrate-utf8` command in mimirtool before enabling this mode. Otherwise, some tenant configurations might fail to load. To identify tenants with incompatible configurations, search Mimir server logs for lines containing `Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible`. To find tenant configurations that are valid but contain ambiguous matchers, search for log lines containing `Matchers input has disagreement`. Each log line includes the invalid input, a suggestion on how to fix the input (excluding ambiguous matchers, as these require manual correction), and the ID of the affected tenant. You must run Mimir with debug-level logging enabled. Otherwise, these lines aren't logged. For more information, refer to https://prometheus.io/docs/alerting/latest/configuration/#label-matchers. Enabling and then disabling UTF-8 strict mode can break existing Alertmanager configurations if tenants added UTF-8 characters to their Alertmanager configuration while it was enabled.
300302
-alertmanager.web.external-url string

docs/sources/mimir/configure/configuration-parameters/index.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2503,6 +2503,11 @@ alertmanager_client:
25032503
# UTF-8 strict mode is working as intended.
25042504
# CLI flag: -alertmanager.log-parsing-label-matchers
25052505
[log_parsing_label_matchers: <boolean> | default = false]
2506+
2507+
# (experimental) Enable logging of tenant configurations that are incompatible
2508+
# with UTF-8 strict mode.
2509+
# CLI flag: -alertmanager.utf8-migration-logging-enabled
2510+
[utf8_migration_logging: <boolean> | default = false]
25062511
```
25072512
25082513
### alertmanager_storage

docs/sources/mimir/references/architecture/components/alertmanager.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -183,7 +183,9 @@ Grafana Mimir provides a number of tools to help you identify whether any existi
183183

184184
To identify affected tenant configurations, take the following steps:
185185

186-
1. Make sure Mimir is running version 2.12 or later with debug-level logging enabled.
186+
1. Make sure Mimir is running version 2.12 or later.
187+
188+
1. Enable [`utf8-migration-logging-enabled`]({{< relref "./../../../configure/configuration-parameters#alertmanager" >}}) and set [`log_level`]({{< relref "./../../../configure/configuration-parameters#server" >}}) to `debug`. You must restart Mimir for the changes to take effect.
187189

188190
1. To identify any tenant configurations that are incompatible with UTF-8 (meaning the tenant configuration fails to load and the [fallback configuration](#fallback-configuration) is used instead), search Mimir server logs for lines containing `Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible`. Each log line includes the invalid matcher from the tenant configuration and the ID of the affected tenant. For example:
189191

@@ -225,4 +227,4 @@ It's rare to find cases of disagreement in a tenant configuration, as most tenan
225227

226228
1. Any incompatible tenant configurations will fail to load. To identify if any tenant configurations are failing to load, search the Mimir server logs for lines containing `error applying config`, or query the `cortex_alertmanager_config_last_reload_successful` gauge for `0`.
227229

228-
1. You can disable debug-level logging.
230+
1. You can disable [`utf8-migration-logging-enabled`]({{< relref "./../../../configure/configuration-parameters#alertmanager" >}}) and set [`log_level`]({{< relref "./../../../configure/configuration-parameters#server" >}}) back to its previous value.

pkg/alertmanager/api.go

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -147,7 +147,7 @@ func (am *MultitenantAlertmanager) SetUserConfig(w http.ResponseWriter, r *http.
147147
}
148148

149149
cfgDesc := alertspb.ToProto(cfg.AlertmanagerConfig, cfg.TemplateFiles, userID)
150-
if err := validateUserConfig(logger, cfgDesc, am.limits, userID); err != nil {
150+
if err := validateUserConfig(logger, cfgDesc, am.limits, userID, am.cfg.UTF8MigrationLogging); err != nil {
151151
level.Warn(logger).Log("msg", errValidatingConfig, "err", err.Error())
152152
http.Error(w, fmt.Sprintf("%s: %s", errValidatingConfig, err.Error()), http.StatusBadRequest)
153153
return
@@ -185,8 +185,10 @@ func (am *MultitenantAlertmanager) DeleteUserConfig(w http.ResponseWriter, r *ht
185185
}
186186

187187
// Partially copied from: https://github.com/prometheus/alertmanager/blob/8e861c646bf67599a1704fc843c6a94d519ce312/cli/check_config.go#L65-L96
188-
func validateUserConfig(logger log.Logger, cfg alertspb.AlertConfigDesc, limits Limits, user string) error {
189-
validateMatchersInConfigDesc(logger, "api", cfg)
188+
func validateUserConfig(logger log.Logger, cfg alertspb.AlertConfigDesc, limits Limits, user string, utf8MigrationLogging bool) error {
189+
if utf8MigrationLogging {
190+
validateMatchersInConfigDesc(logger, "api", cfg)
191+
}
190192

191193
// We don't have a valid use case for empty configurations. If a tenant does not have a
192194
// configuration set and issue a request to the Alertmanager, we'll a) upload an empty

pkg/alertmanager/api_test.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -881,6 +881,7 @@ alertmanager_config: |
881881

882882
limits := &mockAlertManagerLimits{}
883883
am := &MultitenantAlertmanager{
884+
cfg: mockAlertmanagerConfig(t),
884885
store: prepareInMemoryAlertStore(),
885886
logger: test.NewTestingLogger(t),
886887
limits: limits,

pkg/alertmanager/multitenant.go

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,7 @@ type MultitenantAlertmanagerConfig struct {
105105
// then the UTF-8 parser will be logged. If it is disabled, then the old regular
106106
// expression parser will be logged.
107107
LogParsingLabelMatchers bool `yaml:"log_parsing_label_matchers" category:"experimental"`
108+
UTF8MigrationLogging bool `yaml:"utf8_migration_logging" category:"experimental"`
108109
}
109110

110111
const (
@@ -137,6 +138,7 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet, logger
137138

138139
f.BoolVar(&cfg.UTF8StrictMode, "alertmanager.utf8-strict-mode-enabled", false, "Enable UTF-8 strict mode. Allows UTF-8 characters in the matchers for routes and inhibition rules, in silences, and in the labels for alerts. It is recommended that all tenants run the `migrate-utf8` command in mimirtool before enabling this mode. Otherwise, some tenant configurations might fail to load. To identify tenants with incompatible configurations, search Mimir server logs for lines containing `Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible`. To find tenant configurations that are valid but contain ambiguous matchers, search for log lines containing `Matchers input has disagreement`. Each log line includes the invalid input, a suggestion on how to fix the input (excluding ambiguous matchers, as these require manual correction), and the ID of the affected tenant. You must run Mimir with debug-level logging enabled. Otherwise, these lines aren't logged. For more information, refer to https://prometheus.io/docs/alerting/latest/configuration/#label-matchers. Enabling and then disabling UTF-8 strict mode can break existing Alertmanager configurations if tenants added UTF-8 characters to their Alertmanager configuration while it was enabled.")
139140
f.BoolVar(&cfg.LogParsingLabelMatchers, "alertmanager.log-parsing-label-matchers", false, "Enable logging when parsing label matchers. This flag is intended to be used with -alertmanager.utf8-strict-mode-enabled to validate UTF-8 strict mode is working as intended.")
141+
f.BoolVar(&cfg.UTF8MigrationLogging, "alertmanager.utf8-migration-logging-enabled", false, "Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.")
140142
}
141143

142144
// Validate config and returns error on failure
@@ -813,12 +815,14 @@ type amConfig struct {
813815
// setConfig applies the given configuration to the alertmanager for `userID`,
814816
// creating an alertmanager if it doesn't already exist.
815817
func (am *MultitenantAlertmanager) setConfig(cfg amConfig) error {
816-
// Instead of using "config" as the origin, as in Prometheus Alertmanager, we use "tenant".
817-
// The reason for this that the config.Load function uses the origin "config",
818-
// which is correct, but Mimir uses config.Load to validate both API requests and tenant
819-
// configurations. This means metrics from API requests are confused with metrics from
820-
// tenant configurations. To avoid this confusion, we use a different origin.
821-
validateMatchersInConfigDesc(am.logger, "tenant", cfg.AlertConfigDesc)
818+
if am.cfg.UTF8MigrationLogging {
819+
// Instead of using "config" as the origin, as in Prometheus Alertmanager, we use "tenant".
820+
// The reason for this that the config.Load function uses the origin "config",
821+
// which is correct, but Mimir uses config.Load to validate both API requests and tenant
822+
// configurations. This means metrics from API requests are confused with metrics from
823+
// tenant configurations. To avoid this confusion, we use a different origin.
824+
validateMatchersInConfigDesc(am.logger, "tenant", cfg.AlertConfigDesc)
825+
}
822826

823827
level.Debug(am.logger).Log("msg", "setting config", "user", cfg.User)
824828

0 commit comments

Comments
 (0)