feat: add Prometheus Pushgateway support for CLI apps #3176
coolwednesday wants to merge 1 commit into gofr-dev:development from
CLI apps are short-lived and exit before Prometheus can scrape /metrics. This adds push-based metrics export via Pushgateway, configured through the METRICS_PUSH_GATEWAY_URL env var, along with automatic CLI metrics tracking (duration, success/error counters) and observability infrastructure. Closes gofr-dev#2232
Umang01-hash left a comment:
1. Issue #2232 explicitly listed "Support cleanup (optional) so old metrics don't pile up" as a requirement. Every CronJob run permanently adds a job group to the Pushgateway. Please add a Delete(ctx context.Context) error method on PushGateway using pusher.DeleteContext(ctx), and a METRICS_PUSH_GATEWAY_DELETE_ON_FINISH=true env var to opt in.
2. All apps without APP_NAME set push under the same job group and silently overwrite each other. Change the fallback to filepath.Base(os.Args[0]), or add a dedicated METRICS_PUSH_GATEWAY_JOB env var override.
3. The current maximum bucket is 60s, while cron buckets extend to 3600s, so a 5-minute batch job falls only into +Inf. Align the upper boundary with app_cron_duration_seconds.
4. Metric naming is inconsistent with cron:
   - app_cmd_errors_total → app_cmd_failures (match cron's _failures)
   - app_cmd_success_total → app_cmd_success (match cron's naming without _total)
   - Add app_cmd_total (match cron's app_cron_job_total)
5. Move metricServer.Shutdown(ctx) before container.Close() in Shutdown(), so the Prometheus scrape endpoint stops accepting requests before the OTel meter provider is shut down.
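For item 2, a minimal sketch of a fallback chain that combines both suggestions: an explicit METRICS_PUSH_GATEWAY_JOB override first, then APP_NAME, then the executable name via filepath.Base(os.Args[0]). The helper resolveJobName and its lookup-function parameter are hypothetical, and METRICS_PUSH_GATEWAY_JOB is the reviewer's proposed variable, not an existing GoFr setting.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveJobName is a hypothetical helper that picks the Pushgateway job name
// from an explicit override, then APP_NAME, then the executable's base name,
// so two CLI binaries without APP_NAME no longer share one job group.
func resolveJobName(get func(string) string, argv0 string) string {
	// METRICS_PUSH_GATEWAY_JOB is the override proposed in review (assumption).
	if job := get("METRICS_PUSH_GATEWAY_JOB"); job != "" {
		return job
	}

	if name := get("APP_NAME"); name != "" {
		return name
	}

	// Fall back to the binary name, e.g. "/usr/local/bin/report-cli" -> "report-cli".
	return filepath.Base(argv0)
}

func main() {
	fmt.Println(resolveJobName(os.Getenv, os.Args[0]))
}
```

Injecting the lookup function (instead of calling os.Getenv directly) keeps the precedence rules testable without mutating the process environment.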
    }

    if c.pushGateway != nil {
        err = errors.Join(err, c.pushGateway.Push(context.Background()))
The Push call has no timeout because it uses context.Background(). If the Pushgateway is unreachable (for example, behind a firewall that silently drops packets), the CLI hangs indefinitely at exit.
    File file.FileSystem

    meterProvider meterProviderShutdowner
    pushGateway   *exporters.PushGateway
pushGateway is stored as a concrete type. GoFr's convention is "take interfaces, return concrete types", and this same PR correctly introduces meterProviderShutdowner as an interface for meterProvider. Please apply the same pattern:

    type metricsFlusher interface {
        Push(context.Context) error
    }

    pushGateway metricsFlusher

Without this, the pushgateway path in Container.Close() can only be unit-tested with a real HTTP endpoint (or an httptest.Server), not with a simple mock.
    if url := app.Config.Get("METRICS_PUSH_GATEWAY_URL"); url != "" {
        jobName := app.Config.GetOrDefault("APP_NAME", "gofr-app")
        app.container.SetPushGateway(exporters.NewPushGateway(url, jobName, app.container.Logger))
    }
This code has no test coverage. Let's add tests for it.
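A minimal table-driven sketch of the missing coverage, assuming the factory logic is extracted into a hypothetical pushGatewayTarget helper that takes a small config interface instead of app.Config. mapConfig is a hypothetical in-memory stand-in; the sketch is a main program so it runs standalone, whereas in the repository it would live in a _test.go file using t.Run.

```go
package main

import "fmt"

// config lists only the two methods the factory snippet uses.
type config interface {
	Get(key string) string
	GetOrDefault(key, def string) string
}

// mapConfig is a hypothetical in-memory config for tests.
type mapConfig map[string]string

func (m mapConfig) Get(key string) string { return m[key] }

// GetOrDefault returns the stored value when the key is present, else def
// (an assumption about GoFr's semantics, sufficient for this sketch).
func (m mapConfig) GetOrDefault(key, def string) string {
	if v, ok := m[key]; ok {
		return v
	}
	return def
}

// pushGatewayTarget mirrors the snippet's decision: wire only when the URL is set.
func pushGatewayTarget(cfg config) (url, job string, enabled bool) {
	url = cfg.Get("METRICS_PUSH_GATEWAY_URL")
	if url == "" {
		return "", "", false
	}
	return url, cfg.GetOrDefault("APP_NAME", "gofr-app"), true
}

func main() {
	tests := []struct {
		name    string
		cfg     mapConfig
		wantURL string
		wantJob string
		wantOn  bool
	}{
		{"disabled without URL", mapConfig{}, "", "", false},
		{"default job name", mapConfig{"METRICS_PUSH_GATEWAY_URL": "http://localhost:9091"}, "http://localhost:9091", "gofr-app", true},
		{"APP_NAME as job", mapConfig{"METRICS_PUSH_GATEWAY_URL": "http://localhost:9091", "APP_NAME": "sample-cmd"}, "http://localhost:9091", "sample-cmd", true},
	}

	for _, tc := range tests {
		url, job, on := pushGatewayTarget(tc.cfg)
		fmt.Printf("%s: pass=%v\n", tc.name, url == tc.wantURL && job == tc.wantJob && on == tc.wantOn)
	}
}
```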
    shutdownCtx, cancel := context.WithTimeout(context.Background(), shutDownTimeout)
    defer cancel()

    if err := a.Shutdown(shutdownCtx); err != nil {
a.Shutdown() is called sequentially after a.cmd.Run(). If the handler panics, the stack unwinds and Shutdown() is never reached, so metrics are not pushed and no container cleanup happens.
Maybe we can defer the shutdown?
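To illustrate the concern, a minimal stdlib-only sketch: a deferred call still runs while a panic unwinds the stack, whereas a call placed after the panicking line never does. runSequential, runDeferred, and the shutdown callback are hypothetical stand-ins for the current and suggested shapes of a.cmd.Run plus a.Shutdown; the recover exists only so the demo can report what happened.

```go
package main

import "fmt"

// runSequential mimics the current shape: shutdown is called after the handler.
func runSequential(handler func(), shutdown func()) {
	handler()
	shutdown() // never reached if handler panics
}

// runDeferred mimics the suggested shape: shutdown is deferred before the handler runs.
func runDeferred(handler func(), shutdown func()) {
	defer shutdown() // runs even while a panic unwinds
	handler()
}

// shutdownRan reports whether shutdown executed when the handler panicked.
// recover is used only so this demo can continue after the panic.
func shutdownRan(run func(handler, shutdown func())) (ran bool) {
	defer func() { _ = recover() }()

	run(func() { panic("handler failed") }, func() { ran = true })

	return ran
}

func main() {
	fmt.Println("sequential:", shutdownRan(runSequential))
	fmt.Println("deferred:", shutdownRan(runDeferred))
}
```

A deferred Shutdown does not swallow the panic; it only guarantees that the metrics flush and resource cleanup are attempted before the panic continues.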
    shutdownCtx, cancel := context.WithTimeout(context.Background(), shutDownTimeout)
    defer cancel()
    defer func() {
        if err := a.Shutdown(shutdownCtx); err != nil {
            a.Logger().Errorf("CLI shutdown error: %v", err)
        }
    }()
    a.cmd.Run(a.container)

    // PushGateway pushes metrics from the default Prometheus registry to a Pushgateway.
    type PushGateway struct {
        pusher *push.Pusher
        logger logger
Why can't we use logging.Logger directly here? What is the need for a new logger interface?
Regarding Comment 1 (Delete support / …): The Pushgateway documentation explicitly states that the Pushgateway is designed as a metric cache; the standard recommendation is not to delete pushed metrics and instead to use … If you push and immediately delete, Prometheus may not have scraped yet (a typical scrape interval is 15–30s), and the metrics are lost forever. There is no reliable way for the CLI to know whether Prometheus has completed its scrape before issuing a delete. For users who need cleanup of stale metrics, this is best handled at the Pushgateway operational level (e.g., Pushgateway's own …). This can always be revisited in a follow-up if users explicitly request it, but for v1 the "push and leave" approach is the correct and safe default.
Regarding Comment 5 (shutdown order, moving metricServer.Shutdown before container.Close): the current shutdown order is actually correct. The … For the Pushgateway path specifically, the push happens inside …
Regarding Comment 8 (factory.go test coverage): The new pushgateway wiring in …
Summary
CLI applications are short-lived: they exit before Prometheus can scrape /metrics. This PR adds push-based metrics export via Prometheus Pushgateway for GoFr CLI apps, along with automatic CLI command metrics. Closes #2232
What's included
- pkg/gofr/metrics/exporters/pushgateway.go: wraps push.Pusher with prometheus.DefaultGatherer to push all collected metrics on shutdown
- exporters.Prometheus() now returns both Meter and MeterProvider: ensures buffered metrics are flushed on Container.Close()
- cmd.go, following the existing cron metrics pattern:
  - app_cmd_duration_seconds (histogram)
  - app_cmd_success_total (counter)
  - app_cmd_errors_total (counter)
- run.go: calls Shutdown() after cmd.Run() to flush metrics and close resources
- METRICS_PUSH_GATEWAY_URL env var to enable (CLI only, not HTTP apps)
- sample-cmd example: pushgateway + prometheus + grafana
- http-server example (duration p95, success rate, error rate)

Design decisions
- NewCMD() only: HTTP apps continue using pull-based scraping
- Container owns the pushgateway and flushes on Close(), keeping the cmd struct clean
- Uses the *exporters.PushGateway type directly
- Uses prometheus.DefaultGatherer, which reads from the same default registry the OTel Prometheus exporter writes to

Test plan
- go build ./... compiles
- go test ./pkg/gofr/... ./pkg/gofr/metrics/... ./pkg/gofr/container/... passes
- golangci-lint run is clean (no new issues)
- cd examples/sample-cmd/docker && docker-compose up: verify the pushgateway receives app_cmd_* metrics
- Check localhost:3000 for new CMD panels