Conversation

@pznamensky
Contributor

Hi @maxlaverse,
We use soft-pod-memory-evicter and it works fine for us. Thank you for it!
The only thing we found missing is an evicted_pods_total metric, so we can get alerted if something goes crazy with our apps.
This is not something anybody has asked for (judging by the issues in the repo).
The patchset has also been almost entirely generated by an LLM (though it has been reviewed by a human), and it has been running in our production environment for a while.
If you think it's worth merging, could you please take a look at the PR?

@maxlaverse maxlaverse self-requested a review November 18, 2025 18:23
Owner

@maxlaverse maxlaverse left a comment

Hi @pznamensky!
Thanks for your contribution. Having a metric is a good addition.
I left a few comments because I think we can really simplify the implementation.

return noopMetricsRecorder{}
}

if strings.TrimSpace(opts.MetricsBindAddress) == "" {
Owner

What about doing the value validation directly in the urfave/cli setup in main.go, with something like:

    Before: func(ctx *cli.Context) error {
        if opts.EnableMetrics {
            bindAddr := strings.TrimSpace(opts.MetricsBindAddress)
            if bindAddr == "" {
                return fmt.Errorf("--metrics-bind-address cannot be empty when --enable-metrics is true")
            }
            if _, _, err := net.SplitHostPort(bindAddr); err != nil {
                return fmt.Errorf("invalid --metrics-bind-address: %w", err)
            }
        }
        return nil
    },

}

func (r *prometheusMetricsRecorder) Start(ctx context.Context) {
if r.server == nil {
Owner

r.server == nil will only happen if someone made a buggy change. Since this is all internal and a very small project, I think I would just drop this check. It's unlikely to be a problem, and it's probably better to crash hard right away.

shutdownCtx, cancel := context.WithTimeout(context.Background(), metricsShutdownGrace)
defer cancel()

if err := r.server.Shutdown(shutdownCtx); err != nil && !errors.Is(err, http.ErrServerClosed) {
Owner

Is it really possible for this method to return an http.ErrServerClosed error?
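
(For context, a minimal sketch of the usual split, with srv standing in for the server field; the names are illustrative. My understanding is that http.ErrServerClosed is what ListenAndServe reports after Shutdown or Close has been called, while Shutdown itself returns the context's error on timeout or an error from closing the listeners.)

    // Sketch: ListenAndServe is the call that reports ErrServerClosed once
    // Shutdown has been invoked, so the check usually lives on this side.
    go func() {
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            log.Printf("metrics server failed: %v", err)
        }
    }()

    // On termination, Shutdown returns ctx.Err() if the grace period expires,
    // or an error from closing the listeners.
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil {
        log.Printf("failed to shut down metrics server: %v", err)
    }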

recorder.server = &http.Server{
Addr: opts.MetricsBindAddress,
Handler: mux,
ReadHeaderTimeout: 5 * time.Second,
Owner

I think it could make sense to have this slightly lower than the shutdown timeout, so that the client gets a timeout error rather than our server logging errors during shutdown.

Maybe worth moving both to the top as constants so the two values sit side by side.
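
For example (a sketch; the constant names are only a suggestion):

    const (
        // Keep both values side by side so their relationship stays visible:
        // the read-header timeout sits slightly below the shutdown grace period,
        // so a slow client hits its own timeout before the shutdown does.
        metricsShutdownGrace     = 5 * time.Second
        metricsReadHeaderTimeout = 4 * time.Second
    )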

}

func (r *prometheusMetricsRecorder) RecordPodEviction(namespace, appName, appInstance string) {
if r.counter == nil {
Owner

Same as for the server: I wouldn't guard here, just keep the code simple. Same below.

r.counter.EnsureMetric(namespace, appName, appInstance)
}

func (r *prometheusMetricsRecorder) handleMetrics(w http.ResponseWriter, req *http.Request) {
Owner

Not sure if you're aware, but the Prometheus client library provides promhttp.Handler(), which handles all of this already.
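
A minimal sketch of what that could look like (assuming metrics are registered with the default registry):

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func newMetricsMux() *http.ServeMux {
        mux := http.NewServeMux()
        // promhttp.Handler() serves every registered metric in the Prometheus
        // exposition format, including content negotiation and error handling.
        mux.Handle("/metrics", promhttp.Handler())
        return mux
    }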

}
}

func (c *evictedPodCounter) Inc(namespace, appName, appInstance string) {
Owner

Unless there is a specific need for managing this by hand, have a look at this guide: https://prometheus.io/docs/guides/go-application/

For classic counters, there is a much easier way to achieve this. You could even remove the noopRecorder and just start the metrics server conditionally.
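
A rough sketch of what that could look like with client_golang (the variable and label names are illustrative, not from the PR):

    import (
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
    )

    // promauto registers the counter with the default registry at init time,
    // so there is no hand-rolled registry or collection loop to maintain.
    var evictedPodsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "soft_pod_memory_evicter_evicted_pods_total",
            Help: "Number of Pods evicted by the controller.",
        },
        []string{"namespace", "app_name", "app_instance"},
    )

    // On each eviction:
    func recordEviction(ns, appName, appInstance string) {
        evictedPodsTotal.WithLabelValues(ns, appName, appInstance).Inc()
    }

With that, the /metrics server could be started only when metrics are enabled, and the noopRecorder would no longer be needed.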

}

func appNameLabel(pod *corev1.Pod) string {
if pod == nil {
Owner

I wouldn't guard against pod being nil. If pod is nil, the code would already crash where this method is called. Since this is only used for metrics, maybe we could move both methods into metrics_exporter.go and just pass the pod to RecordPodEviction?
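
Roughly (a sketch, assuming an appInstanceLabel helper exists alongside appNameLabel):

    // RecordPodEviction derives the metric labels from the Pod itself,
    // so callers don't have to extract them first.
    func (r *prometheusMetricsRecorder) RecordPodEviction(pod *corev1.Pod) {
        r.counter.Inc(pod.Namespace, appNameLabel(pod), appInstanceLabel(pod))
    }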

r.counter.Inc(namespace, appName, appInstance)
}

func (r *prometheusMetricsRecorder) RecordObservedNamespace(namespace, appName, appInstance string) {
Owner

This is not really just about the namespace, right? It's to initialize the metric for a given Pod to 0?

If so, is it really needed? (I haven't done Prometheus queries in a while.) In a cluster with thousands of Pods, you'd end up with thousands of time series even if only 1-2 Pods are ever evicted.

Contributor Author

That's correct, but not exactly.
What we needed was indeed to initialize the metric with 0.
However, as you correctly pointed out, it would be too many series if we made it per Pod.
Initially I made this per namespace only, but then we realized that's not enough for us.
What I came up with is adding these common pod labels (see also: link):

  • app.kubernetes.io/name
  • app.kubernetes.io/instance

That's not an ideal solution, as these labels are not defined on all pods, but it seems to work for most cases.

Owner

What about renaming it to RecordPodExistence then?

@pznamensky
Contributor Author

@maxlaverse, thanks for your review.
I've made the requested changes. Could you please take another look?

Owner

@maxlaverse maxlaverse left a comment

Besides two remarks about naming, it looks good to me. Thanks for taking the time.
I'll try it out.

metricsShutdownGrace = 5 * time.Second
metricsReadHeaderTimeout = 4 * time.Second

metricLabelNamespace = "affected_namespace"
Owner

I think I would remove the "affected_" prefix from all those labels, because it's a bit weird when the metric's value is 0 (the Pod isn't really affected by the controller), and the metric will only ever record Pods being affected anyway. wdyt?

Contributor Author

The reason I added the affected_ prefix is that in a default setup Prometheus already adds app_kubernetes_io_instance and app_kubernetes_io_name labels (for the evicter's own Pod), so we need to distinguish between them somehow.

For example:

soft_pod_memory_evicter_evicted_pods_total{affected_app_kubernetes_io_instance="api", affected_app_kubernetes_io_name="api", affected_namespace="backend", app_kubernetes_io_instance="soft-pod-memory-evicter", app_kubernetes_io_name="soft-pod-memory-evicter", environment="production", instance="192.168.194.80:9288", job="kubernetes-pods", namespace="soft-pod-memory-evicter", node="ip-192-168-250-40.eu-west-1.compute.internal", pod="soft-pod-memory-evicter-5f85645ff7-q9wq6", pod_template_hash="5f85645ff7"}

It might make sense to rename the prefix to something like target_ to avoid confusion. How does that sound to you?

r.counter.Inc(namespace, appName, appInstance)
}

func (r *prometheusMetricsRecorder) RecordObservedNamespace(namespace, appName, appInstance string) {
Owner

What about renaming it to RecordPodExistence then?

@pznamensky
Contributor Author

As for RecordObservedNamespace -> RecordPodExistence renaming:
The method just pre-creates the eviction counter series for a pod’s namespace/name/instance labels so that Prometheus exposes a zero-valued line even if no evictions occur (yet).
Would EnsureEvictionCounterSeries better express the purpose?
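
For reference, with a plain client_golang CounterVec that pre-creation is just a label lookup without an increment (a sketch, assuming a *prometheus.CounterVec named evictedPodsTotal):

    // WithLabelValues creates the child series at 0 if it doesn't exist yet,
    // so the line shows up in /metrics before any eviction happens.
    evictedPodsTotal.WithLabelValues(ns, appName, appInstance)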

@maxlaverse
Owner

Thanks again for your contribution, @pznamensky! I'm going to merge the PR and adjust it in subsequent commits if needed.

@maxlaverse maxlaverse merged commit 5a2dcba into maxlaverse:main Jan 4, 2026
1 check passed