metrics-generator: send stale markers via remote write when series are removed from registry #6494

**Is your feature request related to a problem? Please describe.**

When the metrics-generator's `registry.stale_duration` expires for a series, the series is deleted from the internal registry and stops being emitted. However, no [stale marker](https://prometheus.io/docs/specs/prw/remote_write_spec/#stale-markers) (the special NaN value `0x7ff0000000000002`) is sent to the remote write target.

This means Prometheus has no way to know the series is gone. It continues returning the last written sample for up to 5 minutes (the default lookback delta, set globally with `--query.lookback-delta`), producing misleading query results. For example, if a service processes one request and then stops, `traces_spanmetrics_calls_total{service="my-service"}` continues to return `1` for about 5 minutes after the last sample was written.

This violates the [Prometheus Remote-Write 1.0 specification](https://prometheus.io/docs/specs/prw/remote_write_spec/), which states:

> Senders MUST send stale markers when a time series will no longer be appended to.

Using `rate()` mitigates the issue for counters, but gauges and instant queries remain affected. Dashboards and alerting rules that rely on the presence/absence of a series (e.g., `absent()`, `up`-style checks) are also impacted.
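To make the lookback behavior concrete, here is a minimal Go sketch of how an instant query picks a value. It is a simplified model, not Prometheus code: the `sample` type and the `instantValue`, `staleMarker`, and `isStaleMarker` helpers are hypothetical names introduced for illustration. Only the bit pattern `0x7ff0000000000002` comes from the spec.

```go
package main

import "math"

// staleNaNBits is the bit pattern the remote write spec reserves for stale markers.
const staleNaNBits uint64 = 0x7ff0000000000002

// sample is a hypothetical, simplified (timestamp, value) pair.
type sample struct {
	tsMillis int64
	value    float64
}

// staleMarker returns the special NaN used to signal staleness.
func staleMarker() float64 {
	return math.Float64frombits(staleNaNBits)
}

// isStaleMarker compares bit patterns, because NaN never equals NaN
// under ordinary float comparison.
func isStaleMarker(v float64) bool {
	return math.Float64bits(v) == staleNaNBits
}

// instantValue models an instant selector evaluated at evalMillis. It looks at
// the newest sample at or before evalMillis and returns it only if it is
// younger than lookbackMillis and is not a stale marker.
// samples must be sorted by ascending timestamp.
func instantValue(samples []sample, evalMillis, lookbackMillis int64) (float64, bool) {
	for i := len(samples) - 1; i >= 0; i-- {
		s := samples[i]
		if s.tsMillis > evalMillis {
			continue // sample is in the future relative to the query
		}
		if evalMillis-s.tsMillis >= lookbackMillis || isStaleMarker(s.value) {
			return 0, false // too old, or explicitly marked stale
		}
		return s.value, true
	}
	return 0, false
}
```

In this model, a series whose last sample was written at `t=0` still answers a query at `t=60s` with its old value, while the same series followed by a stale marker at `t=4s` returns nothing.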

**Environment:**
- Tempo 2.9.0, single-binary mode on Kubernetes (Helm chart 1.24.4)
- Prometheus via kube-prometheus-stack, remote write receiver enabled
- `registry.collection_interval: 1s`, `registry.stale_duration: 3s`

**Describe the solution you'd like**

When a series is deleted from the registry after `stale_duration`, the metrics-generator should emit one final remote write request containing a stale marker for that series. This would allow Prometheus (and any remote-write-compatible TSDB) to immediately mark the series as stale, matching the behavior of scrape-based ingestion.

The implementation would likely be in the registry's collection loop (`modules/generator/registry`), at the point where stale series are pruned. Before removing a series, write a sample with the stale NaN value and include it in the next remote write batch.
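A rough sketch of that idea in Go, under stated assumptions: the `registry`, `series`, and `sample` types and the `collect` method below are hypothetical stand-ins, not the actual types in `modules/generator/registry`, and the real remote write batching is not shown. The sketch only illustrates emitting one final stale sample at the moment a series is pruned.

```go
package main

import "math"

// staleNaNBits is the stale marker bit pattern from the remote write spec.
const staleNaNBits uint64 = 0x7ff0000000000002

// sample is a hypothetical output record handed to the remote write batch.
type sample struct {
	labels   string
	tsMillis int64
	value    float64
}

// series is a hypothetical registry entry.
type series struct {
	value          float64
	lastSeenMillis int64
}

// registry is a hypothetical stand-in keyed by a serialized label set.
type registry struct {
	staleMillis int64
	active      map[string]*series
}

// isStaleMarker checks the bit pattern, since NaN != NaN as floats.
func isStaleMarker(v float64) bool {
	return math.Float64bits(v) == staleNaNBits
}

// collect returns the samples for one collection cycle. A series that has not
// been updated for staleMillis is removed, but first contributes one final
// sample carrying the stale marker instead of silently disappearing.
func (r *registry) collect(nowMillis int64) []sample {
	out := make([]sample, 0, len(r.active))
	for labels, s := range r.active {
		if nowMillis-s.lastSeenMillis >= r.staleMillis {
			out = append(out, sample{labels, nowMillis, math.Float64frombits(staleNaNBits)})
			delete(r.active, labels) // deleting during range is safe in Go
			continue
		}
		out = append(out, sample{labels, nowMillis, s.value})
	}
	return out
}
```

The marker uses the current collection timestamp so that it sorts after the last real sample for the series. Whether pruning and collection happen in the same pass in the actual registry is an open question for the implementation.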

**Describe alternatives you've considered**
- **Relying on `rate()`/`increase()` for all queries**: Works for counters but not for gauges or presence-based alerting.
- **Shorter `stale_duration`**: Reduces the window but doesn't eliminate it; Prometheus still shows the last value for up to 5 minutes.
- **Reducing Prometheus lookback delta**: Not configurable per-series; changing it globally affects all queries and can break scrape-based series with irregular intervals.

**Additional context**

- Related: #1303 (metrics-generator production readiness) mentions stale series cleanup but focuses on registry memory, not remote write behavior.
- The Prometheus remote write spec explicitly requires stale markers: https://prometheus.io/docs/specs/prw/remote_write_spec/#stale-markers
- Happy to contribute an implementation if the maintainers agree with the approach.

