Commit b49a0e1

Add info function blog post
Signed-off-by: Arve Knudsen <[email protected]>
1 parent 9e02fa1 commit b49a0e1

1 file changed

+298
-0
lines changed

@@ -0,0 +1,298 @@
1+
---
2+
title: Introducing the Experimental info() Function
3+
created_at: 2025-11-14
4+
kind: article
5+
author_name: Arve Knudsen
6+
---
7+
8+
Enriching metrics with metadata labels can be surprisingly tricky in Prometheus, even if you're a PromQL wiz!
9+
Traditionally, complex PromQL join syntax is required in Prometheus to add even basic information like Kubernetes cluster names or cloud provider regions to queries.
10+
The new, still experimental `info()` function promises a simpler way, making label enrichment as easy as wrapping your query in a single function call.
11+
12+
In Prometheus 3.0, we introduced the [`info()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#info) function, a powerful new way to enrich your time series with labels from info metrics.
13+
`info()` doesn't only offer simpler syntax, however.
14+
It also solves a subtle yet critical problem that has plagued join queries for years: The "churn problem" that causes queries to fail when "non-identifying" info metric labels change.
15+
In practice, "identifying" labels are the ones the join matches on.
16+
17+
Whether you're working with OpenTelemetry resource attributes, Kubernetes labels, or any other metadata, the `info()` function makes your PromQL queries cleaner, more reliable, and easier to understand.
18+
19+
<!-- more -->
20+
21+
## The Problem: Complex Joins and The Churn Problem
22+
23+
Let us start by looking at what we have had to do until now.
24+
Imagine you're monitoring HTTP request durations via OpenTelemetry and want to break them down by Kubernetes cluster.
25+
Your metrics have `job` and `instance` labels, but the cluster name lives in a separate `target_info` metric, as the `k8s_cluster_name` label.
26+
Here's what the traditional approach looks like:
27+
28+
```promql
29+
sum by (k8s_cluster_name, http_status_code) (
30+
    rate(http_server_request_duration_seconds_count[2m])
31+
  * on (job, instance) group_left (k8s_cluster_name)
32+
    target_info
33+
)
34+
```
35+
36+
While this works, there are several issues:
37+
38+
**1. Complexity:** You need to know:
39+
- Which info metric contains your labels (`target_info`)
40+
- Which labels are the "identifying" labels to join on (`job`, `instance`)
41+
- Which data labels you want to add (`k8s_cluster_name`)
42+
- The proper PromQL join syntax (`on`, `group_left`)
43+
44+
This requires expert-level PromQL knowledge and makes queries harder to read and maintain.
45+
46+
**2. The Churn Problem (The Critical Issue):**
47+
48+
Here's the subtle but serious problem: What happens when a Kubernetes pod gets recreated?
49+
The `k8s_pod_name` label in `target_info` changes, and Prometheus sees this as a completely new time series.
50+
51+
If the old `target_info` series isn't properly marked as stale immediately, both the old and new series can exist simultaneously for up to 5 minutes (the default lookback delta).
52+
During this overlap period, your join query finds **two distinct matching `target_info` time series** and fails with a "many-to-many matching" error.
53+
54+
In practice, this means your dashboards can break and your alerts can stop firing while infrastructure changes are happening, which is precisely when you need visibility the most.
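
To make the failure mode concrete, here is a minimal Python sketch of the join described above. It is a toy model, not Prometheus code: series are plain dicts of labels, and `join_group_left`, `ManyToManyError`, and the sample label values are hypothetical names chosen for illustration.

```python
# Illustrative model only: series are plain label dicts, not Prometheus internals.

class ManyToManyError(Exception):
    """Raised when a join key matches more than one series on the info side."""


def join_group_left(samples, info_series, on, extra_labels):
    """Mimic `samples * on(...) group_left(...) info_series` for label sets."""
    by_key = {}
    for info in info_series:
        key = tuple(info.get(name, "") for name in on)
        if key in by_key:
            # Two info series share the same identifying labels: the join is ambiguous.
            raise ManyToManyError(f"duplicate info series for {dict(zip(on, key))}")
        by_key[key] = info

    enriched = []
    for sample in samples:
        key = tuple(sample.get(name, "") for name in on)
        info = by_key.get(key)
        if info is None:
            continue  # No matching info series: the sample is dropped by the join.
        labels = dict(sample)
        for name in extra_labels:
            if name in info:
                labels[name] = info[name]
        enriched.append(labels)
    return enriched


requests = [{"job": "shop", "instance": "a:8080", "http_status_code": "200"}]
old_info = {"job": "shop", "instance": "a:8080",
            "k8s_cluster_name": "prod", "k8s_pod_name": "shop-1"}
new_info = {**old_info, "k8s_pod_name": "shop-2"}  # Pod recreated: one label changed.

# Steady state: one target_info series per (job, instance), so the join works.
steady = join_group_left(requests, [new_info], ["job", "instance"], ["k8s_cluster_name"])

# During churn: old and new series are both visible, so the join fails.
try:
    join_group_left(requests, [old_info, new_info], ["job", "instance"], ["k8s_cluster_name"])
    churn_failed = False
except ManyToManyError:
    churn_failed = True
```

Note that the failure is triggered by `k8s_pod_name`, a label the query never asked for: any change to a non-identifying label is enough to create the second series.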
55+
56+
## The Solution: Simple, Reliable Label Enrichment
57+
58+
The `info()` function solves both problems at once.
59+
Here's the same query using `info()`:
60+
61+
```promql
62+
sum by (k8s_cluster_name, http_status_code) (
63+
  info(
64+
    rate(http_server_request_duration_seconds_count[2m]),
65+
    {k8s_cluster_name=~".+"}
66+
  )
67+
)
68+
```
69+
70+
Much more comprehensible, no?
71+
The real magic happens under the hood though: **`info()` automatically selects the time series with the latest sample**, eliminating churn-related join failures entirely.
72+
73+
### Basic Syntax
74+
75+
```promql
76+
info(v instant-vector, [data-label-selector instant-vector])
77+
```
78+
79+
- **`v`**: The instant vector to enrich with metadata labels
80+
- **`data-label-selector`** (optional): Label matchers in curly braces to filter which labels to include
81+
82+
If you omit the second parameter, `info()` adds **all** data labels from `target_info`:
83+
84+
```promql
85+
info(rate(http_server_request_duration_seconds_count[2m]))
86+
```
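
The role of the optional selector can be sketched in Python as follows. This is a simplified model under stated assumptions: it only covers which labels get copied from a single info series, it treats every matcher as a regex matcher (`=~`), and `select_data_labels` and `IDENTIFYING` are hypothetical names, not part of Prometheus. How the real function treats series for which nothing matches is not modelled here.

```python
import re

# Identifying labels are fixed to job and instance in the current implementation.
IDENTIFYING = ("job", "instance")


def select_data_labels(info_labels, matchers=None):
    """Return the data labels to copy from one info series (simplified model)."""
    data = {
        name: value
        for name, value in info_labels.items()
        if name not in IDENTIFYING and name != "__name__"
    }
    if not matchers:
        return data  # No selector given: every data label is included.

    selected = {}
    for name, pattern in matchers.items():
        value = data.get(name)
        # PromQL regex matchers are fully anchored, hence fullmatch().
        if value is not None and re.fullmatch(pattern, value):
            selected[name] = value
    return selected


target_info = {
    "__name__": "target_info",
    "job": "shop",
    "instance": "a:8080",
    "k8s_cluster_name": "prod",
    "k8s_pod_name": "shop-1",
}

all_labels = select_data_labels(target_info)
only_cluster = select_data_labels(target_info, {"k8s_cluster_name": ".+"})
```

Here `all_labels` carries both `k8s_cluster_name` and `k8s_pod_name`, while `only_cluster` carries just the cluster name, which mirrors the difference between the one-argument and two-argument forms.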
87+
88+
### Selecting Different Info Metrics
89+
90+
By default, `info()` uses the `target_info` metric.
91+
However, you can select different info metrics (like `build_info`, `node_uname_info`, or `kube_pod_labels`) by including a `__name__` matcher in the data-label-selector:
92+
93+
```promql
94+
# Use build_info instead of target_info
95+
info(up, {__name__="build_info"})
96+
97+
# Use multiple info metrics (combines labels from both)
98+
info(up, {__name__=~"(target|build)_info"})
99+
100+
# Select build_info and only include the version label
101+
info(up, {__name__="build_info", version=~".+"})
102+
```
103+
104+
**Note:** The current implementation always uses `job` and `instance` as the identifying labels for joining, regardless of which info metric you select.
105+
This works well for most standard info metrics but may have limitations with custom info metrics that use different identifying labels.
106+
107+
## Real-World Use Cases
108+
109+
### OpenTelemetry Integration
110+
111+
The primary driver for the `info()` function is [OpenTelemetry](https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/) (OTel) integration.
112+
When using Prometheus as an OTel backend, resource attributes (metadata about the metrics producer) are automatically converted to the `target_info` metric:
113+
114+
- `service.instance.id` → `instance` label
114+
- `service.name` → `job` label
116+
- `service.namespace` → prefixed to `job` (i.e., `<namespace>/<service.name>`)
117+
- All other resource attributes → data labels on `target_info`
118+
119+
This means that, as long as at least one of the `service.instance.id` and `service.name` resource attributes is included, every OTel metric you send to Prometheus over OTLP can be enriched with resource attributes using `info()`:
120+
121+
```promql
122+
# Add all OTel resource attributes
123+
info(rate(http_server_request_duration_seconds_sum[5m]))
124+
125+
# Add only specific attributes
126+
info(
127+
  rate(http_server_request_duration_seconds_sum[5m]),
128+
  {k8s_cluster_name=~".+", k8s_namespace_name=~".+", k8s_pod_name=~".+"}
129+
)
130+
```
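
The resource-attribute mapping listed in this section can be approximated with a short Python sketch. It is illustrative only: `resource_to_target_info` is a hypothetical helper, and the conversion of dots to underscores in attribute names (for example `k8s.cluster.name` becoming `k8s_cluster_name`) is an assumption inferred from the label names used in this post, not a statement about every Prometheus configuration.

```python
def resource_to_target_info(resource_attributes):
    """Map OTel resource attributes to target_info labels (simplified model)."""
    attrs = dict(resource_attributes)
    name = attrs.pop("service.name", None)
    namespace = attrs.pop("service.namespace", None)
    instance = attrs.pop("service.instance.id", None)

    if name is None and instance is None:
        # Neither identifying attribute is present, so there is nothing to join on.
        return None

    labels = {"__name__": "target_info"}
    if name is not None:
        # service.namespace, when set, is prefixed to the job label.
        labels["job"] = f"{namespace}/{name}" if namespace else name
    if instance is not None:
        labels["instance"] = instance

    # All remaining resource attributes become data labels.
    # Assumption: dots in attribute names are translated to underscores.
    for key, value in attrs.items():
        labels[key.replace(".", "_")] = str(value)
    return labels


labels = resource_to_target_info({
    "service.name": "checkout",
    "service.namespace": "shop",
    "service.instance.id": "pod-abc",
    "k8s.cluster.name": "prod",
})
```

In this model, `job` and `instance` are the labels `info()` joins on, and everything else is a candidate data label for enrichment.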
131+
132+
### Kubernetes Metadata
133+
134+
Enrich your metrics with Kubernetes-specific information:
135+
136+
```promql
137+
# Add cluster and namespace information to request rates
138+
info(
139+
  sum by (job, http_status_code) (
140+
    rate(http_server_request_duration_seconds_count[2m])
141+
  ),
142+
  {k8s_cluster_name=~".+", k8s_namespace_name=~".+"}
143+
)
144+
```
145+
146+
### Cloud Provider Metadata
147+
148+
Add cloud provider information to understand costs and performance by region:
149+
150+
```promql
151+
# Enrich with AWS/GCP/Azure region and availability zone
152+
info(
153+
  rate(cloud_storage_request_count[5m]),
154+
  {cloud_provider=~".+", cloud_region=~".+", cloud_availability_zone=~".+"}
155+
)
156+
```
157+
158+
## Before and After: Side-by-Side Comparison
159+
160+
Let's see how the `info()` function simplifies real queries:
161+
162+
### Example 1: Basic Label Enrichment
163+
164+
**Traditional approach:**
165+
```promql
166+
rate(http_server_request_duration_seconds_count[2m])
167+
* on (job, instance) group_left (k8s_cluster_name)
168+
target_info
169+
```
170+
171+
**With info():**
172+
```promql
173+
info(
174+
  rate(http_server_request_duration_seconds_count[2m]),
175+
  {k8s_cluster_name=~".+"}
176+
)
177+
```
178+
179+
### Example 2: Aggregation with Multiple Labels
180+
181+
**Traditional approach:**
182+
```promql
183+
sum by (k8s_cluster_name, k8s_namespace_name, http_status_code) (
184+
    rate(http_server_request_duration_seconds_count[2m])
185+
  * on (job, instance) group_left (k8s_cluster_name, k8s_namespace_name)
186+
    target_info
187+
)
188+
```
189+
190+
**With info():**
191+
```promql
192+
sum by (k8s_cluster_name, k8s_namespace_name, http_status_code) (
193+
  info(
194+
    rate(http_server_request_duration_seconds_count[2m]),
195+
    {k8s_cluster_name=~".+", k8s_namespace_name=~".+"}
196+
  )
197+
)
198+
```
199+
200+
The intent is much clearer with `info`: We're enriching `http_server_request_duration_seconds_count` with cluster and namespace information, then aggregating by those labels and `http_status_code`.
201+
202+
## Technical Benefits
203+
204+
Beyond cleaner syntax, the `info()` function provides several technical advantages:
205+
206+
### 1. Automatic Churn Handling
207+
208+
As previously mentioned, `info()` automatically picks the matching info time series with the latest sample when multiple versions exist.
209+
This eliminates the "many-to-many matching" errors that plague traditional join queries during churn.
210+
211+
**How it works:** When non-identifying info metric labels change (e.g., a pod is re-created), there's a brief period where both old and new series might exist.
212+
The `info()` function simply selects whichever has the most recent sample, ensuring your queries keep working.
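
The selection rule can be illustrated with a small Python model. This is a sketch of the idea rather than the actual implementation: `enrich_with_latest_info`, the dict layout, and the timestamps are hypothetical, and a sample without any matching info series is simply returned unchanged here for simplicity.

```python
def enrich_with_latest_info(sample_labels, info_series, on=("job", "instance")):
    """Add data labels from the matching info series that has the newest sample."""
    key = tuple(sample_labels.get(name, "") for name in on)
    candidates = [
        series for series in info_series
        if tuple(series["labels"].get(name, "") for name in on) == key
    ]
    if not candidates:
        return dict(sample_labels)  # Simplification: nothing to add.

    # Instead of failing on duplicates, keep only the most recent info series.
    newest = max(candidates, key=lambda series: series["timestamp"])

    enriched = dict(sample_labels)
    for name, value in newest["labels"].items():
        if name not in on and name != "__name__":
            enriched.setdefault(name, value)
    return enriched


sample = {"job": "shop", "instance": "a:8080", "http_status_code": "200"}
old = {"timestamp": 1000, "labels": {"job": "shop", "instance": "a:8080",
                                     "k8s_cluster_name": "prod", "k8s_pod_name": "shop-1"}}
new = {"timestamp": 1060, "labels": {"job": "shop", "instance": "a:8080",
                                     "k8s_cluster_name": "prod", "k8s_pod_name": "shop-2"}}

# Both series are visible during churn; the newer one wins.
result = enrich_with_latest_info(sample, [old, new])
```

With both series present, `result` carries `k8s_pod_name="shop-2"`, where a traditional join over the same input would have reported a many-to-many matching error.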
213+
214+
### 2. Better Performance
215+
216+
The `info()` function is more efficient than traditional joins:
217+
- Only selects matching info series
218+
- Avoids unnecessary label matching operations
219+
- Optimized query execution path
220+
221+
## Getting Started
222+
223+
The `info()` function is experimental and must be enabled via a feature flag:
224+
225+
```bash
226+
prometheus --enable-feature=promql-experimental-functions
227+
```
228+
229+
Once enabled, you can start using it immediately.
230+
Here are some simple examples to try:
231+
232+
```promql
233+
# Basic usage - add all target_info labels
234+
info(up)
235+
236+
# Selective enrichment - add only cluster name
237+
info(up, {k8s_cluster_name=~".+"})
238+
239+
# In a real query
240+
info(
241+
  rate(http_server_request_duration_seconds_count[5m]),
242+
  {k8s_cluster_name=~".+"}
243+
)
244+
245+
# With aggregation
246+
sum by (k8s_cluster_name) (
247+
  info(up, {k8s_cluster_name=~".+"})
248+
)
249+
```
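
If you would rather try these queries from a script than from the UI, the following Python sketch sends one to Prometheus' instant-query HTTP endpoint (`/api/v1/query`). The helper names and the `http://localhost:9090` address are assumptions for illustration; adjust them to your setup, and remember that the server must be running with the feature flag shown above.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def run_instant_query(base_url, promql, timeout=10):
    """Send the query and return the `data.result` list from the JSON response."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return payload["data"]["result"]


# Assumed local address; no request is sent until run_instant_query() is called.
url = instant_query_url("http://localhost:9090", "info(up)")

# Example call against a running server:
# for series in run_instant_query("http://localhost:9090", "info(up)"):
#     print(series["metric"])
```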
250+
251+
## Current Limitations and Future Plans
252+
253+
The current implementation is an **MVP (Minimum Viable Product)** designed to validate the approach and gather user feedback.
254+
It has some intentional limitations:
255+
256+
### Current Constraints
257+
258+
1. **Default info metric:** Only considers `target_info` by default
259+
   - Workaround: You can use `__name__` matchers like `{__name__=~"(target|build)_info"}` in the data-label-selector, though this still assumes `job` and `instance` as identifying labels
260+
261+
2. **Fixed identifying labels:** Always assumes `job` and `instance` are the identifying labels for joining
262+
   - This works for most use cases but may not be suitable for all scenarios
263+
264+
### Future Development
265+
266+
These limitations are meant to be temporary.
267+
The experimental status allows us to:
268+
- Gather real-world usage feedback
269+
- Understand which use cases matter the most
270+
- Iterate on the design before committing to a final API
271+
272+
A future version of the `info()` function should:
273+
- Support all info metrics (not just `target_info`)
274+
- Dynamically determine identifying labels based on the info metric's structure
275+
276+
**Important:** Because this is an experimental feature, the behavior may change in future Prometheus versions, or the function could potentially be removed from PromQL entirely based on user feedback.
277+
278+
## Conclusion
279+
280+
The experimental `info()` function represents a significant step forward in making PromQL more accessible and reliable.
281+
By simplifying metadata label enrichment and automatically handling the churn problem, it removes two major pain points for Prometheus users, especially those adopting OpenTelemetry.
282+
283+
We encourage you to try the `info()` function and share your feedback:
284+
- What use cases does it solve for you?
285+
- What additional functionality would you like to see?
286+
- How could the API be improved?
287+
- Do you see improved performance?
288+
289+
Your feedback will directly shape the future of this feature and help us determine whether it should become a permanent part of PromQL.
290+
291+
To learn more:
292+
- [PromQL functions documentation](https://prometheus.io/docs/prometheus/latest/querying/functions/#info)
293+
- [OpenTelemetry guide (includes detailed info() usage)](https://prometheus.io/docs/guides/opentelemetry/)
294+
- [Feature proposal](https://github.com/prometheus/proposals/blob/main/proposals/0037-native-support-for-info-metrics-metadata.md)
295+
296+
Please feel welcome to share your thoughts with the Prometheus community on [GitHub Discussions](https://github.com/prometheus/prometheus/discussions) or get in touch with us on the [CNCF Slack #prometheus channel](https://cloud-native.slack.com/).
297+
298+
Happy querying!
