Skip to content

Conversation

@aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented Nov 14, 2025

I'm proposing to add a blog post on the PromQL info function! During the last PromCon, a revelation was that the info function is too poorly marketed. One potential remediation mentioned was writing a blog post :)

Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 aknuds1 changed the title Add info function blog post Add info function blog post Nov 14, 2025
@aknuds1 aknuds1 changed the title Add info function blog post Add PromQL info function blog post Nov 14, 2025
Copy link
Member

@ywwg ywwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I made a few comments

Much more comprehensible, no?
The real magic happens under the hood though: **`info()` automatically selects the time series with the latest sample**, eliminating churn-related join failures entirely.

### Basic Syntax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Patrick ran into issues where info() had to be put in very specific places, especially when a rate function was involved. I think his intuition was to "wrap the metric name in info()" but that doesn't work when there's a rate function? So I think we should have an aside with some advice on how to know where the info function will need to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in general, but the rationale is not that info is special or needlessly complicated here, but the opposite: info is just a normal function. Nothing new here. You place it where you place any other function that receives an instant vector. (You could call that "very specific places". I would call it "just the normal expected places".) For example, label_replace is placed at exactly the same places where you would use info. (Buth functions not only take an instant vector as an argument, they also share the property that they just manipulate labels, not values. All perfectly normal and expected – I might even use the infamous word "intuitive" for that.)

Patrick's issue has to do with PromQL in general, not info in particular. Obviously, the intuition of somebody not familiar with PromQL is very different from the intuition of somebody who knows PromQL. Catering for the one might very well be counterproductive for the other.

So we should argue here that info is just a normal function acting on any instant vector, not any new concept like a "decorator" or something.

Comment on lines +216 to +219
The `info()` function is more efficient than traditional joins:
- Only selects matching info series
- Avoids unnecessary label matching operations
- Optimized query execution path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know some people were suspicious about the performance of the target_info concept ... Do we have numbers comparing the performance of label promotion vs info() joining? If it's small enough, that may help discourage people from promoting everything by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about perf comparisons versus label promotion. I just know there's an obvious performance benefit over join queries, due to being able to filter what to select on the RHS :)


## Current Limitations and Future Plans

The current implementation is an **MVP (Minimum Viable Product)** designed to validate the approach and gather user feedback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a link to directly file an issue? "If you experience any issues with info() please report them here" etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL.

The experimental `info()` function represents a significant step forward in making PromQL more accessible and reliable.
By simplifying metadata label enrichment and automatically handling the churn problem, it removes two major pain points for Prometheus users, especially those adopting OpenTelemetry.

We encourage you to try the `info()` function and share your feedback:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL.

aknuds1 and others added 2 commits November 17, 2025 17:30
Co-authored-by: Owen Williams <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: Owen Williams <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
In Prometheus 3.0, we introduced the [`info()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#info) function, a powerful new way to enrich your time series with labels from info metrics.
`info` doesn't only offer a simpler syntax however.
It also solves a subtle yet critical problem that has plagued join queries for years: The "churn problem" that causes queries to fail when "non-identifying" info metric labels change.
In practice, "identifying labels" refers to those labels that the join is performed on.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should expand a section laying out what "identifying" really means, compared to "data labels" or "nonidentifying labels", referencing the otel data model (which also uses the word "descriptive"!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to write a section and make a PR to my branch or something? Maybe the easiest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, can do

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the avalanche of comments. Here are my main lines of thought (from which most comments derive):

  • We should really focus on the cases where info is much more simple than the traditional query:
    1. All data labels are copied. (You have covered this case, but always further down; it should be the very first case to show, though.)
    2. Filtering for label value. (You haven't covered this case at all.)
  • Rather than reinforcing the misconception that info is merely solving a syntactic problem (which could be read as info being merely syntactic sugar), we should clarify that the problem of the complex join queries is caused by the amount of information the user has to specify and that info solves this by deducing this information from other sources (or just using heuristics for now, which also explains the shortcomings of the current implementation).
  • We should be very upfront about the missing parts, both that they are relevant but also that we want to work on them. That's partially already the case, but as explained in comments, it could be clearer. What we should also add is that the tooling outside of PromQL proper doesn't really handle the info function appropriately yet, e.g. autocomplete or query builders. (Many issues Patrick reported were about exactly those deficiencies. But for him, it appeared as fundamental problems of info and not as an incomplete implementation.)

---

Enriching metrics with metadata labels can be surprisingly tricky in Prometheus, even if you're a PromQL wiz!
Traditionally, complex PromQL join syntax is required in Prometheus to add even basic information like Kubernetes cluster names or cloud provider regions to queries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like it's just a problem of syntax. This blog post might be an opportunity to debunk this naive assumption that is in everyone's head.
It could also be a good segue into the parts of the info function that still need to be finished before it can unleash its true power.

To reiterate the reasons for the complexity of the join query:

  • You have to specify on which labels to join (segue into what "identifying labels" means, see @ywwg's comment, and that we want to store this information in the future).
  • You have to specify the info metric to join with (segue into storing the type information in the future to know what info metrics exist).
  • You have to specify which "data labels" to copy over from the info metric.

Of course, you cannot explain all of this in the opening paragraph, but maybe a bit of it, something like:

Suggested change
Traditionally, complex PromQL join syntax is required in Prometheus to add even basic information like Kubernetes cluster names or cloud provider regions to queries.
The PromQL join query traditionally used for this is inherently quite complex because it has to specify the labels to join on, the info metric to join with, and the labels to enrich with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL.

The new, still experimental `info()` function, promises a simpler way, making label enrichment as simple as wrapping your query in a single function call.

In Prometheus 3.0, we introduced the [`info()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#info) function, a powerful new way to enrich your time series with labels from info metrics.
`info` doesn't only offer a simpler syntax however.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think the fundamental nature of the problem is syntactical.

info is simpler because you do not have to specify the identifying labels, you do not have to specify the info metric to join with, and if you just want to enrich with all data labels, you do not even have to specify the data labels you want to enrich with.

I think each of the three features is more important (both in practical terms and as a key insight how info works and why it is useful) than the next one (the "churn problem"), among others because the churn problem only occurs with broken staleness handling (which could be fixed in OTLP ingestion), and it should actually never occur if OTel folks just used resource attributes as they are meant to be used. (Of course, we don't need to explain those subtleties here, I just want to put things into perspective.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised, PTAL.

**2. The Churn Problem (The Critical Issue):**

Here's the subtle but serious problem: What happens when a Kubernetes pod gets recreated?
The `k8s_pod_name` label in `target_info` changes, and Prometheus sees this as a completely new time series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in this situation, also the instance label will change (and has to change according to OTel conventions), so this problem shouldn't occur.

Maybe we can still keep using this example, but we should then also mention that this is a broken setup (where instance doesn't change upon pod recreation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised, PTAL.

Here's the subtle but serious problem: What happens when a Kubernetes pod gets recreated?
The `k8s_pod_name` label in `target_info` changes, and Prometheus sees this as a completely new time series.

If the old `target_info` series isn't properly marked as stale immediately, both the old and new series can exist simultaneously for up to 5 minutes (the default lookback delta).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe don't phrase this as an "if". It will never be marked properly as stale because of the way OTLP ingestion works. So on the one hand, we can clearly say that the staleness handling doesn't work right now, but we could also mention that this is an issue that could be fixed in the OTLP ingestion layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised, PTAL.

Much more comprehensible, no?
The real magic happens under the hood though: **`info()` automatically selects the time series with the latest sample**, eliminating churn-related join failures entirely.

### Basic Syntax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in general, but the rationale is not that info is special or needlessly complicated here, but the opposite: info is just a normal function. Nothing new here. You place it where you place any other function that receives an instant vector. (You could call that "very specific places". I would call it "just the normal expected places".) For example, label_replace is placed at exactly the same places where you would use info. (Buth functions not only take an instant vector as an argument, they also share the property that they just manipulate labels, not values. All perfectly normal and expected – I might even use the infamous word "intuitive" for that.)

Patrick's issue has to do with PromQL in general, not info in particular. Obviously, the intuition of somebody not familiar with PromQL is very different from the intuition of somebody who knows PromQL. Catering for the one might very well be counterproductive for the other.

So we should argue here that info is just a normal function acting on any instant vector, not any new concept like a "decorator" or something.


## Technical Benefits

Beyond cleaner syntax, the `info()` function provides several technical advantages:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just syntax, as said. info is utilizing information that has otherwise to be provided by the user. Sorry for hammering this in so much, but I think this is very crucial to understand why info is so cool (and also what is still missing in its implementation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised, PTAL.

Comment on lines 230 to 249
Here are some simple examples to try:

```promql
# Basic usage - add all target_info labels
info(up)

# Selective enrichment - add only cluster name
info(up, {k8s_cluster_name=~".+"})

# In a real query
info(
rate(http_server_request_duration_seconds_count[5m]),
{k8s_cluster_name=~".+"}
)

# With aggregation
sum by (k8s_cluster_name) (
info(up, {k8s_cluster_name=~".+"})
)
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is value to repeat examples already given above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed them, PTAL.

- Workaround: You can use `__name__` matchers like `{__name__=~"(target|build)_info"}` in the data-label-selector, though this still assumes `job` and `instance` as identifying labels

2. **Fixed identifying labels:** Always assumes `job` and `instance` are the identifying labels for joining
- This works for most use cases but may not be suitable for all scenarios
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this over-promises. There are common use cases already where this doesn't work. You even dropped kube_pod_labels above. We should clearly say that this is a real problem but also that we want to solve this in the future (by storing the information which labels are identifying).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised, PTAL.


A future version of the `info()` function should:
- Support all info metrics (not just `target_info`)
- Dynamically determine identifying labels based on the info metric's structure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so much dynamic, but it's about (statically) storing the information what labels are identifying.

Note for future implementations (not related to the review, just happened to come to my mind while doing this review): For something like kube_pod_labels, we'll have the actual identifying labels (namespace, pod), then we'll have the data labels (in this case the actual pod labels), but when these metrics are ingested from KSM, we will also get a job and instance label attached (and whatever other target labels are configured). The future perfect version of info probably should just join on namespace and pod and only add the pod labels, but completely ignore job and instance and possibly other target labels (i.e. don't join on them, but also don't add them to the result either). Devil is in the detail here…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised, PTAL.

@aknuds1 aknuds1 force-pushed the arve/info-blog-post branch from bdab42e to 2f68dcb Compare November 27, 2025 09:58
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 aknuds1 requested a review from beorn7 November 27, 2025 16:02
Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 aknuds1 requested a review from ywwg November 27, 2025 16:08
@beorn7
Copy link
Member

beorn7 commented Nov 27, 2025

Will have a look ASAP. (I'm saying that to many these days… it might not be very soon…)

@aknuds1
Copy link
Contributor Author

aknuds1 commented Nov 27, 2025

Thanks @beorn7!

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things, but looking very good already.

Note that the date in the filename needs to be updated to the actual date of publication.

During this overlap period, your join query finds **two distinct matching `target_info` time series** and fails with a "many-to-many matching" error.

This could in practice mean your dashboards break and your alerts stop firing when infrastructure changes are happening, perhaps precisely when you would need visibility the most.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is missing here. A headline like "The Solution in its simplest form" or something?

Currently, this appears as just another paragraph in the section describing the 2nd part of the problem…

Copy link
Contributor Author

@aknuds1 aknuds1 Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. On reviewing the section, I realized that a sub-section was missing. I added it, PTAL.

info(up, {__name__="build_info", version=~".+"})
```

**Note:** The current implementation always uses `job` and `instance` as the identifying labels for joining, regardless of which info metric you select.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would love that.


**Traditional approach:**
```promql
sum by (http_status_code, k8s_cluster_name, k8s_container_name, k8s_cronjob_name, k8s_daemonset_name, k8s_deployment_name, k8s_job_name, k8s_namespace_name, k8s_pod_name, k8s_replicaset_name, k8s_statefulset_name) (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a realistic set of labels to sum by? I don't know the semantic conversions by heart, but isn't k8s_pod_name essentially unique per pod? There wouldn't be much to sum up in this way… Maybe sum by fewer labels, something that is more likely to happen in a real query.

(Mostly I want to prevent that readers just dismiss the query as contrived.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revised to what might be a more realistic set of labels. PTAL.

@aknuds1 aknuds1 requested a review from beorn7 December 2, 2025 19:21
Copy link
Member

@ywwg ywwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make "Giving Feedback" a heading so it's more eyecatching? Right now it's buried at the end of the post -- make it really easy for people who want to talk back to do so.

info(up, {__name__="build_info", version=~".+"})
```

**Note:** The current implementation always uses `job` and `instance` as the identifying labels for joining, regardless of which info metric you select.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"current implementation" implies that this may change, do we have any plans to? Maybe we can call out that we are looking for feedback on this point

Copy link
Contributor Author

@aknuds1 aknuds1 Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"current implementation" implies that this may change, do we have any plans to?

Yes the plan is relatively clear, the last sentence of the same paragraph already explains how it's supposed to work in the future. I don't think we need to solicit feedback, since we know how it's supposed to work on top of persisted metadata.

@aknuds1 aknuds1 requested a review from ywwg December 3, 2025 08:13
@aknuds1
Copy link
Contributor Author

aknuds1 commented Dec 3, 2025

Can we make "Giving Feedback" a heading so it's more eyecatching?

@ywwg @beorn7 I moved requests for feedback to a single section "Giving Feedback", just before the "Conclusion" section. WDYT?

@beorn7
Copy link
Member

beorn7 commented Dec 6, 2025

I'm traveling until Dec 16. My remaining items were not very significant, and I trust you to address them appropriately, so as soon as the other reviewers are happy, assume I am, too, and don't wait for my explicit approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants