Add PromQL info function blog post
#2777
base: main
| Diff line change |
|---|
| @@ -0,0 +1,298 @@ |
| --- |
| title: Introducing the Experimental info() Function |
| created_at: 2025-11-14 |
| kind: article |
| author_name: Arve Knudsen |
| --- |
| |
| Enriching metrics with metadata labels can be surprisingly tricky in Prometheus, even if you're a PromQL wiz! |
| Traditionally, complex PromQL join syntax is required in Prometheus to add even basic information like Kubernetes cluster names or cloud provider regions to queries. |

| Suggested change |
|---|
| - Traditionally, complex PromQL join syntax is required in Prometheus to add even basic information like Kubernetes cluster names or cloud provider regions to queries. |
| + The PromQL join query traditionally used for this is inherently quite complex because it has to specify the labels to join on, the info metric to join with, and the labels to enrich with. |
PTAL.
Again, I don't think the fundamental nature of the problem is syntactical.
info is simpler because you do not have to specify the identifying labels, you do not have to specify the info metric to join with, and if you just want to enrich with all data labels, you do not even have to specify the data labels you want to enrich with.
I think each of the three features is more important (both in practical terms and as a key insight into how info works and why it is useful) than the next one (the "churn problem"), among other reasons because the churn problem only occurs with broken staleness handling (which could be fixed in OTLP ingestion), and it should actually never occur if OTel folks just used resource attributes as they are meant to be used. (Of course, we don't need to explain those subtleties here, I just want to put things into perspective.)
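For illustration, a minimal sketch of the three points above, using the metric and label names that appear elsewhere in this review. It assumes the current defaults of info (info metric target_info, identifying labels job and instance); the join query is one plausible way to write the traditional form, not necessarily the one used in the post.

```promql
# Traditional join: the user has to spell out the identifying labels
# (job, instance), the info metric (target_info), and the data label
# to copy over (k8s_cluster_name).
sum by (k8s_cluster_name, http_status_code) (
    rate(http_server_request_duration_seconds_count[2m])
  * on (job, instance) group_left (k8s_cluster_name)
    target_info
)

# With info(): none of the three has to be specified. All data labels
# are added, and the outer sum aggregates away the ones not listed.
sum by (k8s_cluster_name, http_status_code) (
  info(rate(http_server_request_duration_seconds_count[2m]))
)
```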
Revised, PTAL.
I think we should add a section laying out what "identifying" really means, compared to "data labels" or "non-identifying labels", referencing the OTel data model (which also uses the word "descriptive"!)
Do you want to write a section and make a PR to my branch or something? Maybe the easiest.
yeah, can do
I think that in this situation, also the instance label will change (and has to change according to OTel conventions), so this problem shouldn't occur.
Maybe we can still keep using this example, but we should then also mention that this is a broken setup (where instance doesn't change upon pod recreation).
Revised, PTAL.
Maybe don't phrase this as an "if". It will never be marked properly as stale because of the way OTLP ingestion works. So on the one hand, we can clearly say that the staleness handling doesn't work right now, but we could also mention that this is an issue that could be fixed in the OTLP ingestion layer.
Revised, PTAL.
Something is missing here. A headline like "The Solution in its simplest form" or something?
Currently, this appears as just another paragraph in the section describing the 2nd part of the problem…
You're right. On reviewing the section, I realized that a sub-section was missing. I added it, PTAL.
I'm wondering whether we should start with the easiest use case first, as this is about showcasing the simplicity. So let's leave out this line (it is also the one "weird" part of the info function, a new type of function parameter, using selector syntax without being a real selector).
It will promote all data labels, but that's not a problem because we aggregate them away in the sum anyway. (You can explain that later.)
PTAL.
| Suggested change |
|---|
| - sum by (k8s_cluster_name, http_status_code) ( |
| - info( |
| - rate(http_server_request_duration_seconds_count[2m]), |
| - {k8s_cluster_name=~".+"} |
| - ) |
| - ) |
| + sum by (k8s_cluster_name, http_status_code) ( |
| + info(rate(http_server_request_duration_seconds_count[2m])) |
| + ) |
As per my other comment.
Maybe then follow up with the detail that this uses all data labels, then explain that it doesn't matter because we are aggregating them away in the sum, and then introduce the selector to select just one label, in cases where that is needed.
Revised, PTAL.
aknuds1 marked this conversation as resolved.
I know Patrick ran into issues where info() had to be put in very specific places, especially when a rate function was involved. I think his intuition was to "wrap the metric name in info()" but that doesn't work when there's a rate function? So I think we should have an aside with some advice on how to know where the info function will need to go.
I agree in general, but the rationale is not that info is special or needlessly complicated here, but the opposite: info is just a normal function. Nothing new here. You place it where you place any other function that receives an instant vector. (You could call that "very specific places". I would call it "just the normal expected places".) For example, label_replace is placed at exactly the same places where you would use info. (Both functions not only take an instant vector as an argument, they also share the property that they just manipulate labels, not values. All perfectly normal and expected – I might even use the infamous word "intuitive" for that.)
Patrick's issue has to do with PromQL in general, not info in particular. Obviously, the intuition of somebody not familiar with PromQL is very different from the intuition of somebody who knows PromQL. Catering for the one might very well be counterproductive for the other.
So we should argue here that info is just a normal function acting on any instant vector, not any new concept like a "decorator" or something.
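A possible sketch for such an aside, assuming the same example metric as elsewhere in this review; the "host" target label and the regex are made up for illustration:

```promql
# label_replace wraps the instant vector returned by rate() ...
label_replace(
  rate(http_server_request_duration_seconds_count[2m]),
  "host", "$1", "instance", "(.*):.*"
)

# ... and info() goes in exactly the same position.
info(rate(http_server_request_duration_seconds_count[2m]))

# Wrapping the metric name instead does not parse: info() returns an
# instant vector, and a plain [2m] range can only follow a selector.
#   rate(info(http_server_request_duration_seconds_count)[2m])
```

The failing form is the "wrap the metric name in info()" intuition mentioned above; it fails for the same reason it would fail with label_replace.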
beorn7 marked this conversation as resolved.
As said, I would frame this as the primary mode of using info. See above.
Revised, PTAL.
kube_pod_labels doesn't work because it will have the job and instance labels from KSM (which we would need to ignore in the join) and requires joining on namespace and pod, see comment below.
Removed it, PTAL.
Maybe also note that the future intention is that PromQL "knows" what metrics in the TSDB are info metrics and automatically uses all of them, unless the selection is explicitly restricted by a name matcher like the above.
General question (not part of the review): Why didn't we make it possible to just write info(up, build_info) or info(up, build_info{version=~".+"})?
Maybe info(up, {"build_info"}) or info(up, {"build_info", version=~".+"}) works already?
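For reference, a sketch of the forms that, as far as I understand the current implementation, are expressible today via a `__name__` matcher in the second argument. The `build_info` metric and its `version` label are taken from the question above and are assumed to carry the job and instance labels; the last query in particular is an untested assumption.

```promql
# Default: only info series named target_info are considered.
info(up)

# Explicit name matcher: consider build_info as well as target_info.
info(up, {__name__=~"(target|build)_info"})

# Name matcher combined with a data label matcher (assumed to work;
# roughly what info(up, build_info{version=~".+"}) would mean).
info(up, {__name__="build_info", version=~".+"})
```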
Revised, PTAL.
> Why didn't we make it possible to just write info(up, build_info) or info(up, build_info{version=~".+"})?
I don't remember it being considered :)
I vaguely remember that there was a reason, but I don't remember the reason itself. 😬
It could be that it's difficult to support the syntax technically. Maybe with the current state of the implementation, these ideas could be revisited?
Yes, I would love that.
"current implementation" implies that this may change, do we have any plans to? Maybe we can call out that we are looking for feedback on this point
> "current implementation" implies that this may change, do we have any plans to?
Yes the plan is relatively clear, the last sentence of the same paragraph already explains how it's supposed to work in the future. I don't think we need to solicit feedback, since we know how it's supposed to work on top of persisted metadata.
beorn7 marked this conversation as resolved.
beorn7 marked this conversation as resolved.
How is this different from the above?
Maybe we can add an example here that is not tailored to OTel/target_info? Maybe something using build_info.
Revised, PTAL.
This should include an example where you want to use a lot of resource attributes (because that's very long in the join query, but you just use the default behavior of info). Just promoting a single RA is kind of the worst case for info and the best case for the traditional join query, so we shouldn't list it as the 1st example.
I turned example 1 into such an example, PTAL.
aknuds1 marked this conversation as resolved.
I would also add an Example 3 with filtering for label values. That's also possible with the join query, but makes things even more verbose, while info does it in a very natural way. Another strength of info we can and should showcase here.
I added this as example 2 (dropped the original one), PTAL.
It's not just syntax, as said. info is utilizing information that otherwise has to be provided by the user. Sorry for hammering this in so much, but I think this is crucial to understanding why info is so cool (and also what is still missing in its implementation).
Revised, PTAL.
I know some people were suspicious about the performance of the target_info concept ... Do we have numbers comparing the performance of label promotion vs info() joining? If it's small enough, that may help discourage people from promoting everything by default.
I don't know about perf comparisons versus label promotion. I just know there's an obvious performance benefit over join queries, due to being able to filter what to select on the RHS :)
Not sure if there is value in repeating examples already given above.
I removed them, PTAL.
Can we add a link to directly file an issue? "If you experience any issues with info() please report them here" etc
Done, PTAL.
I think this over-promises. There are common use cases already where this doesn't work. You even dropped kube_pod_labels above. We should clearly say that this is a real problem but also that we want to solve this in the future (by storing the information which labels are identifying).
Revised, PTAL.
aknuds1 marked this conversation as resolved.
It's not so much about being dynamic; it's about (statically) storing the information about which labels are identifying.
Note for future implementations (not related to the review, just happened to come to my mind while doing this review): For something like kube_pod_labels, we'll have the actual identifying labels (namespace, pod), then we'll have the data labels (in this case the actual pod labels), but when these metrics are ingested from KSM, we will also get a job and instance label attached (and whatever other target labels are configured). The future perfect version of info probably should just join on namespace and pod and only add the pod labels, but completely ignore job and instance and possibly other target labels (i.e. don't join on them, but also don't add them to the result either). Devil is in the detail here…
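To make the kube_pod_labels case concrete, a sketch of how a traditional join handles it today, assuming kube-state-metrics metric names and a hypothetical pod label exposed as `label_app`:

```promql
# on (namespace, pod) names the identifying labels explicitly, so the
# job and instance labels that KSM scraping attaches to kube_pod_labels
# take no part in the match. group_left (label_app) copies only the
# chosen data label, so job and instance are not added to the result.
  sum by (namespace, pod) (
    rate(kube_pod_container_status_restarts_total[5m])
  )
* on (namespace, pod) group_left (label_app)
  kube_pod_labels
```

This is the behavior the future version of info would have to reproduce without the user listing the labels: join only on namespace and pod, and add only the pod labels.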
Revised, PTAL.
link again
Done, PTAL.