
SD-11577: add support for kv metrics #10

Merged
rtalvarez merged 3 commits into master from kv-metrics on Mar 18, 2026


@rtalvarez (Member) commented Mar 17, 2026

Pull Request Template

Description

Adds Cloudflare KV storage metrics to the Prometheus exporter. Two new metrics are exposed:

  • cloudflare_kv_requests_count — number of KV operations by namespace and action type (read/write/delete/list)
  • cloudflare_kv_latency — KV operation latency quantiles (P50/P75/P99/P999) in milliseconds
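A minimal sketch of how a single kvOperationsAdaptiveGroups row (the fields requested by the GraphQL query logged under Testing) could be expanded into samples for these two metrics. The Go types, the label names namespace_id, action_type and quantile, and the quantile label values are assumptions for illustration, not the exporter's actual code.

```go
package main

// kvGroup mirrors one element of kvOperationsAdaptiveGroups.
// The Go field names are hypothetical; the GraphQL fields are from this PR.
type kvGroup struct {
	NamespaceID   string
	ActionType    string
	Requests      uint64
	LatencyMsP50  float64
	LatencyMsP75  float64
	LatencyMsP99  float64
	LatencyMsP999 float64
}

// sample is one labelled value, as it would be written to a gauge vector.
type sample struct {
	Metric string
	Labels map[string]string
	Value  float64
}

// groupToSamples expands one group into a request-count sample plus four
// latency samples distinguished by an (assumed) "quantile" label.
func groupToSamples(g kvGroup) []sample {
	// base builds the shared namespace/action labels plus any extra labels.
	base := func(extra map[string]string) map[string]string {
		labels := map[string]string{"namespace_id": g.NamespaceID, "action_type": g.ActionType}
		for k, v := range extra {
			labels[k] = v
		}
		return labels
	}

	out := []sample{{Metric: "cloudflare_kv_requests_count", Labels: base(nil), Value: float64(g.Requests)}}

	// A slice (not a map) keeps the quantile order deterministic.
	quantiles := []struct {
		label string
		value float64
	}{{"P50", g.LatencyMsP50}, {"P75", g.LatencyMsP75}, {"P99", g.LatencyMsP99}, {"P999", g.LatencyMsP999}}

	for _, q := range quantiles {
		out = append(out, sample{
			Metric: "cloudflare_kv_latency",
			Labels: base(map[string]string{"quantile": q.label}),
			Value:  q.value,
		})
	}
	return out
}
```

Keeping latency as one metric with a quantile label, rather than four separate metrics, lets a query select or compare percentiles by label.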

Namespace IDs are resolved to human-readable names via the Cloudflare REST API (/storage/kv/namespaces), falling back to the raw ID if the name cannot be resolved.
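A minimal sketch of the name resolution with fallback, assuming the namespace listing returns a JSON body with a success flag and a result array whose entries carry id and title fields (as in Cloudflare's v4 API envelope). The function names parseNamespaces and resolveName are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
)

// namespacesResponse decodes only the fields used here (assumed shape).
type namespacesResponse struct {
	Success bool `json:"success"`
	Result  []struct {
		ID    string `json:"id"`
		Title string `json:"title"`
	} `json:"result"`
}

// parseNamespaces builds an ID -> title map from one response page.
func parseNamespaces(body []byte) (map[string]string, error) {
	var resp namespacesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if !resp.Success {
		return nil, errors.New("namespace listing returned success=false")
	}
	names := make(map[string]string, len(resp.Result))
	for _, ns := range resp.Result {
		names[ns.ID] = ns.Title
	}
	return names, nil
}

// resolveName returns the human-readable title for id, falling back to the
// raw id when the namespace is unknown or its title is empty.
func resolveName(names map[string]string, id string) string {
	if title, ok := names[id]; ok && title != "" {
		return title
	}
	return id
}
```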

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Other (please describe):

Testing

Ran the app locally and verified that cloudflare_kv_latency and cloudflare_kv_requests_count are visible by visiting /metrics

Verified the graphql query gets executed by the app

DEBU[2026-03-17 15:47:25] func:main.NewGraphQLClient.func1 file:graphql.go >> query:
        query ($accountID: String!, $mintime: Time!, $maxtime: Time!, $limit: Int!) {
                viewer {
                        accounts(filter: {accountTag: $accountID}) {
                                kvOperationsAdaptiveGroups(limit: $limit, filter: {datetime_geq: $mintime, datetime_lt: $maxtime}) {
                                        dimensions {
                                                namespaceId
                                                actionType
                                        }
                                        sum {
                                                requests
                                        }
                                        quantiles {
                                                latencyMsP50
                                                latencyMsP75
                                                latencyMsP99
                                                latencyMsP999
                                        }
                                }
                        }
                }
        }
  • I have run make pr-tests and all tests pass
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Code Quality

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Before Submitting

Please ensure you have completed the following before submitting your PR:

# Run comprehensive tests
make pr-tests

If the above command fails, please fix the issues before submitting your PR.

Additional Notes

Add any other context about the pull request here.

@rtalvarez marked this pull request as ready for review on March 17, 2026 at 20:34

prometheus.go (outdated)
wg.Add(1)
defer wg.Done()

namespaceMap, err := fetchKVNamespaces(account.ID)


🍹 From experience, this list will keep growing as we add more native hosting stores. It's probably fine for now, but for comparison: we auto-page through custom hostnames, and getting through 90k of them takes about 5 minutes.
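The cost the reviewer describes comes from paging: each scrape makes roughly ceil(N / per_page) API round trips. A minimal sketch of such a paging loop, assuming page-numbered pagination that reports a total count (modelled on Cloudflare's result_info.total_count); fetchPage is a hypothetical stand-in for the HTTP call.

```go
package main

// page is one page of a paginated listing plus the total item count
// reported by the API (assumed to be present on every page).
type page struct {
	Items      []string
	TotalCount int
}

// fetchAll walks pages 1, 2, ... until it has TotalCount items or a page
// comes back empty. It also returns the number of round trips made.
func fetchAll(fetchPage func(pageNum, perPage int) (page, error), perPage int) ([]string, int, error) {
	var all []string
	calls := 0
	for pageNum := 1; ; pageNum++ {
		p, err := fetchPage(pageNum, perPage)
		calls++
		if err != nil {
			return nil, calls, err
		}
		all = append(all, p.Items...)
		if len(p.Items) == 0 || len(all) >= p.TotalCount {
			return all, calls, nil
		}
	}
}
```

With 250 namespaces and 100 per page this is 3 requests per scrape, and the count grows linearly with the number of namespaces.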

@rtalvarez (Member, Author)

Hmm, you're right. Do we really need to fetch KV namespaces on every scrape? That seems like overkill. Maybe we can cache them on boot and then refresh the cache on a separate interval (independent of the scrape interval).


  1. Only add namespace_name if the ID is on a hardcoded list (requires a mapping table).
  2. Only add namespace_id if the ID is on a hardcoded list (a simple slice is enough).
  3. Add both namespace_name and namespace_id if the ID is on a hardcoded list (requires a mapping table).

Namespaces that are not on the hardcoded list don't get the label, because it would be high-cardinality.

@rtalvarez (Member, Author), Mar 18, 2026

Added changes to implement option 2 in e3fb3b0.

A quick caveat: a Prometheus metric vector requires every series to carry the same set of label names, so we can't omit namespace_id entirely. Instead I've defaulted it to "other", so all namespaces not on the list get bucketed there.
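A minimal sketch of option 2 with the "other" default. One subtlety it illustrates: several unlisted namespaces now share the same "other" series, so if the exporter sets a gauge once per GraphQL row (an assumption; the diff isn't shown here), later rows would overwrite earlier ones. Summing request counts per label set first avoids that. All names below are hypothetical.

```go
package main

// otherNamespace is the catch-all label value for unlisted namespaces.
const otherNamespace = "other"

// namespaceIDLabel keeps the raw ID only if it is on the allowlist slice.
func namespaceIDLabel(id string, allowlist []string) string {
	for _, allowed := range allowlist {
		if id == allowed {
			return id
		}
	}
	return otherNamespace
}

// labelKey identifies one output series.
type labelKey struct {
	NamespaceID string
	ActionType  string
}

// kvRow is a simplified GraphQL row (hypothetical field names).
type kvRow struct {
	NamespaceID string
	ActionType  string
	Requests    uint64
}

// aggregateRequests sums request counts per (namespace_id label, action).
func aggregateRequests(rows []kvRow, allowlist []string) map[labelKey]uint64 {
	totals := make(map[labelKey]uint64)
	for _, r := range rows {
		key := labelKey{namespaceIDLabel(r.NamespaceID, allowlist), r.ActionType}
		totals[key] += r.Requests
	}
	return totals
}
```

Request counts add cleanly, but latency quantiles do not: the P99 of the "other" bucket is not the sum or the average of each namespace's P99, so those series need a separate decision.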

@rtalvarez merged commit a9d9406 into master on Mar 18, 2026
3 checks passed

2 participants