Description
Problem Statement
We need to monitor subscription lag in Grafana dashboards but cannot calculate lag from existing endpoints because:
- /api/resources/v1/subscriptions returns status.stream.ackedOffset but NOT partition length
- Getting partition metadata requires calling separate EventStore/partition APIs
- Grafana cannot join data from multiple API calls or perform complex calculations
- This forces us to use external ETL processes, adding complexity and delay
We would appreciate having a single endpoint that returns subscription health with pre-calculated lag.
What We Need
A monitoring-optimized endpoint that returns subscription health metrics in a format Grafana can consume directly.
Proposed Endpoint
HTTP Request
GET /api/resources/v1/subscriptions/health
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| namespace | string | No | Filter subscriptions by namespace |
| labelSelector | string | No | Comma-separated label selectors (e.g., app=content,env=prod) |
Response Model
The endpoint MUST return a JSON array at the root level (not wrapped in an object) for Grafana compatibility.
[
{
"name": "string", // Subscription name (required)
"namespace": "string", // Subscription namespace (required)
"phase": "string", // active | inactive | failed (required)
"partitionId": "string", // Partition identifier (nullable)
"ackedOffset": 51523, // Last acknowledged offset (nullable, number)
"partitionLength": 51650, // Current partition length (nullable, number)
"lag": 127, // Calculated lag: partitionLength - ackedOffset (nullable, number)
"subscriberState": "string", // reachable | unreachable (nullable)
"subscriberReason": "string" // Error message if unreachable (nullable)
}
]
Sample Response
Example 1: Multiple Healthy Subscriptions
[
{
"name": "content-subscription",
"namespace": "mozart",
"phase": "active",
"partitionId": "https://test1.com",
"ackedOffset": 51523,
"partitionLength": 51650,
"lag": 127,
"subscriberState": "reachable",
"subscriberReason": null
},
{
"name": "desktop-events-subscription",
"namespace": "mozart",
"phase": "active",
"partitionId": "https://test2.com",
"ackedOffset": 98234,
"partitionLength": 98650,
"lag": 416,
"subscriberState": "reachable",
"subscriberReason": null
}
]
Example 2: Subscription with Problems
[
{
"name": "failing-subscription",
"namespace": "mozart",
"phase": "active",
"partitionId": "https://some-adapter.com",
"ackedOffset": 45000,
"partitionLength": 46523,
"lag": 1523,
"subscriberState": "unreachable",
"subscriberReason": "HTTP 503 Service Unavailable: Connection refused"
},
{
"name": "inactive-subscription",
"namespace": "mozart",
"phase": "inactive",
"partitionId": "https://disabled-adapter.com",
"ackedOffset": null,
"partitionLength": null,
"lag": null,
"subscriberState": null,
"subscriberReason": null
}
]
Example 3: Filtered by Namespace
GET /api/resources/v1/subscriptions/health?namespace=mozart
Returns only subscriptions in "mozart" namespace.
Example 4: Filtered by Label
GET /api/resources/v1/subscriptions/health?labelSelector=app=content,critical=true
Returns only subscriptions matching both labels.
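The AND semantics of labelSelector described above can be sketched as follows. This is only an illustration of the intended matching behavior, not the service's actual implementation; the function names `parse_label_selector` and `matches` are hypothetical, and only simple key=value equality selectors are assumed.

```python
from typing import Dict


def parse_label_selector(selector: str) -> Dict[str, str]:
    """Parse 'app=content,critical=true' into {'app': 'content', 'critical': 'true'}."""
    labels: Dict[str, str] = {}
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid label selector segment: {part!r}")
        labels[key.strip()] = value.strip()
    return labels


def matches(subscription_labels: Dict[str, str], selector: str) -> bool:
    """Return True only if every key=value pair in the selector is present (AND)."""
    wanted = parse_label_selector(selector)
    return all(subscription_labels.get(k) == v for k, v in wanted.items())
```

With this sketch, a subscription labeled app=content, critical=true, env=prod matches the selector in Example 4, while one labeled only app=content does not.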
Example 5: No Matching Subscriptions
[]
Empty array when no subscriptions match the filters.
Implementation Requirements
Critical Requirements for Grafana Integration
- Root-Level Array: Response MUST be a JSON array at root level, NOT wrapped in an object:
  // ✅ CORRECT
  [{"name": "sub1", ...}, {"name": "sub2", ...}]
  // ❌ WRONG - Grafana won't parse this
  {"items": [...], "count": 2}
- Data Types:
  - lag, ackedOffset, partitionLength MUST be numeric types (not strings)
  - Null values are acceptable for optional fields
- Consistent Fields:
  - Phase values: active, inactive, failed
  - SubscriberState values: reachable, unreachable, null
  - All fields present in every response (use null for missing data)
- Error Handling:
  - If partition metadata unavailable: include subscription with partitionLength=null, lag=null
  - Never filter out subscriptions due to missing data
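To make the requirements above concrete, here is a minimal sketch of how one record could be assembled server-side. The function name `build_health_record` and its parameters are hypothetical; the only things taken from this proposal are the field names, the rule lag = partitionLength - ackedOffset, and the rule that missing data becomes null rather than dropping the subscription.

```python
from typing import Any, Dict, Optional


def build_health_record(
    name: str,
    namespace: str,
    phase: str,
    partition_id: Optional[str] = None,
    acked_offset: Optional[int] = None,
    partition_length: Optional[int] = None,
    subscriber_state: Optional[str] = None,
    subscriber_reason: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one flat health record; every field is always present."""
    lag = None
    # Lag is only defined when both operands are known; otherwise it stays null.
    if acked_offset is not None and partition_length is not None:
        lag = partition_length - acked_offset
    return {
        "name": name,
        "namespace": namespace,
        "phase": phase,
        "partitionId": partition_id,
        "ackedOffset": acked_offset,
        "partitionLength": partition_length,
        "lag": lag,
        "subscriberState": subscriber_state,
        "subscriberReason": subscriber_reason,
    }
```

Calling it with ackedOffset 51523 and partitionLength 51650 yields lag 127, as in Example 1. Calling it without a partition length still returns the record, with partitionLength and lag set to null.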
Grafana Use Cases
What we need to build:
- Table showing subscription lag with color thresholds (red if lag > 1000)
- Time series graph of lag over time (one line per subscription)
- Alerts: trigger when lag > threshold for 5 minutes
- Status panels: count of active/inactive/failed subscriptions
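For clarity, the alert condition we have in mind ("lag > threshold for 5 minutes") is roughly the following. This is a sketch of the intended semantics only, not a description of how Grafana evaluates alert rules internally; the function name `lag_alert_firing`, the sample format, and the choice to treat a null lag as "not breaching" are all assumptions.

```python
from typing import List, Optional, Tuple


def lag_alert_firing(
    samples: List[Tuple[float, Optional[int]]],
    threshold: int,
    hold_seconds: float = 300.0,
) -> bool:
    """samples are (unix_timestamp, lag) pairs sorted by ascending timestamp.

    Returns True if lag has stayed above the threshold in every sample
    from some point up to the latest sample, spanning at least hold_seconds.
    """
    breach_start: Optional[float] = None
    for ts, lag in samples:
        if lag is not None and lag > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
        else:
            breach_start = None  # any recovery (or null lag) resets the window
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

This only works if lag is numeric and present on every poll, which is why the data type and consistency requirements matter.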
Critical for Grafana:
- Root-level JSON array (not wrapped in object)
- Numeric types for lag, ackedOffset, partitionLength (not strings)
- Consistent field names across all responses
- Empty array [] when no results (not null or missing)
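These constraints could be checked with a small contract test along these lines. It is a sketch under the assumptions of this proposal (field names and allowed values as listed above); the function name `validate_health_payload` is hypothetical.

```python
import json
from typing import List

REQUIRED_FIELDS = (
    "name", "namespace", "phase", "partitionId", "ackedOffset",
    "partitionLength", "lag", "subscriberState", "subscriberReason",
)
NUMERIC_FIELDS = ("ackedOffset", "partitionLength", "lag")
PHASES = ("active", "inactive", "failed")
SUBSCRIBER_STATES = ("reachable", "unreachable", None)


def validate_health_payload(body: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the body conforms."""
    payload = json.loads(body)
    if not isinstance(payload, list):
        return ["root must be a JSON array, got " + type(payload).__name__]
    problems: List[str] = []
    for i, record in enumerate(payload):
        if not isinstance(record, dict):
            problems.append(f"[{i}] record must be an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append(f"[{i}] missing field {field}")
        for field in NUMERIC_FIELDS:
            value = record.get(field)
            # bool is a subclass of int in Python, so reject it explicitly.
            if value is not None and (
                isinstance(value, bool) or not isinstance(value, (int, float))
            ):
                problems.append(f"[{i}] {field} must be a number or null")
        if record.get("phase") not in PHASES:
            problems.append(f"[{i}] unexpected phase {record.get('phase')!r}")
        if record.get("subscriberState") not in SUBSCRIBER_STATES:
            problems.append(f"[{i}] unexpected subscriberState")
    return problems
```

An empty array passes, a wrapped object such as {"items": [...], "count": 2} fails on the first check, and a record with "lag": "127" is reported as having a non-numeric lag.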
Why Not Use Existing /api/resources/v1/subscriptions Endpoint?
The existing endpoint doesn't have partition length data:
// Current /subscriptions response
{
"items": [{
"status": {
"stream": {
"ackedOffset": 51523 // ✅ We have this
}
}
}]
}
// ❌ No partition length to calculate lag!
To calculate lag we would need:
- Call /api/resources/v1/subscriptions to get ackedOffset
- Call another endpoint (partition metadata API) to get partitionLength
- Use Grafana transformations to join the data and calculate lag = partitionLength - ackedOffset
This is impractical because:
- Grafana's transformation capabilities are limited
- Cannot reliably join data from multiple datasources
- Performance issues from multiple API calls every 15-30 seconds
- Complex transformations make dashboards fragile and unmaintainable
The health endpoint solves this by:
- Fetching both ackedOffset (from subscription) and partitionLength (from event store) server-side
- Pre-calculating lag
- Returning flat, monitoring-optimized structure
- Single API call with <500ms response time