Commit d402455

github-actions[bot], Copilot, and neSpecc authored
feat: add GraphQL and MongoDB Prometheus metrics (#544) (#547)
* Initial plan
* feat: add GraphQL and MongoDB metrics
* Bump version up to 1.1.43
* fix: correctly extract collection name from MongoDB command events
* fix: add proper type annotations to GraphQL metrics plugin
* fix: add null check for event.command to prevent undefined access
* Update mongodb.ts
* lint
* reduce cardinality for mongo metrics
* decrease buckets number, handle getMore case
* Update package.json
* Delete metrics.test.ts

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: neSpecc <[email protected]>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Peter Savchenko <[email protected]>
1 parent d4b8f82 commit d402455

8 files changed: 398 additions and 48 deletions

docs/METRICS.md

Lines changed: 87 additions & 2 deletions
@@ -55,7 +55,7 @@ Duration of HTTP requests in seconds, labeled by:
 - `route` - Request route/path
 - `status_code` - HTTP status code

-Buckets: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
+Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds

 #### http_requests_total (Counter)
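
Review note: with the 0.001 and 0.005 bounds removed, every request faster than 10 ms is counted in the same first bucket. The sketch below illustrates the effect, assuming standard Prometheus `le` (less than or equal) bucket semantics. `firstBucket` is a hypothetical helper written for this note; it is not part of `prom-client` or of this commit.

```typescript
// Bucket bounds (in seconds) before and after the change in this hunk.
const oldBuckets = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10];
const newBuckets = [0.01, 0.05, 0.1, 0.5, 1, 5, 10];

// Hypothetical helper: returns the smallest upper bound (`le`) that contains
// the observation, or '+Inf' when it exceeds every configured bound.
function firstBucket(buckets: number[], seconds: number): string {
  const bound = buckets.find((le) => seconds <= le);

  return bound === undefined ? '+Inf' : String(bound);
}

const fastRequest = 0.003; // a 3 ms request

console.log(firstBucket(oldBuckets, fastRequest)); // 0.005
console.log(firstBucket(newBuckets, fastRequest)); // 0.01
```

Fewer buckets means fewer time series per label combination, at the cost of resolution below 10 ms.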

@@ -64,6 +64,77 @@ Total number of HTTP requests, labeled by:
 - `route` - Request route/path
 - `status_code` - HTTP status code

+### GraphQL Metrics
+
+#### hawk_gql_operation_duration_seconds (Histogram)
+
+Histogram of total GraphQL operation duration by operation name and type.
+
+Labels:
+- `operation_name` - Name of the GraphQL operation
+- `operation_type` - Type of operation (query, mutation, subscription)
+
+Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
+
+**Purpose**: Identify slow API operations (P95/P99 latency).
+
+#### hawk_gql_operation_errors_total (Counter)
+
+Counter of failed GraphQL operations grouped by operation name and error class.
+
+Labels:
+- `operation_name` - Name of the GraphQL operation
+- `error_type` - Type/class of the error
+
+**Purpose**: Detect increased error rates and failing operations.
+
+#### hawk_gql_resolver_duration_seconds (Histogram)
+
+Histogram of resolver execution time per type, field, and operation.
+
+Labels:
+- `type_name` - GraphQL type name
+- `field_name` - Field name being resolved
+- `operation_name` - Name of the GraphQL operation
+
+Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5 seconds
+
+**Purpose**: Find slow or CPU-intensive resolvers that degrade overall performance.
+
+### MongoDB Metrics
+
+#### hawk_mongo_command_duration_seconds (Histogram)
+
+Histogram of MongoDB command duration by command, collection family, and database.
+
+Labels:
+- `command` - MongoDB command name (find, insert, update, etc.)
+- `collection_family` - Collection family name (extracted from dynamic collection names to reduce cardinality)
+- `db` - Database name
+
+Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
+
+**Purpose**: Detect slow queries and high-latency collections.
+
+**Note on Collection Families**: To reduce metric cardinality, dynamic collection names are grouped into families. For example:
+- `events:projectId` → `events`
+- `dailyEvents:projectId` → `dailyEvents`
+- `repetitions:projectId` → `repetitions`
+- `membership:userId` → `membership`
+- `team:workspaceId` → `team`
+
+This prevents metric explosion when dealing with thousands of projects, users, or workspaces, while still providing meaningful insights into collection performance patterns.
+
+#### hawk_mongo_command_errors_total (Counter)
+
+Counter of failed MongoDB commands grouped by command and error code.
+
+Labels:
+- `command` - MongoDB command name
+- `error_code` - MongoDB error code
+
+**Purpose**: Track transient or persistent database errors.
+
 ## Testing

 ### Manual Testing
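
Review note: the "Collection Families" mapping documented in this hunk can be sketched as a small pure function. This is an illustration only, assuming the family is everything before the first `:` in the collection name. The actual extraction lives in `src/metrics/mongodb.ts`, which is not shown on this page, and `collectionFamily` is a hypothetical name.

```typescript
// Hypothetical helper (not taken from the commit): maps a dynamic collection
// name such as "events:projectId" to its family "events". Names without a
// ":" separator are returned unchanged.
function collectionFamily(collectionName: string): string {
  const separatorIndex = collectionName.indexOf(':');

  return separatorIndex === -1
    ? collectionName
    : collectionName.slice(0, separatorIndex);
}

console.log(collectionFamily('events:projectId')); // events
console.log(collectionFamily('team:workspaceId')); // team
console.log(collectionFamily('users')); // users
```

Using the family as the label value keeps the number of time series proportional to the number of collection kinds rather than to the number of projects, users, or workspaces.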
@@ -98,11 +169,25 @@ The metrics implementation uses the `prom-client` library and consists of:
    - Initializes a Prometheus registry
    - Configures default Node.js metrics collection
    - Defines custom HTTP metrics (duration histogram and request counter)
+   - Registers GraphQL and MongoDB metrics
    - Provides middleware for tracking HTTP requests
    - Creates a separate Express app for serving metrics

-2. **Integration** (`src/index.ts`):
+2. **GraphQL Metrics** (`src/metrics/graphql.ts`):
+   - Implements Apollo Server plugin for tracking GraphQL operations
+   - Tracks operation duration, errors, and resolver execution time
+   - Automatically captures operation name, type, and field information
+
+3. **MongoDB Metrics** (`src/metrics/mongodb.ts`):
+   - Implements MongoDB command monitoring
+   - Tracks command duration and errors
+   - Uses MongoDB's command monitoring events
+   - Extracts collection families from dynamic collection names to reduce cardinality
+
+4. **Integration** (`src/index.ts`, `src/mongo.ts`):
+   - Adds GraphQL metrics plugin to Apollo Server
    - Adds metrics middleware to the main Express app
+   - Enables MongoDB command monitoring on database clients
    - Starts metrics server on a separate port
    - Keeps metrics server isolated from main API traffic
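
Review note: the MongoDB bullets in this hunk rely on the driver's command monitoring events. The driver's "succeeded" and "failed" events report a `requestId` and a `duration` in milliseconds but do not repeat the command document, so the collection has to be remembered from the matching "started" event. The sketch below shows one way to pair them. It is not the code from `src/metrics/mongodb.ts` (not shown on this page): `CommandTracker`, the trimmed event interfaces, and the `observations` array are all hypothetical stand-ins, and a real implementation would call the histogram instead of pushing to an array.

```typescript
// Trimmed event shapes modelled on the driver's command monitoring events.
// Only the fields used below are declared.
interface CommandStarted {
  requestId: number;
  commandName: string;
  databaseName: string;
  command: Record<string, unknown>;
}

interface CommandFinished {
  requestId: number;
  commandName: string;
  duration: number; // milliseconds
}

interface Observation {
  command: string;
  collection: string;
  db: string;
  seconds: number;
}

// Hypothetical tracker: remembers what a command targeted when it starts and
// records one observation when the matching finish event arrives.
class CommandTracker {
  private pending = new Map<number, { collection: string; db: string }>();
  readonly observations: Observation[] = [];

  onStarted(event: CommandStarted): void {
    // Most commands carry the collection under their own name ({ find: 'users' });
    // getMore carries a cursor id there and the collection under `collection`.
    const raw = event.commandName === 'getMore'
      ? event.command['collection']
      : event.command[event.commandName];
    const collection = typeof raw === 'string' ? raw : 'unknown';

    this.pending.set(event.requestId, { collection, db: event.databaseName });
  }

  onFinished(event: CommandFinished): void {
    const started = this.pending.get(event.requestId);

    if (started === undefined) {
      return; // finish event without a recorded start: nothing to observe
    }
    this.pending.delete(event.requestId);
    this.observations.push({
      command: event.commandName,
      collection: started.collection,
      db: started.db,
      seconds: event.duration / 1000,
    });
  }
}

const tracker = new CommandTracker();

tracker.onStarted({ requestId: 1, commandName: 'find', databaseName: 'hawk', command: { find: 'events:projectId' } });
tracker.onFinished({ requestId: 1, commandName: 'find', duration: 25 });
console.log(tracker.observations[0]?.seconds); // 0.025
```

Deleting the pending entry on finish keeps the map from growing with every command.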

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "hawk.api",
-  "version": "1.1.42",
+  "version": "1.2.0",
   "main": "index.ts",
   "license": "BUSL-1.1",
   "scripts": {

src/index.ts

Lines changed: 2 additions & 1 deletion
@@ -27,7 +27,7 @@ import BusinessOperationsFactory from './models/businessOperationsFactory';
 import schema from './schema';
 import { graphqlUploadExpress } from 'graphql-upload';
 import morgan from 'morgan';
-import { metricsMiddleware, createMetricsServer } from './metrics';
+import { metricsMiddleware, createMetricsServer, graphqlMetricsPlugin } from './metrics';

 /**
  * Option to enable playground
@@ -122,6 +122,7 @@ class HawkAPI {
         process.env.NODE_ENV === 'production'
           ? ApolloServerPluginLandingPageDisabled()
           : ApolloServerPluginLandingPageGraphQLPlayground(),
+        graphqlMetricsPlugin,
       ],
       context: ({ req }): ResolverContextBase => req.context,
       formatError: (error): GraphQLError => {

src/metrics/graphql.ts

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+import client from 'prom-client';
+import { ApolloServerPlugin, GraphQLRequestContext, GraphQLRequestListener } from 'apollo-server-plugin-base';
+import { GraphQLError } from 'graphql';
+
+/**
+ * GraphQL operation duration histogram
+ * Tracks GraphQL operation duration by operation name and type
+ */
+export const gqlOperationDuration = new client.Histogram({
+  name: 'hawk_gql_operation_duration_seconds',
+  help: 'Histogram of total GraphQL operation duration by operation name and type',
+  labelNames: ['operation_name', 'operation_type'],
+  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5, 10],
+});
+
+/**
+ * GraphQL operation errors counter
+ * Tracks failed GraphQL operations grouped by operation name and error class
+ */
+export const gqlOperationErrors = new client.Counter({
+  name: 'hawk_gql_operation_errors_total',
+  help: 'Counter of failed GraphQL operations grouped by operation name and error class',
+  labelNames: ['operation_name', 'error_type'],
+});
+
+/**
+ * GraphQL resolver duration histogram
+ * Tracks resolver execution time per type, field, and operation
+ */
+export const gqlResolverDuration = new client.Histogram({
+  name: 'hawk_gql_resolver_duration_seconds',
+  help: 'Histogram of resolver execution time per type, field, and operation',
+  labelNames: ['type_name', 'field_name', 'operation_name'],
+  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
+});
+
+/**
+ * Apollo Server plugin to track GraphQL metrics
+ */
+export const graphqlMetricsPlugin: ApolloServerPlugin = {
+  async requestDidStart(_requestContext: GraphQLRequestContext): Promise<GraphQLRequestListener> {
+    const startTime = Date.now();
+    let operationName = 'unknown';
+    let operationType = 'unknown';
+
+    return {
+      async didResolveOperation(ctx: GraphQLRequestContext): Promise<void> {
+        operationName = ctx.operationName || 'anonymous';
+        operationType = ctx.operation?.operation || 'unknown';
+      },
+
+      async executionDidStart(): Promise<GraphQLRequestListener> {
+        return {
+          // eslint-disable-next-line @typescript-eslint/no-explicit-any
+          willResolveField({ info }: any): () => void {
+            const fieldStartTime = Date.now();
+
+            return (): void => {
+              const duration = (Date.now() - fieldStartTime) / 1000;
+
+              gqlResolverDuration
+                .labels(
+                  info.parentType.name,
+                  info.fieldName,
+                  operationName
+                )
+                .observe(duration);
+            };
+          },
+        };
+      },
+
+      async willSendResponse(ctx: GraphQLRequestContext): Promise<void> {
+        const duration = (Date.now() - startTime) / 1000;
+
+        gqlOperationDuration
+          .labels(operationName, operationType)
+          .observe(duration);
+
+        // Track errors if any
+        if (ctx.errors && ctx.errors.length > 0) {
+          ctx.errors.forEach((error: GraphQLError) => {
+            const errorType = error.extensions?.code || error.name || 'unknown';
+
+            gqlOperationErrors
+              .labels(operationName, errorType as string)
+              .inc();
+          });
+        }
+      },
+    };
+  },
+};
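
Review note: `willResolveField` in the plugin above captures a start time and returns a callback that runs when the field has finished resolving. The same closure pattern can be isolated and tested with an injected clock, as sketched below. `startTimer` and the `Clock` type are hypothetical names for this note and do not appear in the commit.

```typescript
type Clock = () => number; // returns the current time in milliseconds

// Hypothetical helper: captures the start time and returns a function that,
// when called, reports the elapsed time in seconds to `observe`.
function startTimer(observe: (seconds: number) => void, now: Clock = Date.now): () => void {
  const startedAt = now();

  return (): void => {
    observe((now() - startedAt) / 1000);
  };
}

// Usage with a fake clock so the result is deterministic.
let fakeTime = 1000;
const recorded: number[] = [];
const stopTimer = startTimer((seconds) => recorded.push(seconds), () => fakeTime);

fakeTime = 1250; // 250 ms later
stopTimer();
console.log(recorded); // [ 0.25 ]
```

Because `Date.now()` has millisecond resolution, a resolver that completes in under a millisecond is recorded as 0 seconds and falls into the lowest bucket.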

src/metrics/index.ts

Lines changed: 25 additions & 1 deletion
@@ -1,5 +1,7 @@
 import client from 'prom-client';
 import express from 'express';
+import { gqlOperationDuration, gqlOperationErrors, gqlResolverDuration } from './graphql';
+import { mongoCommandDuration, mongoCommandErrors } from './mongodb';

 /**
  * Create a Registry to register the metrics
@@ -19,7 +21,7 @@ const httpRequestDuration = new client.Histogram({
   name: 'http_request_duration_seconds',
   help: 'Duration of HTTP requests in seconds',
   labelNames: ['method', 'route', 'status_code'],
-  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
+  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5, 10],
   registers: [ register ],
 });

@@ -34,8 +36,24 @@ const httpRequestCounter = new client.Counter({
   registers: [ register ],
 });

+/**
+ * Register GraphQL metrics
+ */
+register.registerMetric(gqlOperationDuration);
+register.registerMetric(gqlOperationErrors);
+register.registerMetric(gqlResolverDuration);
+
+/**
+ * Register MongoDB metrics
+ */
+register.registerMetric(mongoCommandDuration);
+register.registerMetric(mongoCommandErrors);
+
 /**
  * Express middleware to track HTTP metrics
+ * @param req - Express request object
+ * @param res - Express response object
+ * @param next - Express next function
  */
 export function metricsMiddleware(req: express.Request, res: express.Response, next: express.NextFunction): void {
   const start = Date.now();
@@ -71,3 +89,9 @@ export function createMetricsServer(): express.Application {

   return metricsApp;
 }
+
+/**
+ * Export GraphQL metrics plugin and MongoDB metrics setup
+ */
+export { graphqlMetricsPlugin } from './graphql';
+export { setupMongoMetrics, withMongoMetrics } from './mongodb';
