@@ -167,6 +167,139 @@ The server meets the following performance requirements:
167167- **Mock Operations**: < 500ms per operation
168168- **Memory Usage**: < 50MB
169169
170+ ## 📈 Prometheus Metrics Endpoint
171+
172+ The server exposes a `/metrics` endpoint in the Prometheus text exposition format, so it can be scraped by monitoring stacks such as Prometheus, Grafana, and Datadog.
173+
174+ ### Enabling Metrics
175+
176+ The metrics endpoint is enabled by default whenever the health check server is running. Configure it with these environment variables:
177+
178+ ``` bash
179+ # Enable health check server (required for /metrics)
180+ HEALTH_CHECK_ENABLED=true
181+
182+ # Optionally set a custom port (default: 8080)
183+ HEALTH_CHECK_PORT=8080
184+
185+ # Disable Prometheus metrics (metrics enabled by default)
186+ PROMETHEUS_METRICS_ENABLED=false
187+ ```
188+
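As a rough illustration of how these three variables interact, the sketch below resolves them into one config object. The `MetricsConfig` shape, the `parseBool` and `loadMetricsConfig` helpers, the accepted boolean spellings, and the assumption that `HEALTH_CHECK_ENABLED` defaults to off are all hypothetical; the server's actual parsing may differ.

```typescript
// Hypothetical config shape, for illustration only.
interface MetricsConfig {
  healthCheckEnabled: boolean;
  port: number;
  metricsEnabled: boolean;
}

// Accept "true"/"1" and "false"/"0" (case-insensitive); anything else uses the fallback.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  return fallback;
}

function loadMetricsConfig(env: Record<string, string | undefined>): MetricsConfig {
  // Assumption: the health check server is off unless explicitly enabled.
  const healthCheckEnabled = parseBool(env["HEALTH_CHECK_ENABLED"], false);
  const port = Number.parseInt(env["HEALTH_CHECK_PORT"] ?? "8080", 10);
  return {
    healthCheckEnabled,
    // Fall back to the documented default port when the value is not a valid port.
    port: Number.isInteger(port) && port > 0 && port < 65536 ? port : 8080,
    // /metrics needs the health check server and is on by default.
    metricsEnabled: healthCheckEnabled && parseBool(env["PROMETHEUS_METRICS_ENABLED"], true),
  };
}
```

In a Node.js process this could be called as `loadMetricsConfig(process.env)`.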
189+ ### Available Metrics
190+
191+ #### Authentication Metrics
192+
193+ | Metric                             | Type      | Description                                                          |
194+ | ---------------------------------- | --------- | -------------------------------------------------------------------- |
195+ | `lighthouse_auth_total{status}`    | Counter   | Total authentication attempts by status (success, failure, fallback) |
196+ | `lighthouse_auth_duration_seconds` | Histogram | Authentication duration distribution                                 |
197+ | `lighthouse_unique_api_keys`       | Gauge     | Number of unique API keys seen                                       |
198+
199+ #### Cache Metrics
200+
201+ | Metric                          | Type    | Description                  |
202+ | ------------------------------- | ------- | ---------------------------- |
203+ | `lighthouse_cache_hits_total`   | Counter | Total cache hits             |
204+ | `lighthouse_cache_misses_total` | Counter | Total cache misses           |
205+ | `lighthouse_cache_size`         | Gauge   | Current cache size (entries) |
206+ | `lighthouse_cache_max_size`     | Gauge   | Maximum cache capacity       |
206+ | ` lighthouse_cache_max_size ` | Gauge | Maximum cache capacity |
207+
208+ #### Tool Metrics
209+
210+ | Metric                                           | Type      | Description                         |
211+ | ------------------------------------------------ | --------- | ----------------------------------- |
212+ | `lighthouse_tool_calls_total{tool}`              | Counter   | Total tool invocations by tool name |
213+ | `lighthouse_tools_registered`                    | Gauge     | Number of registered tools          |
214+ | `lighthouse_request_duration_seconds{operation}` | Histogram | Request duration by operation       |
215+
216+ #### Security Metrics
217+
218+ | Metric                                   | Type    | Description                                                                 |
219+ | ---------------------------------------- | ------- | --------------------------------------------------------------------------- |
220+ | `lighthouse_security_events_total{type}` | Counter | Security events by type (AUTHENTICATION_FAILURE, RATE_LIMIT_EXCEEDED, etc.) |
221+
222+ #### Storage Metrics
223+
224+ | Metric                           | Type  | Description                     |
225+ | -------------------------------- | ----- | ------------------------------- |
226+ | `lighthouse_storage_files`       | Gauge | Number of files in storage      |
227+ | `lighthouse_storage_bytes`       | Gauge | Total storage usage in bytes    |
228+ | `lighthouse_storage_max_bytes`   | Gauge | Maximum storage capacity        |
229+ | `lighthouse_storage_utilization` | Gauge | Storage utilization ratio (0-1) |
230+
231+ #### Service Pool Metrics
232+
233+ | Metric                             | Type  | Description                   |
234+ | ---------------------------------- | ----- | ----------------------------- |
235+ | `lighthouse_service_pool_size`     | Gauge | Current service pool size     |
236+ | `lighthouse_service_pool_max_size` | Gauge | Maximum service pool capacity |
237+
238+ #### Process Metrics (Auto-collected)
239+
240+ | Metric                                     | Type    | Description             |
241+ | ------------------------------------------ | ------- | ----------------------- |
242+ | `lighthouse_process_cpu_seconds_total`     | Counter | Total CPU time consumed |
243+ | `lighthouse_process_resident_memory_bytes` | Gauge   | Resident memory size    |
244+ | `lighthouse_nodejs_eventloop_lag_seconds`  | Gauge   | Node.js event loop lag  |
245+ | `lighthouse_nodejs_heap_size_total_bytes`  | Gauge   | Total heap size         |
246+ | `lighthouse_nodejs_heap_size_used_bytes`   | Gauge   | Used heap size          |
247+
248+ ### Example Output
249+
250+ ``` prometheus
251+ # HELP lighthouse_auth_total Total authentication attempts
252+ # TYPE lighthouse_auth_total counter
253+ lighthouse_auth_total{status="success"} 1542
254+ lighthouse_auth_total{status="failure"} 23
255+ lighthouse_auth_total{status="fallback"} 156
256+
257+ # HELP lighthouse_cache_hits_total Total cache hits
258+ # TYPE lighthouse_cache_hits_total counter
259+ lighthouse_cache_hits_total 12453
260+
261+ # HELP lighthouse_cache_misses_total Total cache misses
262+ # TYPE lighthouse_cache_misses_total counter
263+ lighthouse_cache_misses_total 1847
264+
265+ # HELP lighthouse_request_duration_seconds Request duration in seconds
266+ # TYPE lighthouse_request_duration_seconds histogram
267+ lighthouse_request_duration_seconds_bucket{operation="lighthouse_upload_file",le="0.1"} 234
268+ lighthouse_request_duration_seconds_bucket{operation="lighthouse_upload_file",le="0.5"} 892
269+ lighthouse_request_duration_seconds_bucket{operation="lighthouse_upload_file",le="1"} 1023
270+ lighthouse_request_duration_seconds_bucket{operation="lighthouse_upload_file",le="+Inf"} 1024
271+ lighthouse_request_duration_seconds_sum{operation="lighthouse_upload_file"} 342.87
272+ lighthouse_request_duration_seconds_count{operation="lighthouse_upload_file"} 1024
273+
274+ # HELP lighthouse_security_events_total Total security events by type
275+ # TYPE lighthouse_security_events_total counter
276+ lighthouse_security_events_total{type="AUTHENTICATION_FAILURE"} 23
277+ lighthouse_security_events_total{type="RATE_LIMIT_EXCEEDED"} 5
278+ ```
279+
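For ad hoc checks or tests it can help to read this output programmatically. The following is a minimal sketch of a parser for the sample lines shown above, not a complete implementation of the exposition format: `Sample`, `parseValue`, and `parseMetrics` are hypothetical names, comment lines are skipped, optional timestamps are ignored, and escape sequences in label values are left as-is.

```typescript
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// The text format spells infinities as +Inf / -Inf, which Number() does not accept.
function parseValue(raw: string): number {
  if (raw === "+Inf" || raw === "Inf") return Infinity;
  if (raw === "-Inf") return -Infinity;
  return Number(raw);
}

function parseMetrics(text: string): Sample[] {
  const samples: Sample[] = [];
  // metric name, optional {label block}, then the sample value
  const linePattern = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks, HELP, TYPE
    const m = linePattern.exec(line);
    if (!m) continue;

    const labels: Record<string, string> = {};
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
    let lm: RegExpExecArray | null;
    while ((lm = labelPattern.exec(m[2] ?? "")) !== null) {
      labels[lm[1]] = lm[2];
    }
    samples.push({ name: m[1], labels, value: parseValue(m[3]) });
  }
  return samples;
}
```

Filtering the result by `name` and `labels` is then enough to pull out, for example, the `status="failure"` sample of `lighthouse_auth_total`.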
280+ ### Prometheus Configuration
281+
282+ Add this scrape configuration to your `prometheus.yml`:
283+
284+ ``` yaml
285+ scrape_configs:
286+   - job_name: "lighthouse-mcp-server"
287+     static_configs:
288+       - targets: ["localhost:8080"]
289+     metrics_path: /metrics
290+     scrape_interval: 15s
291+ ```
292+
293+ ### Grafana Dashboard
294+
295+ Add Prometheus as a data source in Grafana, then build dashboards to visualize:
296+
297+ - Authentication success/failure rates
298+ - Cache hit rate over time
299+ - Tool usage patterns
300+ - Storage utilization trends
301+ - Security event alerts
302+
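Most of the panels listed above are rates derived from counters. In practice a PromQL function such as `rate()` computes these, but the underlying arithmetic can be sketched as follows. `CounterSample` and `perSecondRate` are hypothetical names, and the reset handling is a simplification.

```typescript
// A counter value observed at a given scrape time.
interface CounterSample {
  value: number;
  timestampMs: number;
}

// Average per-second increase between two scrapes of the same counter.
function perSecondRate(prev: CounterSample, curr: CounterSample): number {
  const seconds = (curr.timestampMs - prev.timestampMs) / 1000;
  if (seconds <= 0) return NaN;
  // A counter that went down was reset (for example by a restart),
  // so everything in `curr` is treated as new since the reset.
  const increase = curr.value >= prev.value ? curr.value - prev.value : curr.value;
  return increase / seconds;
}
```

For example, a counter that moves from 100 to 130 across a 15 second scrape interval has a rate of 2 per second.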
170303## 🧪 Testing
171304
172305``` bash