Commit 9933b7d

Adds metrics endpoint (#78)
* Adds metrics endpoint
* Remove NewSchedulerMetricsHandler, not used
* replace custom parser with official Prometheus libraries
  - Remove custom prometheus_metrics.go
  - Use expfmt.TextParser for parsing and expfmt.NewEncoder for output
* acquire/release the loader's lock
* I missed committing this
* remove unneeded dep
* clean up
1 parent 6d3a7ac commit 9933b7d

10 files changed, +526 -38 lines changed

METRICS.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# Aggregated Metrics Endpoint

The model-runner now exposes an aggregated `/metrics` endpoint that collects and labels metrics from all active llama.cpp runners.

## Overview

Each llama.cpp server is started with `--metrics` and exposes Prometheus-compatible metrics at its own `/metrics` endpoint. The model-runner collects these metrics from all active runners, adds identifying labels, and serves the result through a single `/metrics` endpoint, so one scrape covers every running model.

## Aggregated Metrics Format

Instead of exposing metrics from a single runner, the endpoint aggregates metrics from all active runners and adds labels that identify the source runner.

### Example Output

```prometheus
# HELP llama_prompt_tokens_total Total number of prompt tokens processed
# TYPE llama_prompt_tokens_total counter
llama_prompt_tokens_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 4934
llama_prompt_tokens_total{backend="llama.cpp",model="ai/mxbai-embed-large:335M-F16",mode="embedding"} 4525

# HELP llama_generation_tokens_total Total number of tokens generated
# TYPE llama_generation_tokens_total counter
llama_generation_tokens_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 2156

# HELP llama_requests_total Total number of requests processed
# TYPE llama_requests_total counter
llama_requests_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 127
llama_requests_total{backend="llama.cpp",model="ai/mxbai-embed-large:335M-F16",mode="embedding"} 89
```

### Labels Added

Each metric is automatically labeled with:
- **`backend`**: The inference backend (e.g., "llama.cpp")
- **`model`**: The model name (e.g., "llama3.2:latest")
- **`mode`**: The operation mode ("completion" or "embedding")
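
The effect of the labeling step on a single sample line can be sketched with the standard library alone. According to the commit message, the actual handler parses and re-encodes metrics with the Prometheus `expfmt` package, so the `label` type and `addLabels` helper below are hypothetical and only illustrate the text-level result. The sketch assumes that existing label values never contain `{` or `}`.

```go
package main

import (
	"fmt"
	"strings"
)

// label is a hypothetical name/value pair attached to every sample.
type label struct{ name, value string }

// addLabels returns one line of Prometheus text exposition with the extra
// labels appended. Comment lines (# HELP, # TYPE) and blank lines pass
// through unchanged. Simplifying assumption: existing label values never
// contain '{' or '}'.
func addLabels(line string, extra []label) string {
	trimmed := strings.TrimSpace(line)
	if trimmed == "" || strings.HasPrefix(trimmed, "#") || len(extra) == 0 {
		return line
	}
	pairs := make([]string, 0, len(extra))
	for _, l := range extra {
		pairs = append(pairs, fmt.Sprintf("%s=%q", l.name, l.value))
	}
	added := strings.Join(pairs, ",")

	open := strings.IndexByte(trimmed, '{')
	if open < 0 {
		// No label set yet: insert one between the metric name and the value.
		sp := strings.IndexByte(trimmed, ' ')
		if sp < 0 {
			return line // not a sample line; leave it alone
		}
		return trimmed[:sp] + "{" + added + "}" + trimmed[sp:]
	}
	end := strings.IndexByte(trimmed, '}')
	if end < open {
		return line // malformed label set; leave it alone
	}
	// Merge with the labels the runner already emitted.
	existing := strings.TrimSuffix(trimmed[open+1:end], ",")
	if existing != "" {
		added = existing + "," + added
	}
	return trimmed[:open] + "{" + added + "}" + trimmed[end+1:]
}

func main() {
	labels := []label{
		{"backend", "llama.cpp"},
		{"model", "llama3.2:latest"},
		{"mode", "completion"},
	}
	fmt.Println(addLabels("llama_requests_total 127", labels))
}
```

Working on parsed metric families, as the commit does, avoids the escaping and malformed-line edge cases that a string-based approach like this one has to guard against.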

## Usage

### Enabling Metrics (Default)

The aggregated metrics endpoint is enabled by default. When the model-runner is running with active runners, metrics are available at:

```
GET /metrics
```

### Disabling Metrics

To disable the metrics endpoint, set the `DISABLE_METRICS` environment variable to `1`:

```bash
export DISABLE_METRICS=1
```

### TCP Port Access

If the model-runner listens on a TCP port (set with `MODEL_RUNNER_PORT`), you can fetch metrics over HTTP:

```bash
# If MODEL_RUNNER_PORT=8080
curl http://localhost:8080/metrics
```

### Unix Socket Access

If the model-runner listens on a Unix socket (the default), use a client that can send HTTP requests over a Unix socket:

```bash
# Using curl with Unix socket
curl --unix-socket model-runner.sock http://localhost/metrics
```

## Metrics Available

The aggregated endpoint exposes all metrics from active llama.cpp runners, typically including:

- **Request metrics**: Total requests, request duration, queue statistics
- **Token metrics**: Prompt tokens, generation tokens, tokens per second
- **Memory metrics**: Memory usage, cache statistics
- **Model metrics**: Model loading status, context usage
- **Performance metrics**: Processing latency, throughput

All metrics retain their original names and types but gain the additional identifying labels.

README.md

Lines changed: 22 additions & 0 deletions
@@ -95,6 +95,9 @@ curl http://localhost:8080/engines/llama.cpp/v1/chat/completions -X POST -d '{
 
 # Delete a model
 curl http://localhost:8080/models/ai/smollm2 -X DELETE
+
+# Get metrics
+curl http://localhost:8080/metrics
 ```
 
 The response will contain the model's reply:
@@ -122,3 +125,22 @@ The response will contain the model's reply:
 }
 }
 ```
+
+## Metrics
+
+The Model Runner aggregates the [metrics endpoint](https://github.com/ggml-org/llama.cpp/tree/master/tools/server#get-metrics-prometheus-compatible-metrics-exporter) of each running llama.cpp server and serves the result at `/metrics`. This lets you monitor model performance, request statistics, and resource usage.
+
+### Accessing Metrics
+
+```sh
+# Get metrics in Prometheus format
+curl http://localhost:8080/metrics
+```
+
+### Configuration
+
+- **Enable metrics (default)**: Metrics are enabled by default
+- **Disable metrics**: Set the `DISABLE_METRICS=1` environment variable
+- **Monitoring integration**: Add the endpoint to your Prometheus configuration
+
+Check [METRICS.md](./METRICS.md) for more details.

go.mod

Lines changed: 17 additions & 11 deletions
@@ -8,11 +8,14 @@ require (
 	github.com/docker/model-distribution v0.0.0-20250512190053-b3792c042d57
 	github.com/google/go-containerregistry v0.20.3
 	github.com/jaypipes/ghw v0.16.0
+	github.com/mattn/go-shellwords v1.0.12
 	github.com/opencontainers/go-digest v1.0.0
 	github.com/opencontainers/image-spec v1.1.1
+	github.com/prometheus/client_model v0.6.2
+	github.com/prometheus/common v0.64.0
 	github.com/sirupsen/logrus v1.9.3
 	github.com/stretchr/testify v1.10.0
-	golang.org/x/sync v0.12.0
+	golang.org/x/sync v0.14.0
 )
 
 require (
@@ -21,7 +24,7 @@ require (
 	github.com/containerd/errdefs v1.0.0 // indirect
 	github.com/containerd/log v0.1.0 // indirect
 	github.com/containerd/stargz-snapshotter/estargz v0.16.3 // indirect
-	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
 	github.com/distribution/reference v0.6.0 // indirect
 	github.com/docker/cli v27.5.0+incompatible // indirect
 	github.com/docker/distribution v2.8.3+incompatible // indirect
@@ -34,28 +37,31 @@ require (
 	github.com/henvic/httpretty v0.1.4 // indirect
 	github.com/jaypipes/pcidb v1.0.1 // indirect
 	github.com/json-iterator/go v1.1.12 // indirect
-	github.com/klauspost/compress v1.17.11 // indirect
-	github.com/mattn/go-shellwords v1.0.12 // indirect
+	github.com/klauspost/compress v1.18.0 // indirect
 	github.com/mitchellh/go-homedir v1.1.0 // indirect
 	github.com/moby/locker v1.0.1 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
+	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/pkg/errors v0.9.1 // indirect
-	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
 	github.com/rs/dnscache v0.0.0-20230804202142-fc85eb664529 // indirect
 	github.com/smallnest/ringbuffer v0.0.0-20241116012123-461381446e3d // indirect
 	github.com/vbatts/tar-split v0.11.6 // indirect
 	go.opentelemetry.io/auto/sdk v1.1.0 // indirect
-	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.58.0 // indirect
+	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.60.0 // indirect
 	go.opentelemetry.io/otel v1.35.0 // indirect
 	go.opentelemetry.io/otel/metric v1.35.0 // indirect
 	go.opentelemetry.io/otel/trace v1.35.0 // indirect
-	golang.org/x/crypto v0.35.0 // indirect
-	golang.org/x/exp v0.0.0-20241108190413-2d47ceb2692f // indirect
-	golang.org/x/mod v0.22.0 // indirect
-	golang.org/x/sys v0.31.0 // indirect
-	golang.org/x/tools v0.29.0 // indirect
+	golang.org/x/crypto v0.37.0 // indirect
+	golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8 // indirect
+	golang.org/x/mod v0.24.0 // indirect
+	golang.org/x/sys v0.33.0 // indirect
+	golang.org/x/tools v0.32.0 // indirect
 	gonum.org/v1/gonum v0.15.1 // indirect
+	google.golang.org/genproto/googleapis/rpc v0.0.0-20250414145226-207652e42e2e // indirect
+	google.golang.org/grpc v1.72.0 // indirect
+	google.golang.org/protobuf v1.36.6 // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 	howett.net/plist v1.0.0 // indirect
 )

go.sum

Lines changed: 40 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -27,8 +27,9 @@ github.com/containerd/stargz-snapshotter/estargz v0.16.3/go.mod h1:uyr4BfYfOj3G9
2727
github.com/containerd/typeurl/v2 v2.2.3 h1:yNA/94zxWdvYACdYO8zofhrTVuQY73fFU1y++dYSw40=
2828
github.com/containerd/typeurl/v2 v2.2.3/go.mod h1:95ljDnPfD3bAbDJRugOiShd/DlAAsxGtUBhJxIn7SCk=
2929
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
30-
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
3130
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
31+
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
32+
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
3233
github.com/distribution/reference v0.6.0 h1:0IXCQ5g4/QMHHkarYzh5l+u8T3t73zM5QvfrDyIgxBk=
3334
github.com/distribution/reference v0.6.0/go.mod h1:BbU0aIcezP1/5jX/8MP0YiH4SdvB5Y4f/wlDRiLyi3E=
3435
github.com/docker/cli v27.5.0+incompatible h1:aMphQkcGtpHixwwhAXJT1rrK/detk2JIvDaFkLctbGM=
@@ -58,6 +59,8 @@ github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX
5859
github.com/google/go-containerregistry v0.20.3 h1:oNx7IdTI936V8CQRveCjaxOiegWwvM7kqkbXTpyiovI=
5960
github.com/google/go-containerregistry v0.20.3/go.mod h1:w00pIgBRDVUDFM6bq+Qx8lwNWK+cxgCuX1vd3PIBDNI=
6061
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
62+
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
63+
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
6164
github.com/gpustack/gguf-parser-go v0.14.1 h1:tmz2eTnSEFfE52V10FESqo9oAUquZ6JKQFntWC/wrEg=
6265
github.com/gpustack/gguf-parser-go v0.14.1/go.mod h1:GvHh1Kvvq5ojCOsJ5UpwiJJmIjFw3Qk5cW7R+CZ3IJo=
6366
github.com/henvic/httpretty v0.1.4 h1:Jo7uwIRWVFxkqOnErcoYfH90o3ddQyVrSANeS4cxYmU=
@@ -69,8 +72,8 @@ github.com/jaypipes/pcidb v1.0.1/go.mod h1:6xYUz/yYEyOkIkUt2t2J2folIuZ4Yg6uByCGF
6972
github.com/jessevdk/go-flags v1.4.0/go.mod h1:4FA24M0QyGHXBuZZK/XkWh8h0e1EYbRYJSGM75WSRxI=
7073
github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
7174
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
72-
github.com/klauspost/compress v1.17.11 h1:In6xLpyWOi1+C7tXUUWv2ot1QvBjxevKAaI6IXrJmUc=
73-
github.com/klauspost/compress v1.17.11/go.mod h1:pMDklpSncoRMuLFrf1W9Ss9KT+0rH90U12bZKk7uwG0=
75+
github.com/klauspost/compress v1.18.0 h1:c/Cqfb0r+Yi+JtIEq73FWXVkRonBlf0CRNYc8Zttxdo=
76+
github.com/klauspost/compress v1.18.0/go.mod h1:2Pp+KzxcywXVXMr50+X0Q/Lsb43OQHYWRCY2AiWywWQ=
7477
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
7578
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
7679
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
@@ -90,14 +93,21 @@ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w
9093
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
9194
github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
9295
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
96+
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
97+
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
9398
github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
9499
github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
95100
github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
96101
github.com/opencontainers/image-spec v1.1.1/go.mod h1:qpqAh3Dmcf36wStyyWU+kCeDgrGnAve2nCC8+7h8Q0M=
97102
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
98103
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
99-
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
100104
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
105+
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
106+
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
107+
github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
108+
github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
109+
github.com/prometheus/common v0.64.0 h1:pdZeA+g617P7oGv1CzdTzyeShxAGrTBsolKNOLQPGO4=
110+
github.com/prometheus/common v0.64.0/go.mod h1:0gZns+BLRQ3V6NdaerOhMbwwRbNh9hkGINtQAsP5GS8=
101111
github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII=
102112
github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o=
103113
github.com/rs/dnscache v0.0.0-20230804202142-fc85eb664529 h1:18kd+8ZUlt/ARXhljq+14TwAoKa61q6dX8jtwOf6DH8=
@@ -117,39 +127,43 @@ go.opencensus.io v0.24.0 h1:y73uSU6J157QMP2kn2r30vwW1A2W2WFwSCGnAVxeaD0=
117127
go.opencensus.io v0.24.0/go.mod h1:vNK8G9p7aAivkbmorf4v+7Hgx+Zs0yY+0fOtgBfjQKo=
118128
go.opentelemetry.io/auto/sdk v1.1.0 h1:cH53jehLUN6UFLY71z+NDOiNJqDdPRaXzTel0sJySYA=
119129
go.opentelemetry.io/auto/sdk v1.1.0/go.mod h1:3wSPjt5PWp2RhlCcmmOial7AvC4DQqZb7a7wCow3W8A=
120-
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.58.0 h1:yd02MEjBdJkG3uabWP9apV+OuWRIXGDuJEUJbOHmCFU=
121-
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.58.0/go.mod h1:umTcuxiv1n/s/S6/c2AT/g2CQ7u5C59sHDNmfSwgz7Q=
130+
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.60.0 h1:sbiXRNDSWJOTobXh5HyQKjq6wUC5tNybqjIqDpAY4CU=
131+
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.60.0/go.mod h1:69uWxva0WgAA/4bu2Yy70SLDBwZXuQ6PbBpbsa5iZrQ=
122132
go.opentelemetry.io/otel v1.35.0 h1:xKWKPxrxB6OtMCbmMY021CqC45J+3Onta9MqjhnusiQ=
123133
go.opentelemetry.io/otel v1.35.0/go.mod h1:UEqy8Zp11hpkUrL73gSlELM0DupHoiq72dR+Zqel/+Y=
124134
go.opentelemetry.io/otel/metric v1.35.0 h1:0znxYu2SNyuMSQT4Y9WDWej0VpcsxkuklLa4/siN90M=
125135
go.opentelemetry.io/otel/metric v1.35.0/go.mod h1:nKVFgxBZ2fReX6IlyW28MgZojkoAkJGaE8CpgeAU3oE=
136+
go.opentelemetry.io/otel/sdk v1.35.0 h1:iPctf8iprVySXSKJffSS79eOjl9pvxV9ZqOWT0QejKY=
137+
go.opentelemetry.io/otel/sdk v1.35.0/go.mod h1:+ga1bZliga3DxJ3CQGg3updiaAJoNECOgJREo9KHGQg=
138+
go.opentelemetry.io/otel/sdk/metric v1.35.0 h1:1RriWBmCKgkeHEhM7a2uMjMUfP7MsOF5JpUCaEqEI9o=
139+
go.opentelemetry.io/otel/sdk/metric v1.35.0/go.mod h1:is6XYCUMpcKi+ZsOvfluY5YstFnhW0BidkR+gL+qN+w=
126140
go.opentelemetry.io/otel/trace v1.35.0 h1:dPpEfJu1sDIqruz7BHFG3c7528f6ddfSWfFDVt/xgMs=
127141
go.opentelemetry.io/otel/trace v1.35.0/go.mod h1:WUk7DtFp1Aw2MkvqGdwiXYDZZNvA/1J8o6xRXLrIkyc=
128-
golang.org/x/crypto v0.35.0 h1:b15kiHdrGCHrP6LvwaQ3c03kgNhhiMgvlhxHQhmg2Xs=
129-
golang.org/x/crypto v0.35.0/go.mod h1:dy7dXNW32cAb/6/PRuTNsix8T+vJAqvuIy5Bli/x0YQ=
130-
golang.org/x/exp v0.0.0-20241108190413-2d47ceb2692f h1:XdNn9LlyWAhLVp6P/i8QYBW+hlyhrhei9uErw2B5GJo=
131-
golang.org/x/exp v0.0.0-20241108190413-2d47ceb2692f/go.mod h1:D5SMRVC3C2/4+F/DB1wZsLRnSNimn2Sp/NPsCrsv8ak=
132-
golang.org/x/mod v0.22.0 h1:D4nJWe9zXqHOmWqj4VMOJhvzj7bEZg4wEYa759z1pH4=
133-
golang.org/x/mod v0.22.0/go.mod h1:6SkKJ3Xj0I0BrPOZoBy3bdMptDDU9oJrpohJ3eWZ1fY=
142+
golang.org/x/crypto v0.37.0 h1:kJNSjF/Xp7kU0iB2Z+9viTPMW4EqqsrywMXLJOOsXSE=
143+
golang.org/x/crypto v0.37.0/go.mod h1:vg+k43peMZ0pUMhYmVAWysMK35e6ioLh3wB8ZCAfbVc=
144+
golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8 h1:yqrTHse8TCMW1M1ZCP+VAR/l0kKxwaAIqN/il7x4voA=
145+
golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8/go.mod h1:tujkw807nyEEAamNbDrEGzRav+ilXA7PCRAd6xsmwiU=
146+
golang.org/x/mod v0.24.0 h1:ZfthKaKaT4NrhGVZHO1/WDTwGES4De8KtWO0SIbNJMU=
147+
golang.org/x/mod v0.24.0/go.mod h1:IXM97Txy2VM4PJ3gI61r1YEk/gAj6zAHN3AdZt6S9Ww=
134148
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
135-
golang.org/x/sync v0.12.0 h1:MHc5BpPuC30uJk597Ri8TV3CNZcTLu6B6z4lJy+g6Jw=
136-
golang.org/x/sync v0.12.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
149+
golang.org/x/sync v0.14.0 h1:woo0S4Yywslg6hp4eUFjTVOyKt0RookbpAHG4c1HmhQ=
150+
golang.org/x/sync v0.14.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
137151
golang.org/x/sys v0.0.0-20190916202348-b4ddaad3f8a3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
138152
golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
139-
golang.org/x/sys v0.31.0 h1:ioabZlmFYtWhL+TRYpcnNlLwhyxaM9kWTDEmfnprqik=
140-
golang.org/x/sys v0.31.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
141-
golang.org/x/term v0.29.0 h1:L6pJp37ocefwRRtYPKSWOWzOtWSxVajvz2ldH/xi3iU=
142-
golang.org/x/term v0.29.0/go.mod h1:6bl4lRlvVuDgSf3179VpIxBF0o10JUpXWOnI7nErv7s=
143-
golang.org/x/tools v0.29.0 h1:Xx0h3TtM9rzQpQuR4dKLrdglAmCEN5Oi+P74JdhdzXE=
144-
golang.org/x/tools v0.29.0/go.mod h1:KMQVMRsVxU6nHCFXrBPhDB8XncLNLM0lIy/F14RP588=
153+
golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw=
154+
golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
155+
golang.org/x/term v0.31.0 h1:erwDkOK1Msy6offm1mOgvspSkslFnIGsFnxOKoufg3o=
156+
golang.org/x/term v0.31.0/go.mod h1:R4BeIy7D95HzImkxGkTW1UQTtP54tio2RyHz7PwK0aw=
157+
golang.org/x/tools v0.32.0 h1:Q7N1vhpkQv7ybVzLFtTjvQya2ewbwNDZzUgfXGqtMWU=
158+
golang.org/x/tools v0.32.0/go.mod h1:ZxrU41P/wAbZD8EDa6dDCa6XfpkhJ7HFMjHJXfBDu8s=
145159
gonum.org/v1/gonum v0.15.1 h1:FNy7N6OUZVUaWG9pTiD+jlhdQ3lMP+/LcTpJ6+a8sQ0=
146160
gonum.org/v1/gonum v0.15.1/go.mod h1:eZTZuRFrzu5pcyjN5wJhcIhnUdNijYxX1T2IcrOGY0o=
147-
google.golang.org/genproto/googleapis/rpc v0.0.0-20241021214115-324edc3d5d38 h1:zciRKQ4kBpFgpfC5QQCVtnnNAcLIqweL7plyZRQHVpI=
148-
google.golang.org/genproto/googleapis/rpc v0.0.0-20241021214115-324edc3d5d38/go.mod h1:GX3210XPVPUjJbTUbvwI8f2IpZDMZuPJWDzDuebbviI=
149-
google.golang.org/grpc v1.68.1 h1:oI5oTa11+ng8r8XMMN7jAOmWfPZWbYpCFaMUTACxkM0=
150-
google.golang.org/grpc v1.68.1/go.mod h1:+q1XYFJjShcqn0QZHvCyeR4CXPA+llXIeUIfIe00waw=
151-
google.golang.org/protobuf v1.36.3 h1:82DV7MYdb8anAVi3qge1wSnMDrnKK7ebr+I0hHRN1BU=
152-
google.golang.org/protobuf v1.36.3/go.mod h1:9fA7Ob0pmnwhb644+1+CVWFRbNajQ6iRojtC/QF5bRE=
161+
google.golang.org/genproto/googleapis/rpc v0.0.0-20250414145226-207652e42e2e h1:ztQaXfzEXTmCBvbtWYRhJxW+0iJcz2qXfd38/e9l7bA=
162+
google.golang.org/genproto/googleapis/rpc v0.0.0-20250414145226-207652e42e2e/go.mod h1:qQ0YXyHHx3XkvlzUtpXDkS29lDSafHMZBAZDc03LQ3A=
163+
google.golang.org/grpc v1.72.0 h1:S7UkcVa60b5AAQTaO6ZKamFp1zMZSU0fGDK2WZLbBnM=
164+
google.golang.org/grpc v1.72.0/go.mod h1:wH5Aktxcg25y1I3w7H69nHfXdOG3UiadoBtjh3izSDM=
165+
google.golang.org/protobuf v1.36.6 h1:z1NpPI8ku2WgiWnf+t9wTPsn6eP1L7ksHUlkfLvd9xY=
166+
google.golang.org/protobuf v1.36.6/go.mod h1:jduwjTPXsFjZGTmRluh+L6NjiWu7pchiJ2/5YcXBHnY=
153167
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
154168
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
155169
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=

main.go

Lines changed: 12 additions & 0 deletions
@@ -112,6 +112,18 @@ func main() {
 		router.Handle(route, scheduler)
 	}
 
+	// Add metrics endpoint if enabled
+	if os.Getenv("DISABLE_METRICS") != "1" {
+		metricsHandler := metrics.NewAggregatedMetricsHandler(
+			log.WithField("component", "metrics"),
+			scheduler,
+		)
+		router.Handle("/metrics", metricsHandler)
+		log.Info("Metrics endpoint enabled at /metrics")
+	} else {
+		log.Info("Metrics endpoint disabled")
+	}
+
 	server := &http.Server{Handler: router}
 	serverErrors := make(chan error, 1)

pkg/inference/backends/llamacpp/llamacpp_config.go

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ type Config struct {
 
 // NewDefaultLlamaCppConfig creates a new LlamaCppConfig with default values.
 func NewDefaultLlamaCppConfig() *Config {
-	args := append([]string{"--jinja", "-ngl", "100"})
+	args := append([]string{"--jinja", "-ngl", "100", "--metrics"})
 
 	// Special case for Windows ARM64
 	if runtime.GOOS == "windows" && runtime.GOARCH == "arm64" {

pkg/inference/backends/llamacpp/llamacpp_config_test.go

Lines changed: 2 additions & 0 deletions
@@ -81,6 +81,7 @@ func TestGetArgs(t *testing.T) {
 			expected: []string{
 				"--jinja",
 				"-ngl", "100",
+				"--metrics",
 				"--model", modelPath,
 				"--host", socket,
 			},
@@ -91,6 +92,7 @@ func TestGetArgs(t *testing.T) {
 			expected: []string{
 				"--jinja",
 				"-ngl", "100",
+				"--metrics",
 				"--model", modelPath,
 				"--host", socket,
 				"--embeddings",

0 commit comments