Commit 462c169

Record EPP NormalizedTimePerOutputToken metric on streaming mode
Signed-off-by: Dharaneeshwaran Ravichandran <[email protected]>
1 parent 9286c12 commit 462c169

3 files changed, +3 -3 lines changed

pkg/epp/handlers/server.go

Lines changed: 1 addition & 0 deletions
@@ -276,6 +276,7 @@ func (s *StreamingServer) Process(srv extProcPb.ExternalProcessor_ProcessServer)
     reqCtx.ResponseCompleteTimestamp = time.Now()
     metrics.RecordRequestLatencies(ctx, reqCtx.IncomingModelName, reqCtx.TargetModelName, reqCtx.RequestReceivedTimestamp, reqCtx.ResponseCompleteTimestamp)
     metrics.RecordResponseSizes(reqCtx.IncomingModelName, reqCtx.TargetModelName, reqCtx.ResponseSize)
+    metrics.RecordNormalizedTimePerOutputToken(ctx, reqCtx.IncomingModelName, reqCtx.TargetModelName, reqCtx.RequestReceivedTimestamp, reqCtx.ResponseCompleteTimestamp, reqCtx.Usage.CompletionTokens)
 }

 reqCtx.respBodyResp = generateResponseBodyResponses(v.ResponseBody.Body, v.ResponseBody.EndOfStream)
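
For context on the added call: the docs change in this commit describes ntpot as response latency per output token, and the call passes the received timestamp, the completion timestamp, and the completion token count. A minimal sketch of that arithmetic follows. The function names `ntpotSeconds` and `ntpotFromTimestamps` are hypothetical, and the guards against non-positive inputs are assumptions; this is not the body of `metrics.RecordNormalizedTimePerOutputToken`.

```go
package main

import (
	"errors"
	"time"
)

// ntpotSeconds returns the elapsed time divided by the number of output
// tokens. Hypothetical helper for illustration only.
func ntpotSeconds(elapsedSeconds float64, outputTokens int) (float64, error) {
	// Assumed guard: a non-positive elapsed time means the timestamps are
	// missing or out of order, so there is nothing meaningful to record.
	if elapsedSeconds <= 0 {
		return 0, errors.New("elapsed time must be positive")
	}
	// Assumed guard: avoid dividing by zero when no usage was reported.
	if outputTokens <= 0 {
		return 0, errors.New("output token count must be positive")
	}
	return elapsedSeconds / float64(outputTokens), nil
}

// ntpotFromTimestamps derives the elapsed time from the request-received and
// response-complete timestamps, mirroring the arguments passed in the diff.
func ntpotFromTimestamps(received, complete time.Time, outputTokens int) (float64, error) {
	return ntpotSeconds(complete.Sub(received).Seconds(), outputTokens)
}
```

Under these assumptions, a response that takes 2 seconds and produces 4 completion tokens yields 0.5 seconds per token. In streaming mode the token count is only known once the usage data has arrived, which is presumably why the call sits next to the other end-of-stream metrics.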

site-src/guides/metrics-and-observability.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ This guide describes the current state of exposed metrics and how to scrape them
 | inference_objective_request_total | Counter | The counter of requests broken out for each model. | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
 | inference_objective_request_error_total | Counter | The counter of requests errors broken out for each model. | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
 | inference_objective_request_duration_seconds | Distribution | Distribution of response latency. | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
-| normalized_time_per_output_token_seconds | Distribution | Distribution of ntpot (response latency per output token) | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
+| inference_objective_normalized_time_per_output_token_seconds | Distribution | Distribution of ntpot (response latency per output token) | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
 | inference_objective_request_sizes | Distribution | Distribution of request size in bytes. | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
 | inference_objective_response_sizes | Distribution | Distribution of response size in bytes. | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |
 | inference_objective_input_tokens | Distribution | Distribution of input token count. | `model_name`=&lt;model-name&gt; <br> `target_model_name`=&lt;target-model-name&gt; | ALPHA |

test/e2e/epp/e2e_test.go

Lines changed: 1 addition & 2 deletions
@@ -244,8 +244,7 @@ func verifyMetrics() {
     "inference_objective_request_total",
     "inference_objective_request_error_total",
     "inference_objective_request_duration_seconds",
-    // TODO: normalized_time_per_output_token_seconds is not actually recorded yet
-    // "normalized_time_per_output_token_seconds",
+    "inference_objective_normalized_time_per_output_token_seconds",
     "inference_objective_request_sizes",
     "inference_objective_response_sizes",
     "inference_objective_input_tokens",

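The list above names the metrics that `verifyMetrics` expects to find. How that function performs the check is not shown in this diff, so the following is only a sketch of one plausible approach: scan the text of a scraped metrics endpoint for each expected name. The helper `missingMetrics` and the substring-matching strategy are assumptions, not code from the e2e test.

```go
package main

import "strings"

// missingMetrics returns the expected metric names that do not appear
// anywhere in the scraped metrics text. Hypothetical helper for illustration.
func missingMetrics(body string, expected []string) []string {
	var missing []string
	for _, name := range expected {
		// Substring match: a histogram named X is exposed as X_bucket,
		// X_sum and X_count, all of which contain X.
		if !strings.Contains(body, name) {
			missing = append(missing, name)
		}
	}
	return missing
}
```

One caveat with a substring check like this: the old unprefixed name `normalized_time_per_output_token_seconds` is itself a substring of the new `inference_objective_normalized_time_per_output_token_seconds`, so a loose match cannot tell the two apart. Checking for the full prefixed name, as the updated list does, avoids that ambiguity in one direction only.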