docs/user_guide/metrics.md (19 additions, 0 deletions)
@@ -100,6 +100,25 @@ Count*. The count metrics are illustrated by the following examples:
||Execution Count |`nv_inference_exec_count`|Number of inference batch executions (see [Inference Request Metrics](#inference-request-metrics); does not include cached requests)|Per model|Per request|
||Pending Request Count |`nv_inference_pending_request_count`|Number of inference requests awaiting execution by a backend. This number is incremented when a request is enqueued to the server (`TRITONSERVER_ServerInferAsync`) and is decremented when a backend is about to start executing the request. More details can be found below. |Per model|Per request|
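The enqueue/dequeue semantics of the pending-request gauge can be sketched as a toy model (this is illustrative Python, not Triton source): the gauge rises when a request is enqueued and falls just before a backend begins executing it, so at any instant it reflects queued-but-not-yet-executing work.

```python
# Toy sketch of the pending-request gauge semantics described above.
# Not Triton code; names are illustrative only.
from collections import deque

class PendingRequestGauge:
    def __init__(self):
        self.queue = deque()
        self.pending = 0  # analogous to nv_inference_pending_request_count

    def enqueue(self, request):
        # Incremented when a request is enqueued to the server
        # (e.g. via TRITONSERVER_ServerInferAsync).
        self.queue.append(request)
        self.pending += 1

    def begin_execution(self):
        # Decremented when a backend is about to start executing the request.
        request = self.queue.popleft()
        self.pending -= 1
        return request

g = PendingRequestGauge()
g.enqueue("req-1")
g.enqueue("req-2")
assert g.pending == 2
g.begin_execution()
assert g.pending == 1
```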
#### Failure Count Categories

| Failed Request Reason | Description |
|------------|------------|
| REJECTED | Number of inference failures due to request timeout in the scheduler. |
| CANCELED | Number of inference failures due to request cancellation in the core. |
| BACKEND | Number of inference failures during execution of requests in the backend/model. |
| OTHER | Number of inference failures due to other uncategorized reasons in the core. |

> **Note**
>
> Ensemble failure metrics will reflect the failure counts of their composing models as well as the parent model, but currently do not capture the same granularity for the "reason" label and will default to the "OTHER" reason.
>
> For example, if EnsembleA contains ModelA, and ModelA experiences a failed request due to a queue/backlog timeout in the scheduler, ModelA will have a failed request metric reflecting `reason=REJECTED` and `count=1`.
> Additionally, EnsembleA will have a failed request metric reflecting `reason=OTHER` and `count=2`.
> The `count=2` reflects 1 from the internally failed request captured by ModelA, as well as 1 from the failed top-level request sent to EnsembleA by the user/client.
> The `reason=OTHER` reflects the fact that the ensemble does not currently capture the specific reason why ModelA's request failed.
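These per-reason counts can be read off Triton's Prometheus text-format metrics endpoint (by default served on port 8002 at `/metrics`). Below is a minimal sketch that aggregates failure counts by model and reason; the metric name `nv_inference_request_failure` and the sample scrape text are assumptions for illustration, not captured server output.

```python
# Sketch: aggregate per-reason failure counts from Prometheus text-format
# metrics. The metric name and sample below are illustrative assumptions.
import re
from collections import defaultdict

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)

def failure_counts_by_reason(metrics_text, metric_name="nv_inference_request_failure"):
    """Return {(model, reason): count} parsed from Prometheus text format."""
    counts = defaultdict(float)
    for line in metrics_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group("name") != metric_name:
            continue
        # Naive label split; fine here because no label value contains a comma.
        labels = dict(
            kv.split("=", 1) for kv in m.group("labels").split(",") if "=" in kv
        )
        model = labels.get("model", "").strip('"')
        reason = labels.get("reason", "").strip('"')
        counts[(model, reason)] += float(m.group("value"))
    return dict(counts)

# Hypothetical scrape illustrating the ensemble behavior described above.
sample = """\
nv_inference_request_failure{model="ModelA",version="1",reason="REJECTED"} 1
nv_inference_request_failure{model="EnsembleA",version="1",reason="OTHER"} 2
"""
print(failure_counts_by_reason(sample))
```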
#### Pending Request Count (Queue Size) Per-Model

The *Pending Request Count* reflects the number of requests that have been
docs/user_guide/trace.md (16 additions, 0 deletions)
@@ -623,6 +623,22 @@ Then, you can specify headers in the `infer` method. For references, please
look at our [tests](https://github.com/triton-inference-server/server/blob/main/qa/L0_trace/opentelemetry_unittest.py),
e.g. the [http context propagation test](https://github.com/triton-inference-server/server/blob/main/qa/L0_trace/opentelemetry_unittest.py#L494-L508).
### Custom Backend Tracing

When a custom activity needs to be traced in the backend, please use the
`TRITONSERVER_InferenceTraceReportActivity` API. For examples, please
refer to the [identity backend](https://github.com/triton-inference-server/identity_backend/blob/main/src/identity.cc).

In OpenTelemetry trace mode, if you wish to start a new span, make sure
that the name of your custom activity ends with `_START`. To end the new span,
make sure that the corresponding activity name ends with `_END`. For example, in the
identity backend, we start a `CUSTOM_ACTIVITY` span by [reporting](https://github.com/triton-inference-server/identity_backend/blob/oandreeva-custom-trace-activity/src/identity.cc#L872-L876)
a `CUSTOM_ACTIVITY_START` event, and we close this span by [reporting](https://github.com/triton-inference-server/identity_backend/blob/oandreeva-custom-trace-activity/src/identity.cc#L880-L883)
a `CUSTOM_ACTIVITY_END` event.

Please note that it is the user's responsibility to ensure that all custom
spans that are started are properly ended.
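The `_START`/`_END` naming convention above effectively pairs reported activity events into spans. The following is an illustrative sketch (plain Python, not Triton source) of that pairing, which a backend author could use to sanity-check that every custom span that was started is also ended:

```python
# Illustrative sketch: pair activity events whose names end in _START / _END
# into (name, begin_ns, end_ns) spans, mirroring the naming convention above.
# Any entry left in `open_spans` indicates a span that was never ended.
def pair_spans(events):
    """events: list of (activity_name, timestamp_ns) in report order."""
    open_spans = {}  # base activity name -> start timestamp
    spans = []
    for name, ts in events:
        if name.endswith("_START"):
            open_spans[name[: -len("_START")]] = ts
        elif name.endswith("_END"):
            base = name[: -len("_END")]
            if base in open_spans:
                spans.append((base, open_spans.pop(base), ts))
    return spans, open_spans

spans, unclosed = pair_spans(
    [("CUSTOM_ACTIVITY_START", 100), ("CUSTOM_ACTIVITY_END", 250)]
)
print(spans)     # [('CUSTOM_ACTIVITY', 100, 250)]
print(unclosed)  # {}
```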
### Limitations

- OpenTelemetry trace mode is not supported on Windows systems.
```python
# This test case is used to check whether all the state objects are
# released when RPC runs into error.
expected_exceptions = [
    "This protocol is restricted, expecting header 'triton-grpc-protocol-infer-key'",
    "The stream is no longer in valid state, the error detail is reported through provided callback. A new stream should be started after stopping the current stream.",
```

```python
# This test case is used to check whether all the state objects are
# released when RPC runs into error.
expected_exceptions = [
    "This protocol is restricted, expecting header 'triton-grpc-protocol-infer-key'",
    "The stream is no longer in valid state, the error detail is reported through provided callback. A new stream should be started after stopping the current stream.",
```