<!--
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# vLLM Health Check (BETA)

> [!NOTE]
> The vLLM Health Check support is currently in BETA. Its features and
> functionality are subject to change as we collect feedback. We are excited to
> hear any thoughts you have!

The vLLM backend supports checking for
[vLLM Engine Health](https://github.com/vllm-project/vllm/blob/v0.6.3.post1/vllm/engine/async_llm_engine.py#L1177-L1185)
upon receiving each inference request. If the health check fails, the entire
model is unloaded and its state becomes NOT Ready on the server. The model
state can be queried via the
[Repository Index](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_repository.md#index)
or
[Model Ready](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/library/http_client.h#L178-L192)
APIs.
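
For example, a client can check the model state after a failed health check.
The following is a minimal sketch using the Triton HTTP Python client; the
server URL `localhost:8000` and the model name `vllm_model` are assumptions
about your deployment, not fixed names.

```
import tritonclient.http as httpclient

# Assumes Triton is reachable over HTTP at localhost:8000 and the vLLM model
# is deployed under the hypothetical name "vllm_model".
client = httpclient.InferenceServerClient(url="localhost:8000")

# Model Ready API: returns False once a failed health check unloads the model.
if not client.is_model_ready("vllm_model"):
    print("vllm_model is NOT Ready")

# Repository Index API: lists every model with its current state and reason.
for model in client.get_model_repository_index():
    print(model["name"], model.get("state", ""), model.get("reason", ""))
```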

The health check is disabled by default. To enable it, set the following
parameter in the model config to `true`
```
parameters: {
  key: "ENABLE_VLLM_HEALTH_CHECK"
  value: { string_value: "true" }
}
```
and select
[Model Control Mode EXPLICIT](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_management.md#model-control-mode-explicit)
when the server is started.
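
For example, a minimal launch sketch with explicit model control; the
repository path and the model name `vllm_model` are placeholders for your
deployment.

```
tritonserver --model-repository=/path/to/model_repository \
             --model-control-mode=explicit \
             --load-model=vllm_model
```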

Supported since r24.12.