
[Feature request] Adding a checker to see if a custom endpoint is working properly #106


Description

@remyleone

I'm trying to run a model using the following command on my server:

docker run --gpus all --shm-size 1g -p 8080:80 -v /scratch/data:/data -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e HF_HUB_ENABLE_HF_TRANSFER=0 ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id bigcode/starcoder

But after I configure the endpoint in my editor (http://XXXX:8080/generate), I would like a test from the editor that tells me whether or not it can successfully connect.

It could be helpful to have this in the settings, or as a dedicated command in the LLM extension, to verify that everything is in place and to give helpful messages when it is not.
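
In the meantime, a manual check I can run from the client machine is a plain HTTP request against the server. This is just a sketch assuming the default TGI routes (a /health route that returns 200 once the model is loaded, and the /generate route the editor calls), with short timeouts so it fails fast:

# Liveness check: should print 200 once the model is loaded
curl -s -m 5 -o /dev/null -w "%{http_code}\n" http://XXXX:8080/health

# End-to-end check against the same route the editor uses
curl -s -m 30 http://XXXX:8080/generate \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"inputs": "def hello():", "parameters": {"max_new_tokens": 10}}'

If the second request returns a JSON body with a generated_text field, the server side is fine and the problem is in the editor configuration.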

As a check on the server side, I'm using nvidia-smi to see whether additional GPU usage is happening.
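
For continuous monitoring while a request runs, something like the following keeps the GPU readout refreshing (assuming watch is available on the host):

watch -n 1 nvidia-smi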
