Welcome to the vLLM Function Calling Quickstart!
Use this to quickly get a vLLM runtime with Function Calling enabled in your OpenShift AI environment, loading models directly from ModelCar containers.
To see how it's done, jump straight to installation.
The vLLM Function Calling Quickstart is a template for deploying vLLM with Function Calling enabled, integrated with ModelCar containerized models, within Red Hat OpenShift AI.
It’s designed for environments where you want to:
- Enable LLMs to call external tools (Tool/Function Calling).
- Serve LLMs (like Granite3, Llama3) directly from a container.
- Easily customize your model deployments without needing cluster admin privileges.
Use this project to quickly spin up a powerful vLLM instance ready for function-enabled Agents or AI applications.
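To make the idea concrete, here is a hedged sketch of what a tool-calling request looks like once a model from this quickstart is serving traffic. The route URL, the model name, and the get_weather function are all placeholders for illustration, not values defined by this project:

curl -k "https://<your-model-route>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "granite3-2-8b",
        "messages": [{"role": "user", "content": "What is the weather like in Paris today?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'

If function calling is working, the response contains a tool_calls entry naming get_weather with the arguments extracted from the prompt; executing the call is left to your agent or application. The -k flag is only needed if the route uses a self-signed certificate.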
Red Hat uses Arcade software to create interactive demos. Check out the Function Calling Quickstart Example to see it live.
- The vLLM ServingRuntime for KServe serving runtime is available out of the box in RHOAI.
- A detailed guide and documentation are available in this article.
- Code for testing Function Calling in OpenShift AI is available at github.com/rh-aiservices-bu/llm-on-openshift.
NOTE: To find more patterns and pre-built ModelCar images, take a look at the Red Hat AI Services ModelCar Catalog repo on GitHub and the ModelCar Catalog registry on Quay.
- 8+ vCPUs with 4th Gen Intel® Xeon® Scalable processors or newer
- 24+ GiB RAM
- Storage: 30Gi minimum in PVC (larger models may require more)
- 1 GPU (NVIDIA L40, A10, or similar) or 1 Intel® Gaudi® AI Accelerator, for the gpu and hpu deployment options respectively
- Red Hat OpenShift
- Red Hat OpenShift AI 2.16+
- Dependencies for single-model serving:
- Red Hat OpenShift Service Mesh
- Red Hat OpenShift Serverless
- Standard user access; no elevated cluster permissions are required.
Please note before you start: this example was tested on Red Hat OpenShift 4.16.24 and Red Hat OpenShift AI v2.16.2.
git clone https://github.com/rh-ai-quickstart/vllm-tool-calling.git && \
cd vllm-tool-calling/
PROJECT can be set to any value. This will also be used as the namespace.
export PROJECT="vllm-tool-calling-demo"
oc new-project $PROJECT
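If you want to confirm the namespace exists before deploying anything, a quick check with standard oc commands:

oc get project $PROJECT   # should be listed with status Active
oc project $PROJECT       # make it the current project if it is not already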
Specify your LLM and device:

- MODEL: select from [granite3.2-8b, llama3.2-1b, llama3.2-3b]
- DEVICE: select from [cpu, gpu, hpu]
Set the variables to your selected options. An example is shown below.
export MODEL="granite3.2-8b"
export DEVICE="gpu"

Deploy the LLM on the target hardware:
oc apply -n $PROJECT -k vllm-tool-calling/$MODEL/$DEVICE
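The rollout can take a few minutes while the ModelCar image is pulled and vLLM starts up. Assuming the manifests create a KServe InferenceService (resource names will vary), progress can be watched from the CLI:

oc get pods -n $PROJECT -w            # wait for the predictor pod to become Running and Ready
oc get inferenceservice -n $PROJECT   # READY turns to True once the model is being served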
- From the OpenShift Console, open the App Switcher (waffle menu) and go to the Red Hat OpenShift AI Console.
- Once inside the dashboard, navigate to Data Science Projects -> vllm-tool-calling-demo (or the value of ${PROJECT} if you changed it from the default).
- Check the deployed model and wait until its Status shows a green check mark, meaning the model was deployed successfully. A CLI alternative is sketched below.
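If you prefer the CLI over the dashboard, the same check can be done with oc. The jsonpath below simply picks the first InferenceService in the namespace, so adjust it if you deployed more than one model:

oc get inferenceservice -n $PROJECT
oc get inferenceservice -n $PROJECT -o jsonpath='{.items[0].status.url}'   # endpoint URL for the served model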
To remove all deployed components:
oc delete -n $PROJECT -k vllm-tool-calling/$MODEL/$DEVICE

Delete the project:
oc delete project $PROJECT
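Project deletion is asynchronous; to confirm cleanup has finished:

oc get project $PROJECT   # shows Terminating at first, then reports that the project is not found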

