
vLLM Tool Calling


Welcome to the vLLM Function Calling Quickstart!

Use this quickstart to get a vLLM runtime with Function Calling enabled running in your OpenShift AI environment, loading models directly from ModelCar containers.

To see how it's done, jump straight to the Install section.

Table of Contents

  • Detailed description
  • See it in action
  • Architecture diagrams
  • References
  • Requirements
  • Install
  • Cleanup

Detailed description

The vLLM Function Calling Quickstart is a template for deploying vLLM with Function Calling enabled, integrated with ModelCar containerized models, within Red Hat OpenShift AI.

It’s designed for environments where you want to:

  • Enable LLMs to call external tools (Tool/Function Calling).
  • Serve LLMs (like Granite3, Llama3) directly from a container.
  • Easily customize your model deployments without needing cluster admin privileges.

Use this project to quickly spin up a powerful vLLM instance ready for function-enabled Agents or AI applications.
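Because vLLM exposes an OpenAI-compatible API, "Function Calling enabled" means the served model accepts the standard tools array in a chat completion request. The route, model ID, and tool schema below are illustrative placeholders rather than values from this repository; a minimal sketch:

# Hypothetical route and served model ID -- substitute the values from your deployment
curl -s https://<your-model-route>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<served-model-id>",
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

If the model decides to use the tool, the response contains a tool_calls entry with the function name and JSON-encoded arguments instead of plain text.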

See it in action

Red Hat uses Arcade software to create interactive demos. Check out the Function Calling Quickstart Example to see it live.

Architecture diagrams

Architecture diagram (architecture.png)

References

NOTE: To find more patterns and pre-built ModelCar images, take a look at the Red Hat AI Services ModelCar Catalog repo on GitHub and the ModelCar Catalog registry on Quay.
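If you have skopeo installed, you can browse the catalog's published images from the command line before picking one; a minimal sketch, assuming the quay.io/redhat-ai-services/modelcar-catalog repository path for the registry mentioned above:

# List the published ModelCar image tags (repository path is an assumption; adjust if it differs)
skopeo list-tags docker://quay.io/redhat-ai-services/modelcar-catalog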

Requirements

Minimum hardware requirements

  • 8+ vCPUs with 4th Gen Intel® Xeon® Scalable processors or newer
  • 24+ GiB RAM
  • Storage: 30Gi minimum in PVC (larger models may require more)

Optional, depending on selected hardware platform

  • 1 GPU (NVIDIA L40, A10, or similar)
  • 1 Intel® Gaudi® AI Accelerator

Required software

  • Red Hat OpenShift
  • Red Hat OpenShift AI 2.16+
  • Dependencies for Single-model server (a quick check is sketched after this list):
    • Red Hat OpenShift Service Mesh
    • Red Hat OpenShift Serverless
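Both dependency operators must be installed before the single-model serving platform can start. One way to check for them from the CLI is sketched below; the grep pattern matches typical operator CSV names, which can vary by version:

# Look for the Service Mesh and Serverless operators among installed ClusterServiceVersions
oc get csv -A | grep -iE 'servicemesh|serverless'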

Required permissions

  • Standard user. No elevated cluster permissions are required.

Install

Please note before you start

This example was tested on Red Hat OpenShift 4.16.24 & Red Hat OpenShift AI v2.16.2.

Clone the repository

git clone https://github.com/rh-ai-quickstart/vllm-tool-calling.git && \
    cd vllm-tool-calling/  

Create the project

PROJECT can be set to any value. This will also be used as the namespace.

export PROJECT="vllm-tool-calling-demo"

oc new-project $PROJECT

Specify your LLM and device:

Set the variables to your selected options. An example is shown below.

export MODEL="granite3.2-8b"
export DEVICE="gpu"
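The MODEL and DEVICE values correspond to directories in the repository, as used by the deploy command below. To see which combinations are available in your checkout (a sketch, assuming that directory layout):

# List the available model and device overlays
ls vllm-tool-calling/
ls vllm-tool-calling/$MODEL/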

Deploy the LLM on the target hardware:

oc apply -n $PROJECT -k vllm-tool-calling/$MODEL/$DEVICE
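Pulling the ModelCar image and starting the server can take several minutes. You can follow progress from the CLI before switching to the dashboard; a sketch, assuming the quickstart creates a KServe InferenceService (the resource OpenShift AI uses for single-model serving):

# Watch the model server pods start up
oc get pods -n $PROJECT -w

# Check the InferenceService; READY=True means the model is being served
oc get inferenceservice -n $PROJECT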

Check the deployment

  • From the OpenShift Console, open the App Switcher (the waffle icon) and select the Red Hat OpenShift AI Console.

  • Once inside the dashboard, navigate to Data Science Projects -> vllm-tool-calling-demo (or the value you set for ${PROJECT}, if you changed it from the default).


  • Check the deployed models and wait until the Status column shows a green tick, which means the model was deployed successfully.

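As an alternative to the dashboard, you can smoke-test the endpoint from the CLI; a sketch, assuming a single InferenceService in the namespace and vLLM's OpenAI-compatible API:

# Read the model endpoint URL from the InferenceService status
URL=$(oc get inferenceservice -n $PROJECT -o jsonpath='{.items[0].status.url}')

# List the served models; a JSON response confirms the server is answering
curl -sk $URL/v1/models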

Cleanup

To remove all deployed components:

oc delete -n $PROJECT -k vllm-tool-calling/$MODEL/$DEVICE

Delete the project:

oc delete project $PROJECT
