LiteLLM and LlamaStack Integration

Overview

This repository explores integrating LiteLLM with LlamaStack in the context of Red Hat OpenShift AI (RHOAI). Given LlamaStack's widespread adoption across RHOAI repositories, our goal is to evaluate LiteLLM's features and assess how it can be utilized within Red Hat's infrastructure, including Red Hat AI.


Definitions

LiteLLM

LiteLLM Official Documentation

Note

LiteLLM is an open-source library and gateway that provides a unified interface for over 100 Large Language Model (LLM) APIs. It allows developers to call models from providers like OpenAI, Anthropic, Azure, and Google using a single, consistent OpenAI-like format. Acting as a universal adapter, it simplifies integration, enables model swapping, and offers features like cost tracking, load balancing, and rate limiting for production applications.

LiteLLM is an open-source platform that provides a unified interface to manage and access over 100 LLMs from various providers (Arize). It focuses primarily on model access and API translation.
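
To see what that unified format looks like in practice, the snippet below is a minimal sketch using the LiteLLM Python SDK; the model names and API keys are illustrative placeholders rather than part of this repository.

# pip install litellm
import os
import litellm

# Illustrative placeholders; set real keys for whichever providers you call.
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

messages = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# The call shape stays the same; only the provider-prefixed model string changes.
openai_reply = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
claude_reply = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

print(openai_reply.choices[0].message.content)
print(claude_reply.choices[0].message.content)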

LlamaStack

LlamaStack Official Documentation

Note

Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments.

Key features include:

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry
  • Plugin architecture supporting implementations across local development, on-premises, cloud, and mobile environments
  • Prepackaged verified distributions for quick and reliable setup in any environment
  • Multiple developer interfaces including CLI and SDKs for Python, Node, iOS, and Android
  • Standalone applications as examples for building production-grade AI applications

LlamaStack is an open-source framework for building generative AI applications with unified APIs for Inference, RAG, Agents, Tools, Safety, and Telemetry (LlamaStack). It serves as a comprehensive application framework.
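
As a small illustration of that unified API surface, the sketch below uses the llama-stack-client Python SDK against an already-running LlamaStack server. The endpoint is an assumption (8321 is the stack's default port), and method names can shift between client releases, so treat this as a sketch rather than a drop-in example.

# pip install llama-stack-client
from llama_stack_client import LlamaStackClient

# Assumes a LlamaStack distribution is already listening locally.
client = LlamaStackClient(base_url="http://localhost:8321")

# Discover the models registered with the stack ...
models = client.models.list()
print([m.identifier for m in models])

# ... and run a chat completion through the unified Inference API.
response = client.inference.chat_completion(
    model_id=models[0].identifier,
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.completion_message.content)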


Similarities

Both tools aim to simplify working with large language models by providing abstraction layers, though they approach this goal differently:

Aspect Description
Unified Interfaces Both provide standardized APIs to work with multiple LLM providers, reducing the need to learn provider-specific APIs
Multi-Provider Support Both support numerous LLM providers including OpenAI, Anthropic, Azure, and others
Developer-Friendly Both are open-source projects designed to streamline LLM application development
Python Ecosystem Both offer Python SDKs as a core part of their offerings

LiteLLM with LlamaStack

Based on our analysis, LlamaStack and LiteLLM have overlapping capabilities—especially in providing a unified interface to various LLM providers. However, each solution brings unique strengths:

  • LlamaStack excels as a comprehensive framework for building advanced generative AI applications, offering RAG, safety guardrails, memory management, and agentic systems.
  • LiteLLM excels as a lightweight gateway for accessing and managing LLM APIs, with strong features for cost tracking, rate limiting, key management, and operational controls.

By integrating LiteLLM with LlamaStack, you get the best of both worlds: LlamaStack's advanced application-building tools combined with LiteLLM's operational advantages. Rather than competing, these tools complement each other—helping you build, deploy, and manage AI-powered applications more efficiently and securely.
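
In practice the glue between the two is LiteLLM's OpenAI-compatible endpoint: LlamaStack's remote inference provider can be pointed at the LiteLLM proxy URL, and any OpenAI SDK client can talk to the same endpoint. The sketch below shows the client side of that contract; the route and virtual key are placeholders for whatever your deployment exposes.

# pip install openai
from openai import OpenAI

# Placeholder route and LiteLLM virtual key. In this architecture, LlamaStack's
# inference provider is configured against the same base_url.
gateway = OpenAI(
    base_url="https://<litellm-route>/v1",
    api_key="sk-<litellm-virtual-key>",
)

reply = gateway.chat.completions.create(
    model="gpt-4o-mini",  # a model alias configured in LiteLLM, not a raw provider name
    messages=[{"role": "user", "content": "Hello through the gateway"}],
)
print(reply.choices[0].message.content)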


Feature Comparison

Feature LiteLLM LlamaStack
API Key Management ✅ ❌
Rate Limiting ✅ ❌
Load Balancing ✅ ❌
Caching ✅ ❌
Fallback/Retry Logic ✅ ❌
Cost Tracking ✅ ❌
Telemetry/Monitoring ✅ ✅
Multi-Provider Gateway ✅ ❌
Agent/Agentic System ❌ ✅
RAG (Built-in) ❌ ✅
Memory Management ❌ ✅
Safety Guardrails ❌ ✅
Tool/Function Calling ✅ ✅
Prompt Guard ❌ ✅
Evaluation Framework ❌ ✅
Vector Store Integration ❌ ✅
Multi-turn Conversations ❌ ✅
Mobile SDK Support ❌ ✅

Architecture

The following diagram illustrates the recommended configuration when using LlamaStack and LiteLLM together:

---
config:
  layout: dagre
---
flowchart TB
 subgraph CLIENTS["Clients"]
        UI["Chat"]
        API["API"]
        AGENTS["Agents"]
  end
 subgraph LLAMASTACK["LlamaStack"]
    direction TB
        LS_API["LlamaStack API"]
        LS_WORKFLOWS["Agents"]
        LS_TOOLS["Tool Execution"]
        LS_RAG["RAG Pipelines"]
        LS_MEMORY["Memory/Knowledge"]
  end
 subgraph GATEWAY["LiteLLM"]
    direction TB
        LLM_API["OpenAI-Compatible API"]
        ROUTING["Model Routing & Fallback"]
        RATELIMITS["Rate Limits / Quotas"]
        RETRIES["Retries & Timeouts"]
        CACHE["Optional Response Cache"]
        MULTI_TENANCY["User/API Management"]
  end
 subgraph VLLM["vLLM Cluster"]
        VLLM_API["vLLM API Server"]
        VLLM_GPU["GPU Nodes"]
  end
 subgraph LLM_D["TGI Cluster"]
        LLM_D_API["TGI API Server"]
        LLM_D_GPU["GPU Nodes"]
  end
 subgraph CLOUD["Hosted Providers"]
        OPENAI["OpenAI / Azure OpenAI"]
        BEDROCK["AWS Bedrock"]
  end
 subgraph INFERENCE["Inference Backend"]
    direction TB
        VLLM
        LLM_D
        CLOUD
  end
 subgraph STORAGE["Storage"]
        PG["Postgres"]
        VECTOR["Vector DB"]
  end
    UI --> LS_API
    API --> LS_API
    AGENTS --> LS_API
    LS_API --> LS_WORKFLOWS
    LS_WORKFLOWS --> LS_TOOLS & LS_RAG & LS_MEMORY & LLM_API
    LS_RAG --> VECTOR
    LS_MEMORY --> PG
    LLM_API --> ROUTING
    ROUTING --> RATELIMITS & VLLM_API & LLM_D_API & OPENAI & BEDROCK
    RATELIMITS --> RETRIES
    RETRIES --> CACHE
    VLLM_API --> VLLM_GPU
    LLM_D_API --> LLM_D_GPU

     LS_API:::cpu
     LS_WORKFLOWS:::cpu
     LS_TOOLS:::cpu
     LS_RAG:::cpu
     LS_MEMORY:::cpu
     LLM_API:::cpu
     ROUTING:::cpu
     RATELIMITS:::cpu
     RETRIES:::cpu
     CACHE:::cpu
     VLLM_API:::control
     VLLM_GPU:::gpu
     OPENAI:::control
     BEDROCK:::control
    classDef cpu fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
    classDef gpu fill:#FCE4EC,stroke:#C2185B,stroke-width:1px
    classDef control fill:#E8F5E9,stroke:#2E7D32,stroke-width:1px
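
From the client's point of view, only LlamaStack is visible: agents, tool execution, and RAG run inside the stack, while inference requests flow out through LiteLLM to whichever backend it routes to. The agent sketch below follows the llama-stack-client examples; the endpoint, model alias, and instructions are illustrative, and method names may differ slightly between client releases.

# pip install llama-stack-client
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="https://<llamastack-route>")

# The model alias should be one that LlamaStack resolves through the LiteLLM gateway.
agent = Agent(
    client,
    model="gpt-4o-mini",
    instructions="You are a concise assistant for RHOAI users.",
)

session_id = agent.create_session("demo-session")

turn = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Which components does this request pass through?"}],
    stream=False,
)
print(turn.output_message.content)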

Getting Started

Prerequisites

To run LiteLLM with LlamaStack, you must have an LLM serving solution or hosted provider available for inference. Options include:

  • Self-hosted: Follow the RHOAI Model Serving Tutorial to serve an LLM using Red Hat OpenShift AI
  • Hosted providers: Use services like Anthropic, OpenAI, Google, or AWS Bedrock
  • Local development: Run Ollama for local testing

Deployment

  1. Navigate to the deploy directory:

    cd deploy
  2. Run the installation:

    make install NAMESPACE=<your_namespace>
  3. Wait for the deployment to complete.

  4. Follow the demo documentation to run specific demos.

You can interact with LiteLLM through the UI application by navigating to its route in OpenShift, or run the demos directly.
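
Before running the demos, you can sanity-check the gateway from Python. The route and key below are placeholders; the endpoint paths follow the LiteLLM proxy documentation, with /v1/models requiring a valid key.

# pip install requests
import requests

BASE_URL = "https://<litellm-route>"   # placeholder: the OpenShift route
API_KEY = "sk-<litellm-virtual-key>"   # placeholder: a key from the Admin UI or seed job

# Readiness probe exposed by the LiteLLM proxy.
print(requests.get(f"{BASE_URL}/health/readiness", timeout=10).json())

# List the model aliases the gateway is configured to serve.
models = requests.get(
    f"{BASE_URL}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
for m in models.json()["data"]:
    print(m["id"])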


Demos

Using the configuration above, you can combine both technologies to achieve an enterprise-grade experience. Navigate to the /demos folder to see the demos in action. All demos are deployable to a Red Hat OpenShift environment.

Demo Description
Budgeting Demo Demonstrates budget management and cost tracking
LLM Failover Demo Shows automatic failover between LLM providers
LlamaStack Integration Demonstrates LlamaStack and LiteLLM working together
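
The LLM Failover Demo builds on LiteLLM's routing and fallback support. The standalone sketch below shows the same idea with the litellm Router; the model list and fallback mapping are illustrative, and the demo itself configures this on the proxy rather than in client code.

# pip install litellm
from litellm import Router

# Two deployments behind one routing layer; names and model choices are illustrative.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o-mini"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20240620"}},
    ],
    fallbacks=[{"primary": ["backup"]}],  # if "primary" fails, retry against "backup"
    num_retries=2,
)

reply = router.completion(
    model="primary",
    messages=[{"role": "user", "content": "Which deployment answered this?"}],
)
print(reply.choices[0].message.content)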

Chat Interface

In addition to the demos, we provide a Streamlit-based chat interface for interacting with the LiteLLM gateway. This application lets users configure LiteLLM settings via the Admin UI and observe the effects in real time through the chat interface.
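
A stripped-down version of that pattern looks roughly like the sketch below; the gateway URL, key, and model alias are placeholders, and the actual app in this repository carries more configuration plus the Admin UI integration.

# pip install streamlit openai    (run with: streamlit run chat.py)
import streamlit as st
from openai import OpenAI

client = OpenAI(base_url="https://<litellm-route>/v1", api_key="sk-<litellm-virtual-key>")

st.title("LiteLLM Chat")

# Keep the conversation across Streamlit reruns.
if "history" not in st.session_state:
    st.session_state.history = []

for msg in st.session_state.history:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask something"):
    st.session_state.history.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    reply = client.chat.completions.create(
        model="<configured-model-alias>",
        messages=st.session_state.history,
    )
    answer = reply.choices[0].message.content
    st.session_state.history.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)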

What Gets Deployed

The demo project deploys the following components:

Component Purpose
LiteLLM API Powers the chat applications (inference, budgeting, rate limiting)
LiteLLM Admin UI Configuration and management interface for LiteLLM
Postgres Database storage for LiteLLM
LlamaStack Unified AI API (Agents, RAG, Knowledge)

Seed Data

The Helm deployment includes a job that loads sample data into LiteLLM. The deploy/helm/templates/seed-job.yaml file defines the job that creates users and teams for the demo.

Configuration values are defined in values.yaml:

seed:
  enabled: true
  teams:
    - team_alias: engineering
      max_budget: 100.0
    - team_alias: marketing
      max_budget: 50.0
  users:
    - user_email: eng-user@example.com
      team_alias: engineering
    - user_email: mkt-user@example.com
      team_alias: marketing

This creates two teams and two users with separate team budgets.
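
Under the hood, a seed job like this reduces to calls against LiteLLM's management API. The sketch below illustrates that flow rather than reproducing the actual job: it assumes the /team/new and /user/new endpoints from the LiteLLM proxy documentation, authenticated with the master key, and the field names mirror the values.yaml above but may need adjusting to the deployed LiteLLM version.

# pip install requests
import requests

BASE_URL = "http://litellm:4000"        # placeholder: in-cluster LiteLLM service
MASTER_KEY = "sk-<litellm-master-key>"  # placeholder: the proxy master key
HEADERS = {"Authorization": f"Bearer {MASTER_KEY}"}

teams = [
    {"team_alias": "engineering", "max_budget": 100.0},
    {"team_alias": "marketing", "max_budget": 50.0},
]
users = [
    {"user_email": "eng-user@example.com", "team_alias": "engineering"},
    {"user_email": "mkt-user@example.com", "team_alias": "marketing"},
]

# Create each team and remember the team_id LiteLLM assigns to it.
team_ids = {}
for team in teams:
    created = requests.post(f"{BASE_URL}/team/new", json=team, headers=HEADERS, timeout=30).json()
    team_ids[team["team_alias"]] = created["team_id"]

# Create each user and attach it to its team by id.
for user in users:
    payload = {"user_email": user["user_email"], "team_id": team_ids[user["team_alias"]]}
    requests.post(f"{BASE_URL}/user/new", json=payload, headers=HEADERS, timeout=30)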

Note

See the LiteLLM Overview for detailed information on running and configuring the demo application.


Conclusion

LiteLLM and LlamaStack are complementary technologies that, when combined, provide a robust foundation for building enterprise-grade AI applications:

  • LlamaStack delivers application-building capabilities: RAG, agents, safety guardrails, and tool execution
  • LiteLLM adds operational controls: rate limiting, cost tracking, load balancing, and API key management

For organizations using Red Hat OpenShift AI, this integration offers a path to deploy scalable, secure, and manageable AI solutions. The demos in this repository demonstrate how these tools work together in practice, providing a starting point for teams looking to leverage both technologies within Red Hat's infrastructure.
