
Model Serving with Red Hat AI Inference Server

Overview and Objectives

Welcome to Model Serving with Red Hat AI Inference Server (RHAIIS), a hands-on lab designed to give you practical experience serving large language models (LLMs) with RHAIIS.

By the end of this course, you will be able to:

  1. Deploy and serve an LLM for inference using two different Granite models (2B and 8B parameters), gaining firsthand experience with a powerful, enterprise-grade platform.

  2. Optimize GPU resource usage by monitoring memory consumption in real time and by understanding how RHAIIS loads model weights and manages the KV cache. You'll learn to tune performance by controlling key parameters such as max_tokens (see the example request after this list).

  3. Troubleshoot and solve deployment challenges, specifically by working through the advanced steps needed to successfully launch a larger 8B model. This will build your skills for real-world scenarios.

  4. Lay a foundation for further exploration by using the Red Hat AI Model Repository on Hugging Face to serve and experiment with more models after the lab exercises.
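As a preview of the kind of control you'll practice, here is a minimal sketch of a request that caps generation length with max_tokens. It assumes the server exposes vLLM's OpenAI-compatible API on localhost port 8000 and that a Granite 2B model is loaded under the name shown; both are assumptions, so adjust them to match your deployment.

[source,bash]
----
# Cap the response at 50 generated tokens with max_tokens.
# Endpoint and model name are assumptions; substitute your own values.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-3.1-2b-instruct",
        "messages": [{"role": "user", "content": "What is the KV cache?"}],
        "max_tokens": 50
      }'
----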

While the lab environment initializes, take this opportunity to review the provided lab guide. It covers essential topics like RHAIIS requirements, supported deployments, and advanced vLLM configuration settings, giving you the context you need to succeed.

Ready to get started? Let’s dive into a powerful platform that helps you deploy AI models with flexibility and high performance across any hybrid cloud environment.

Outcomes

Upon completing this lab, you will be able to:

  • Deploy the AI Inference Server for Hugging Face-based models using Podman (a command sketch follows this list).
  • Verify the model is serving correctly by interacting with its API.
  • Monitor the GPU's video memory (VRAM) usage in real time.
  • Tune server parameters to control memory consumption and context length.
  • Deploy and test an alternative model to see the platform's flexibility.
  • Determine the maximum context length (max-model-len) for a given model.
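To make these outcomes concrete, the following is a hedged sketch of the kind of podman run invocation the lab builds toward. The image tag, model name, cache path, and flag values are illustrative assumptions, not the lab's exact values; the lab guide has the authoritative command.

[source,bash]
----
# Illustrative only: expose the GPU through CDI, pass a Hugging Face
# read token, cache weights on the host, and publish the API port.
# Image tag and model name are assumptions; follow the lab guide.
podman run --rm -it \
  --device nvidia.com/gpu=all \
  -e HF_TOKEN="$HF_TOKEN" \
  -v ~/.cache/huggingface:/opt/app-root/src/.cache/huggingface:Z \
  -p 8000:8000 \
  registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest \
  --model ibm-granite/granite-3.1-2b-instruct \
  --max-model-len 4096
----

While the model loads, running nvidia-smi in a second terminal shows the weights and then the KV cache claiming VRAM, which is exactly what the monitoring and tuning outcomes above exercise.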

Environment Prerequisites

Your lab environment has been pre-configured with the following (a quick verification sketch follows the list):

  • A Red Hat Enterprise Linux 9.x system with a valid subscription.
  • An attached and configured NVIDIA data center GPU with drivers installed.
  • Podman and the NVIDIA Container Toolkit are pre-installed.
  • Credentials for a Red Hat account to access registry.redhat.io (provided for this lab).
  • A Hugging Face account with a User Access Token that has read permissions (provided for this lab).
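If you want to confirm the environment yourself before starting, a few standard commands are enough:

[source,bash]
----
nvidia-smi                        # GPU visible and driver loaded
podman --version                  # Podman installed
podman login registry.redhat.io   # accepts the provided Red Hat credentials
----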

Setting Up Your RHAIIS Lab Environment

A dedicated lab environment for this training is currently in development. In the meantime, you can use the equivalent environment on the Red Hat Demo Platform, https://catalog.demo.redhat.com/catalog?item=babylon-catalog-prod/rhdp.rhaiis-on-rhel.prod&utm_source=webapp&utm_medium=share-link[*Base Red Hat AI Inference Server (RHAIIS)*, window=_blank], which comes pre-configured with RHAIIS.

This environment includes some pre-configured bonus content:

  • A bonus lab that shows you how to connect to the AI model using Python.

  • A Qwen2.5 model running on RHAIIS as a systemd service; it starts automatically when the system boots.

Once you have completed the initial exercises, you can stop this service to free up the environment for this lab's main activities. To stop the service, run the following command:

[source,bash]
----
sudo systemctl stop rhaiis.service
----
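To confirm the service has stopped and its GPU memory has been released, you can check with:

[source,bash]
----
systemctl is-active rhaiis.service   # prints "inactive" once stopped
nvidia-smi                           # VRAM usage drops after vLLM exits
----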
