Course Title: Evaluating Model Accuracy with lm-eval-harness
Description:
This course teaches you how to measure the task-level accuracy and reasoning quality of Large Language Models on Red Hat OpenShift AI. You will learn to use lm-eval-harness, an industry-standard benchmarking framework, delivered via the TrustyAI operator. This hands-on lab guides you through setting up the environment, running standardized evaluations such as ARC and MMLU, and interpreting the quantitative results to assess a model's true capabilities.
Duration: 1.5 hours
On completing this course, you should be able to:
- Enable and configure the TrustyAI operator to run model evaluation jobs (see the sketch after this list).
- Configure and launch an LMEvalJob to test a deployed model against standard academic benchmarks (a sample manifest also follows this list).
- Retrieve and interpret accuracy results, including raw accuracy, normalized accuracy, and standard error.
- Run domain-specific tests to validate a model's knowledge and suitability for specialized enterprise tasks.
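To make these objectives concrete, here is a minimal sketch of enabling the TrustyAI component and submitting an evaluation job. The resource name default-dsc and the model google/flan-t5-base are placeholders for illustration; the LMEvalJob fields follow the TrustyAI CRD as published, but verify them against your operator version before relying on them.

```shell
# Enable the TrustyAI component on the DataScienceCluster
# (assumes the default resource name "default-dsc"; list yours
# with "oc get datasciencecluster" first).
oc patch datasciencecluster default-dsc --type merge \
  -p '{"spec":{"components":{"trustyai":{"managementState":"Managed"}}}}'

# Submit an illustrative LMEvalJob that runs the ARC (easy) benchmark.
oc apply -f - <<'EOF'
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf                        # evaluate a Hugging Face model
  modelArgs:
    - name: pretrained
      value: google/flan-t5-base   # placeholder model; substitute your own
  taskList:
    taskNames:
      - arc_easy                   # ARC (easy split) benchmark
  logSamples: true
EOF
```

When the job completes, the per-task metrics this course covers (acc, acc_norm, and acc_stderr) are reported in the job's status, which you can read with oc get lmevaljob evaljob-sample -o jsonpath='{.status.results}'.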
This course assumes that you have the following prior experience:
- Basic understanding of Large Language Models (LLMs) and the importance of accuracy validation.
- Familiarity with using the OpenShift command-line interface (oc) to interact with a cluster.
- Access to a Red Hat OpenShift AI cluster with an available GPU node and a deployed LLM inference service (quick verification commands follow this list).
- Administrative permissions to manage components within the DataScienceCluster resource.
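If you want to confirm these prerequisites before starting the lab, the checks below are one possible approach; the GPU node label is an assumption tied to the NVIDIA GPU Operator and may differ in your environment.

```shell
# Sanity checks for the lab prerequisites (the label selector is an
# assumption based on the NVIDIA GPU Operator; adjust as needed).
oc get nodes -l nvidia.com/gpu.present=true   # at least one GPU node
oc get inferenceservice -A                    # a deployed LLM inference service
oc get datasciencecluster                     # confirms OpenShift AI is installed
```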