Project Website • Documentation • Video demo
EvalAssist is an LLM-as-a-Judge framework built on top of the Unitxt open-source evaluation library for large language models. The EvalAssist application provides users with a convenient way to iteratively test and refine LLM-as-a-judge criteria, and supports both direct (rubric-based) and pairwise (relation-based) assessment, the two most prevalent paradigms of LLM-as-a-judge evaluation. EvalAssist is model-agnostic, i.e. the content to be evaluated can come from any model. It supports a rich set of off-the-shelf judge models that can easily be extended; an API key is required to use the pre-defined judge models. Once users are satisfied with their criteria, they can auto-generate a Notebook with Unitxt code to run bulk evaluations on larger data sets based on their criteria definition. EvalAssist also includes a catalog of example test cases demonstrating the use of LLM-as-a-judge across a variety of scenarios, and users can save their own test cases.
EvalAssist can be installed using various package managers. Before proceeding, ensure you are using Python >= 3.10 and < 3.14 to avoid compatibility issues. Make sure to set DATA_DIR to avoid data loss (e.g. export DATA_DIR="~/.eval_assist").
Using pip and a virtual environment:

python3 -m venv venv
source venv/bin/activate  # or venv\Scripts\activate.bat on Windows
pip install 'evalassist[webapp]'
eval-assist serve

Using uvx:

uvx --python 3.11 --from 'evalassist[webapp]' eval-assist serve

Using conda:

conda create -n evalassist python=3.11
conda activate evalassist
pip install 'evalassist[webapp]'
eval-assist serve
In all cases, after running the command, you can access the EvalAssist server at http://localhost:8000.
EvalAssist can be configured through environment variables and command parameters. Take a look at the configuration documentation.
Check out the tutorials to see how to run evaluations and generate synthetic data.
You can also run LLM-as-a-Judge evaluations using Python only, without the web application. For example:
from evalassist.judges import DirectJudge
from evalassist.judges.const import DEFAULT_JUDGE_INFERENCE_PARAMS
from unitxt.inference import CrossProviderInferenceEngine

# Instantiate a direct (rubric-based) judge backed by a watsonx-hosted model.
judge = DirectJudge(
    inference_engine=CrossProviderInferenceEngine(
        model="llama-3-3-70b-instruct",
        provider="watsonx",
        **DEFAULT_JUDGE_INFERENCE_PARAMS,
    ),
)

# Evaluate a single instance against a yes/no direct assessment criterion.
results = judge(
    instances=[
        "Use the API client to fetch data from the server and the cache to store frequently accessed results for faster performance."
    ],
    criteria="Is the text self-explanatory and self-contained?",
)
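The judge returns one result per input instance. As a minimal, illustrative way to inspect the output (the exact structure of the returned results is described in the judges sub-package documentation referenced below):

# Print the raw results to see the judge's verdict for each instance;
# consult the judges sub-package documentation for the full result schema.
print(results)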
For more details, see the documentation of the judges sub-package.
You can contribute to EvalAssist or to Unitxt. Look at the Contribution Guidelines for more details.
Look at the Local Development Guide for instructions on setting up a local development environment.
You can find extensive documentation of the system on the Documentation page.