**NeMo Eval** is a comprehensive evaluation framework for Large Language Models (LLMs) built on top of the NeMo Framework. It provides seamless deployment and evaluation of NeMo checkpoints using NVIDIA Eval Factory, which packages state-of-the-art evaluation harnesses as modular building blocks installed in the NeMo Framework container.
## 🚀 Features
- **Multi-Backend Deployment**: Support for both PyTriton and Ray Serve deployment backends
- **Comprehensive Evaluation**: Integration with NVIDIA Eval Factory for standardized benchmark evaluation
- **Adapter System**: Flexible adapter architecture using a chain of interceptors for customizing request/response processing
- **Production Ready**: Optimized for high-performance inference with CUDA graphs and flash decoding
- **Multi-GPU Support**: Distributed inference across multiple GPUs and nodes
- **OpenAI-Compatible API**: RESTful endpoints compatible with OpenAI API standards
## 📋 Table of Contents
### Prerequisites
- Python 3.10 or higher
- CUDA-compatible GPU(s) (tested on RTX A6000, A100, H100)
- NeMo Framework container (recommended)
### Using pip
```bash
pip install nemo-eval
```

### Using uv

```bash
# Install uv if you haven't already
pip install uv

# Install nemo-eval
uv pip install nemo-eval
```
### From Source
```bash
git clone https://github.com/NVIDIA-NeMo/Eval.git
cd Eval
pip install -e .
```

### From Source with uv

```bash
git clone https://github.com/NVIDIA-NeMo/Eval.git
cd Eval
uv sync
```
### Using Docker
The recommended approach is to use the NeMo Framework container:
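The original snippet for this step was truncated; the commands below are a hedged sketch of a typical invocation. The container tag `25.04` is an assumption for illustration — replace it with a current NeMo Framework release from NGC:

```shell
# Pull the NeMo Framework container (tag is an example; use a current release)
docker pull nvcr.io/nvidia/nemo:25.04

# Start an interactive session with all GPUs visible
docker run --rm -it --gpus all \
  --shm-size=16g \
  nvcr.io/nvidia/nemo:25.04 bash
```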
## 📊 Support Matrix

| Checkpoint Format | Inference Engine | Deployment Options | Evaluation Harnesses |
|---|---|---|---|
| NeMo 2.0 | Megatron Core inference engine | PyTriton (single- and multi-node model parallelism), Ray (single-node model parallelism with multi-instance evals) | lm-evaluation-harness, simple-evals, BigCode, BFCL, safety-harness, garak |
## 🏗️ Architecture
### Core Components
#### 1. Deployment Layer
- **PyTriton Backend**: High-performance inference using NVIDIA Triton Inference Server, with OpenAI API compatibility via a FastAPI interface and model parallelism across single and multiple nodes. Does not support multi-instance evaluation.
- **Ray Backend**: Single-node, model-parallel, multi-instance evaluation using Ray Serve with OpenAI API compatibility. Multi-node support is coming soon.
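Both backends expose an OpenAI-compatible chat completions endpoint, so any standard client can query a deployed checkpoint. The sketch below builds and sends such a request using only the standard library; the endpoint URL, port, and model name are assumptions for illustration and depend on your deployment:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "megatron_model") -> dict:
    """Build an OpenAI-style /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
        "temperature": 0.0,
    }

def chat(prompt: str, url: str = "http://localhost:8080/v1/chat/completions") -> str:
    # POST the JSON body and return the first completion's text.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```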
#### 2. Evaluation Layer
- **NVIDIA Eval Factory**: Standardized benchmark evaluation using evaluation packages from NVIDIA Eval Factory, installed in the NeMo Framework container. lm-evaluation-harness is installed by default, while the remaining harnesses from the [support matrix](#-support-matrix) can be installed on demand. More details in the [docs](https://github.com/NVIDIA-NeMo/Eval/tree/main/docs).
- **Adapter System**: Flexible request/response processing pipeline built from modular **interceptors**
The example below shows how to configure an adapter that supplies a custom system prompt. Requests and responses are processed through a chain of interceptors, which are selected automatically based on the `AdapterConfig` parameters you provide.