<div align="center">

# NeMo Eval

[codecov](https://codecov.io/github/NVIDIA-NeMo/Eval)
[CICD](https://github.com/NVIDIA-NeMo/Eval/actions/workflows/cicd-main.yml)
[Python version](https://github.com/NVIDIA-NeMo/Eval/blob/main/pyproject.toml)
[NVIDIA-NeMo](https://github.com/NVIDIA-NeMo/)

[Documentation](https://nemo-framework-documentation.gitlab-master-pages.nvidia.com/eval-build/) | [Examples](#-usage-examples) | [Contributing](https://github.com/NVIDIA-NeMo/Eval/blob/main/CONTRIBUTING.md)

</div>

## Overview

**NeMo Eval** is a comprehensive evaluation module of the NeMo Framework for large language models (LLMs). It provides seamless deployment and evaluation capabilities for models trained with the NeMo Framework via state-of-the-art evaluation harnesses.

## 🚀 Features

- **Multi-Backend Deployment**: Support for both PyTriton and Ray Serve deployment backends
- **Comprehensive Evaluation**: State-of-the-art evaluation harnesses covering reasoning benchmarks, code generation, and safety testing
- **Adapter System**: Flexible adapter architecture using a chain of interceptors to customize request/response processing
- **Production Ready**: Optimized for high-performance inference with CUDA graphs and flash decoding
- **Multi-GPU & Multi-Node Support**: Distributed inference across multiple devices and nodes
- **OpenAI-Compatible API**: RESTful endpoints compatible with OpenAI API standards

## 🔧 Installation

### Prerequisites

- Python 3.10 or higher
- CUDA-compatible GPU(s) (tested on RTX A6000, A100, H100)
- NeMo Framework container (recommended)

### Using pip

For quick exploration of NeMo Eval, we recommend installing our pip package:

```bash
pip install nemo-eval
```

### Using Docker

For the best experience and highest performance, use the [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). Fetch the most recent `$TAG` and run the following command to start a container:

```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
  --entrypoint bash \
  --gpus all \
  nvcr.io/nvidia/nemo:${TAG}
```

### Using uv

To install Eval with uv, please refer to our [Contributing Guide](https://github.com/NVIDIA-NeMo/Eval/blob/main/CONTRIBUTING.md).

## 🚀 Quick Start

### 1. Deploy a Model

```python
from nemo_eval.api import deploy

# Deploy a NeMo checkpoint
deploy(
    nemo_checkpoint="/path/to/your/checkpoint",
    serving_backend="pytriton",  # or "ray"
    server_port=8080,
    num_gpus=1,
    max_input_len=4096,
    max_batch_size=8
)
```

### 2. Evaluate the Model

```python
from nemo_eval.api import evaluate
from nemo_eval.utils.api import EvaluationTarget, EvaluationConfig, ApiEndpoint

# Configure evaluation
api_endpoint = ApiEndpoint(
    url="http://0.0.0.0:8080/v1/completions/",
    model_id="megatron_model"
)
target = EvaluationTarget(api_endpoint=api_endpoint)
config = EvaluationConfig(type="gsm8k")

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
print(results)
```

## 📊 Support Matrix

| Checkpoint Type | Inference Backend | Deployment Server | Evaluation Harnesses Supported |
|----------------|-------------------|-------------------|--------------------------------|
| NeMo FW checkpoint via the megatron-core backend | Megatron Core in-framework inference engine | PyTriton (single- and multi-node model parallelism), Ray (single-node model parallelism with multi-instance evals) | lm-evaluation-harness, simple-evals, BigCode, BFCL, safety-harness, garak |

## 🏗️ Architecture

### Core Components

#### 1. Deployment Layer

- **PyTriton Backend**: High-performance inference using the NVIDIA Triton Inference Server, with OpenAI API compatibility via a FastAPI interface and model parallelism on single or multiple nodes. Does not support multi-instance evaluation.
- **Ray Backend**: Single-node, model-parallel, multi-instance evaluation using Ray Serve, with OpenAI API compatibility. Multi-node support is coming soon.

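Both backends expose an OpenAI-compatible completions endpoint. As a hedged sketch (the URL and model id below follow the Quick Start example and are assumptions about your deployment; a deployed model must be listening on the port), a request can be issued with only the Python standard library:

```python
import json
from urllib import request

# Standard OpenAI-style completions payload; "megatron_model" matches the
# Quick Start example and is an assumption about your deployment.
payload = {
    "model": "megatron_model",
    "prompt": "What is 2 + 2?",
    "max_tokens": 16,
}

def query(url: str = "http://0.0.0.0:8080/v1/completions/") -> dict:
    # Requires a model already deployed via deploy(...) on this port.
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```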
#### 2. Evaluation Layer

- **NVIDIA Eval Factory**: Standardized benchmark evaluation using eval packages from NVIDIA Eval Factory. lm-evaluation-harness is installed in the NeMo Framework container by default, while the remaining harnesses in the [support matrix](#-support-matrix) can be installed on demand. More details are in the [docs](https://github.com/NVIDIA-NeMo/Eval/tree/main/docs).

- **Adapter System**: Flexible request/response processing pipeline built from modular **interceptors**:
  - **SystemMessageInterceptor**: Customize the system prompt
  - **RequestLoggingInterceptor**: Log incoming requests
  - **ResponseLoggingInterceptor**: Log outgoing responses
  - **ResponseReasoningInterceptor**: Process reasoning outputs
  - **EndpointInterceptor**: Route requests to the actual model

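The chain-of-interceptors idea can be illustrated with a minimal sketch (the class and method names below are hypothetical, not the actual `nemo_eval.adapters` API): each interceptor receives a request, transforms or records it, and passes it along the chain.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical request type and interceptors, for illustration only.
@dataclass
class Request:
    messages: List[dict] = field(default_factory=list)

class SystemMessageInterceptor:
    """Prepends a custom system message to every request."""
    def __init__(self, prompt: str):
        self.prompt = prompt
    def intercept(self, request: Request) -> Request:
        request.messages.insert(0, {"role": "system", "content": self.prompt})
        return request

class RequestLoggingInterceptor:
    """Records a simple trace of each request passing through the chain."""
    def __init__(self, log: list):
        self.log = log
    def intercept(self, request: Request) -> Request:
        self.log.append(len(request.messages))
        return request

def run_chain(request: Request, chain: list) -> Request:
    # Each interceptor sees the output of the previous one.
    for interceptor in chain:
        request = interceptor.intercept(request)
    return request

log = []
chain = [SystemMessageInterceptor("You are helpful."), RequestLoggingInterceptor(log)]
out = run_chain(Request(messages=[{"role": "user", "content": "Hi"}]), chain)
```

In the real adapter server an endpoint interceptor would sit at the end of the chain and forward the final request to the deployed model.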
## 📖 Usage Examples

### Basic Deployment with PyTriton as the Serving Backend

```python
from nemo_eval.api import deploy

# Deploy model
deploy(
    nemo_checkpoint="/path/to/checkpoint",
    serving_backend="pytriton",
    server_port=8080,
    num_gpus=1,
    max_input_len=8192,
    max_batch_size=4
)
```

### Basic Evaluation

```python
from nemo_eval.api import evaluate
from nemo_eval.utils.api import EvaluationTarget, EvaluationConfig, ApiEndpoint, ConfigParams

# Configure the endpoint
api_endpoint = ApiEndpoint(
    url="http://0.0.0.0:8080/v1/completions/",
)

# Evaluation target configuration
target = EvaluationTarget(api_endpoint=api_endpoint)

# Configure EvaluationConfig with the benchmark type, number of samples to evaluate, etc.
config = EvaluationConfig(
    type="gsm8k",
    params=ConfigParams(limit_samples=10),
)

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
```

### Using Adapters

The example below shows how to configure an adapter that supplies a custom system prompt. Requests and responses are processed through interceptors, which are selected automatically based on the `AdapterConfig` parameters you provide.

```python
from nemo_eval.utils.api import AdapterConfig

# Configure adapter for reasoning
adapter_config = AdapterConfig(
    api_url="http://0.0.0.0:8080/v1/completions/",
    use_reasoning=True,
    end_reasoning_token="</think>",
    custom_system_prompt="You are a helpful assistant that thinks step by step.",
    max_logged_requests=5,
    max_logged_responses=5
)

# Run evaluation with adapter
results = evaluate(
    target_cfg=target,
    eval_cfg=config,
    adapter_cfg=adapter_config
)
```

### Multi-GPU Deployment

```python
# Deploy with tensor parallelism or pipeline parallelism
deploy(
    nemo_checkpoint="/path/to/checkpoint",
    serving_backend="pytriton",
    num_gpus=4,
    tensor_parallelism_size=4,
    pipeline_parallelism_size=1,
    max_input_len=8192,
    max_batch_size=8
)
```

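As a rule of thumb, the GPU count must cover the model-parallel layout: `num_gpus` should be divisible by `tensor_parallelism_size × pipeline_parallelism_size`. A tiny sanity-check helper (hypothetical, not part of `nemo_eval`) makes this explicit:

```python
# Hypothetical helper, not part of nemo_eval: the GPU count must be divisible
# by the model-parallel layout, tensor_parallelism_size * pipeline_parallelism_size.
def valid_parallel_layout(num_gpus: int, tp: int, pp: int) -> bool:
    return num_gpus > 0 and num_gpus % (tp * pp) == 0

assert valid_parallel_layout(4, tp=4, pp=1)      # the layout used above
assert not valid_parallel_layout(4, tp=4, pp=2)  # 8 model-parallel ranks cannot fit on 4 GPUs
```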
### Deploy with Ray

```python
# Deploy using Ray Serve
deploy(
    nemo_checkpoint="/path/to/checkpoint",
    serving_backend="ray",
    num_gpus=2,
    num_replicas=2,
    num_cpus_per_replica=8,
    server_port=8080,
    include_dashboard=True,
    cuda_visible_devices="0,1"
)
```

## 📁 Project Structure

```
Eval/
├── src/nemo_eval/           # Main package
│   ├── api.py               # Main API functions
│   ├── package_info.py      # Package metadata
│   ├── adapters/            # Adapter system
│   │   ├── server.py        # Adapter server
│   │   ├── utils.py         # Adapter utilities
│   │   └── interceptors/    # Request/response interceptors
│   └── utils/               # Utility modules
│       ├── api.py           # API configuration classes
│       ├── base.py          # Base utilities
│       └── ray_deploy.py    # Ray deployment utilities
├── tests/                   # Test suite
│   ├── unit_tests/          # Unit tests
│   └── functional_tests/    # Functional tests
├── tutorials/               # Tutorial notebooks
├── scripts/                 # Reference nemo-run scripts
├── docs/                    # Documentation
├── docker/                  # Docker configuration
└── external/                # External dependencies
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](https://github.com/NVIDIA-NeMo/Eval/blob/main/CONTRIBUTING.md) for details on development setup, testing, and code style guidelines.

## 📄 License

This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/NVIDIA-NeMo/Eval/blob/main/LICENSE) file for details.

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/NVIDIA-NeMo/Eval/issues)
- **Discussions**: [GitHub Discussions](https://github.com/NVIDIA-NeMo/Eval/discussions)
- **Documentation**: [NeMo Documentation](https://nemo-framework-documentation.gitlab-master-pages.nvidia.com/eval-build/)

## 🔗 Related Projects

- [NeMo Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) - Model export and deployment

---

**Note**: This project is actively maintained by NVIDIA. For the latest updates and features, please check our [releases page](https://github.com/NVIDIA-NeMo/Eval/releases).