Commit 626cc15

Add comprehensive README.md (#12)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com>

1 parent 0cedf63 commit 626cc15

2 files changed: +312 −0 lines changed

CONTRIBUTING.md

Lines changed: 53 additions & 0 deletions
@@ -64,6 +64,56 @@ git commit -m "build: Adding dependencies"
git push
```

## Development Setup

1. Fork or clone the repository
2. Create a feature branch
3. Install development dependencies [using uv](#local-workstation) or [using pip](https://github.com/NVIDIA-NeMo/Eval/blob/main/README.md#using-pip) if working outside of [Docker](#alternative-development-container)
4. Install the pre-commit hooks:

   ```bash
   pre-commit install
   ```

5. Make your changes and add tests
6. Submit a pull request

### Testing

#### Running Tests

```bash
# Run all tests
pytest tests

# Run unit tests only
pytest tests/unit_tests/

# Run functional tests only
pytest tests/functional_tests/

# Run with coverage
pytest --cov=nemo_eval tests
```

#### Test Scripts

```bash
# Unit tests on CPU
bash tests/unit_tests/L0_Unit_Tests_CPU.sh

# Unit tests on GPU
bash tests/unit_tests/L0_Unit_Tests_GPU.sh

# Functional tests on GPU
bash tests/functional_tests/L2_Functional_Tests_GPU.sh
```

#### Testing Guidelines

- Write unit tests and functional tests for new functionality
- Ensure all tests pass before submitting
- Add integration tests for complex features
- Follow existing test patterns

### 🧹 Linting and Formatting

We use [ruff](https://docs.astral.sh/ruff/) for linting and formatting. CI does not auto-fix linting and formatting issues, but most issues can be fixed by running the following command:
@@ -156,3 +206,6 @@ uv run --only-group docs sphinx-autobuild . _build/html
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```

README.md

Lines changed: 259 additions & 0 deletions
@@ -1 +1,260 @@

<div align="center">

# NeMo Eval

[![codecov](https://codecov.io/github/NVIDIA-NeMo/Eval/graph/badge.svg?token=4NMKZVOW2Z)](https://codecov.io/github/NVIDIA-NeMo/Eval)
[![CICD NeMo](https://github.com/NVIDIA-NeMo/Eval/actions/workflows/cicd-main.yml/badge.svg)](https://github.com/NVIDIA-NeMo/Eval/actions/workflows/cicd-main.yml)
[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://github.com/NVIDIA-NeMo/Eval/blob/main/pyproject.toml)
[![NVIDIA](https://img.shields.io/badge/NVIDIA-NeMo-red.svg)](https://github.com/NVIDIA-NeMo/)

[Documentation](https://nemo-framework-documentation.gitlab-master-pages.nvidia.com/eval-build/) | [Examples](#-usage-examples) | [Contributing](https://github.com/NVIDIA-NeMo/Eval/blob/main/CONTRIBUTING.md)

</div>

## Overview

**NeMo Eval** is a comprehensive evaluation module in the NeMo Framework for Large Language Models (LLMs). It provides seamless deployment and evaluation capabilities for models trained with the NeMo Framework via state-of-the-art evaluation harnesses.

## 🚀 Features

- **Multi-Backend Deployment**: Support for both PyTriton and Ray Serve deployment backends
- **Comprehensive Evaluation**: State-of-the-art evaluation harnesses covering reasoning benchmarks, code generation, and safety testing
- **Adapter System**: Flexible adapter architecture using a chain of interceptors for customizing request/response processing
- **Production Ready**: Optimized for high-performance inference with CUDA graphs and flash decoding
- **Multi-GPU & Multi-Node Support**: Distributed inference across multiple devices and nodes
- **OpenAI-Compatible API**: RESTful endpoints compatible with OpenAI API standards

## 🔧 Installation

### Prerequisites

- Python 3.10 or higher
- CUDA-compatible GPU(s) (tested on RTX A6000, A100, H100)
- NeMo Framework container (recommended)

### Using pip

For quick exploration of NeMo Eval, we recommend installing our pip package:

```bash
pip install nemo-eval
```

### Using Docker

For the best experience and highest performance, use the [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). Fetch the most recent $TAG and run the following command to start a container:

```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
  --entrypoint bash \
  --gpus all \
  nvcr.io/nvidia/nemo:${TAG}
```

### Using uv

To install Eval with uv, please refer to our [Contribution guide](https://github.com/NVIDIA-NeMo/Eval/blob/main/CONTRIBUTING.md).

## 🚀 Quick Start

### 1. Deploy a Model

```python
from nemo_eval.api import deploy

# Deploy a NeMo checkpoint
deploy(
    nemo_checkpoint="/path/to/your/checkpoint",
    serving_backend="pytriton",  # or "ray"
    server_port=8080,
    num_gpus=1,
    max_input_len=4096,
    max_batch_size=8
)
```

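Once deployed, the server exposes an OpenAI-compatible completions endpoint, so any standard HTTP client can talk to it. The sketch below is illustrative, not part of the `nemo_eval` API: it assumes the server from the step above is running locally on port 8080 and serving the model under the name `megatron_model` (the same URL and model name used in the evaluation example), using only the standard library.

```python
import json
import urllib.request

def completion_payload(prompt, model="megatron_model", max_tokens=16):
    """Build an OpenAI-style completions request body."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

def query_completions(prompt, url="http://0.0.0.0:8080/v1/completions/"):
    """POST a completion request to the deployed server and return the text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(completion_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the generation under choices[0].text
    return body["choices"][0]["text"]

# With a model deployed as above, you could run:
# print(query_completions("The capital of France is"))
```
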
### 2. Evaluate the Model

```python
from nemo_eval.api import evaluate
from nemo_eval.utils.api import EvaluationTarget, EvaluationConfig, ApiEndpoint

# Configure evaluation
api_endpoint = ApiEndpoint(
    url="http://0.0.0.0:8080/v1/completions/",
    model_id="megatron_model"
)
target = EvaluationTarget(api_endpoint=api_endpoint)
config = EvaluationConfig(type="gsm8k")

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
print(results)
```

## 📊 Support Matrix

| Checkpoint Type | Inference Backend | Deployment Server | Evaluation Harnesses Supported |
|-----------------|-------------------|-------------------|--------------------------------|
| NeMo FW checkpoint via megatron-core backend | Megatron Core in-framework inference engine | PyTriton (single- and multi-node model parallelism), Ray (single-node model parallelism with multi-instance evals) | lm-evaluation-harness, simple-evals, BigCode, BFCL, safety-harness, garak |

## 🏗️ Architecture

### Core Components

#### 1. Deployment Layer

- **PyTriton Backend**: High-performance inference using the NVIDIA Triton Inference Server, with OpenAI API compatibility via a FastAPI interface and model parallelism across single and multiple nodes. Does not support multi-instance evaluation.
- **Ray Backend**: Single-node, model-parallel, multi-instance evaluation using Ray Serve with OpenAI API compatibility. Multi-node support is coming soon.

#### 2. Evaluation Layer

- **NVIDIA Eval Factory**: Standardized benchmark evaluation with eval packages from NVIDIA Eval Factory. lm-evaluation-harness is installed in the NeMo Framework container by default, while the rest of the harnesses in the [support matrix](#-support-matrix) can be installed on demand. More details in the [docs](https://github.com/NVIDIA-NeMo/Eval/tree/main/docs).
- **Adapter System**: Flexible request/response processing pipeline built from modular **interceptors**:
  - **SystemMessageInterceptor**: Customize system prompts
  - **RequestLoggingInterceptor**: Log incoming requests
  - **ResponseLoggingInterceptor**: Log outgoing responses
  - **ResponseReasoningInterceptor**: Process reasoning outputs
  - **EndpointInterceptor**: Route requests to the actual model

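Conceptually, a chain of interceptors passes each request through a series of small, single-purpose transforms before it reaches the model. The sketch below illustrates that pattern only; the class names and `intercept` method here are hypothetical stand-ins, not the actual `nemo_eval.adapters` API.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Minimal stand-in for a request flowing through the adapter."""
    system_prompt: str
    prompt: str
    log: list = field(default_factory=list)

class SystemMessageInterceptor:
    """Overwrite the system prompt (cf. SystemMessageInterceptor above)."""
    def __init__(self, prompt):
        self.prompt = prompt
    def intercept(self, request):
        request.system_prompt = self.prompt
        return request

class RequestLoggingInterceptor:
    """Record each request as it passes through the chain."""
    def intercept(self, request):
        request.log.append(f"request: {request.prompt!r}")
        return request

def run_chain(request, interceptors):
    # Each interceptor transforms the request and hands it to the next one;
    # in the real adapter the final interceptor routes to the model endpoint.
    for interceptor in interceptors:
        request = interceptor.intercept(request)
    return request

req = run_chain(
    Request(system_prompt="", prompt="2 + 2 = ?"),
    [SystemMessageInterceptor("Think step by step."),
     RequestLoggingInterceptor()],
)
```

The chain order matters: placing the logging interceptor last means it observes the request after all earlier rewrites.
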
## 📖 Usage Examples

### Basic Deployment with PyTriton as the Serving Backend

```python
from nemo_eval.api import deploy

# Deploy model
deploy(
    nemo_checkpoint="/path/to/checkpoint",
    serving_backend="pytriton",
    server_port=8080,
    num_gpus=1,
    max_input_len=8192,
    max_batch_size=4
)
```

### Basic Evaluation

```python
from nemo_eval.api import evaluate
from nemo_eval.utils.api import EvaluationTarget, EvaluationConfig, ApiEndpoint, ConfigParams

# Configure the endpoint
api_endpoint = ApiEndpoint(
    url="http://0.0.0.0:8080/v1/completions/",
)

# Evaluation target configuration
target = EvaluationTarget(api_endpoint=api_endpoint)

# Configure EvaluationConfig with the task type, number of samples to evaluate, etc.
config = EvaluationConfig(
    type="gsm8k",
    params=ConfigParams(limit_samples=10)
)

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
```

### Using Adapters

The example below shows how to configure an adapter that provides a custom system prompt. Requests and responses are processed through interceptors, which are automatically selected based on the `AdapterConfig` parameters you provide.

```python
from nemo_eval.utils.api import AdapterConfig

# Configure adapter for reasoning
adapter_config = AdapterConfig(
    api_url="http://0.0.0.0:8080/v1/completions/",
    use_reasoning=True,
    end_reasoning_token="</think>",
    custom_system_prompt="You are a helpful assistant that thinks step by step.",
    max_logged_requests=5,
    max_logged_responses=5
)

# Run evaluation with adapter
results = evaluate(
    target_cfg=target,
    eval_cfg=config,
    adapter_cfg=adapter_config
)
```

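To make the `end_reasoning_token` setting concrete, the sketch below shows the kind of transform a reasoning interceptor performs: dropping the chain-of-thought prefix so only the final answer is scored. This is an illustration of the idea, not the `ResponseReasoningInterceptor` implementation; the function name is hypothetical.

```python
END_REASONING_TOKEN = "</think>"

def strip_reasoning(text: str, token: str = END_REASONING_TOKEN) -> str:
    """Drop everything up to and including the end-of-reasoning token.

    If the token is absent, the text is returned unchanged.
    """
    head, sep, tail = text.partition(token)
    return tail.lstrip() if sep else text

raw = "<think>12 * 4 = 48</think> The answer is 48."
print(strip_reasoning(raw))  # -> The answer is 48.
```
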
### Multi-GPU Deployment

```python
# Deploy with tensor parallelism or pipeline parallelism
deploy(
    nemo_checkpoint="/path/to/checkpoint",
    serving_backend="pytriton",
    num_gpus=4,
    tensor_parallelism_size=4,
    pipeline_parallelism_size=1,
    max_input_len=8192,
    max_batch_size=8
)
```

### Deploy with Ray

```python
# Deploy using Ray Serve
deploy(
    nemo_checkpoint="/path/to/checkpoint",
    serving_backend="ray",
    num_gpus=2,
    num_replicas=2,
    num_cpus_per_replica=8,
    server_port=8080,
    include_dashboard=True,
    cuda_visible_devices="0,1"
)
```

## 📁 Project Structure

```
Eval/
├── src/nemo_eval/           # Main package
│   ├── api.py               # Main API functions
│   ├── package_info.py      # Package metadata
│   ├── adapters/            # Adapter system
│   │   ├── server.py        # Adapter server
│   │   ├── utils.py         # Adapter utilities
│   │   └── interceptors/    # Request/response interceptors
│   └── utils/               # Utility modules
│       ├── api.py           # API configuration classes
│       ├── base.py          # Base utilities
│       └── ray_deploy.py    # Ray deployment utilities
├── tests/                   # Test suite
│   ├── unit_tests/          # Unit tests
│   └── functional_tests/    # Functional tests
├── tutorials/               # Tutorial notebooks
├── scripts/                 # Reference nemo-run scripts
├── docs/                    # Documentation
├── docker/                  # Docker configuration
└── external/                # External dependencies
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](https://github.com/NVIDIA-NeMo/Eval/blob/main/CONTRIBUTING.md) for details on development setup, testing, and code style guidelines.

## 📄 License

This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/NVIDIA-NeMo/Eval/blob/main/LICENSE) file for details.

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/NVIDIA-NeMo/Eval/issues)
- **Discussions**: [GitHub Discussions](https://github.com/NVIDIA-NeMo/Eval/discussions)
- **Documentation**: [NeMo Documentation](https://nemo-framework-documentation.gitlab-master-pages.nvidia.com/eval-build/)

## 🔗 Related Projects

- [NeMo Export Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) - Model export and deployment

---

**Note**: This project is actively maintained by NVIDIA. For the latest updates and features, please check our [releases page](https://github.com/NVIDIA-NeMo/Eval/releases).
