Commit 9f18e2c (parent: 1d6729b)

ynankani authored and kevalmorabia97 committed

[4975376][5541172] perplexity and kl-divergence benchmark metrics (#411)

Signed-off-by: unknown <[email protected]>

8 files changed: +2257 −1 lines changed
# KL Divergence Model Validation Toolkit

This toolkit provides comprehensive model validation capabilities using KL divergence metrics to compare two models. It is designed to evaluate the similarity between model outputs across different optimization techniques, frameworks, and hardware backends.

## Overview
The toolkit measures output similarity between models using KL (Kullback-Leibler) divergence, which quantifies how one probability distribution differs from another. Lower KL divergence values indicate more similar model outputs.
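
As a concrete illustration of the metric (a minimal sketch, not the script's actual implementation), token-level KL divergence D(P‖Q) = Σᵢ pᵢ · log(pᵢ/qᵢ) can be computed from two models' raw logits with plain NumPy:

```python
import numpy as np

def softmax(logits):
    # Subtract the per-row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(logits_p, logits_q):
    """Mean KL(P || Q) over token positions, from raw logits."""
    p = softmax(logits_p)
    log_p = np.log(p)
    log_q = np.log(softmax(logits_q))
    # KL(P || Q) = sum_i p_i * (log p_i - log q_i), per position.
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

identical = np.array([[1.0, 2.0, 3.0]])
reversed_ = np.array([[3.0, 2.0, 1.0]])
print(kl_divergence(identical, identical))  # → 0.0 (identical outputs)
print(kl_divergence(identical, reversed_))  # positive: distributions differ
```

Identical logits give a KL of exactly zero; any divergence between the two output distributions produces a positive value.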
**Primary Use Cases:**

1. **Model Optimization Validation** - Verify that optimized models (quantization, pruning) maintain output quality
2. **Framework Comparison** - Compare Hugging Face models vs ONNX Runtime GenAI models
3. **Precision Analysis** - Evaluate FP16 vs INT4 vs INT8 model outputs
4. **Execution Provider Testing** - Test different EP implementations (CUDA, DirectML, CPU, TensorRT)
## Key Components

### Main Script

| Script | Purpose | Comparison Modes |
|--------|---------|------------------|
| `compute_kl_divergence.py` | **Two-model sequential comparison** | • HF vs GenAI<br>• GenAI vs GenAI (same EP)<br>• GenAI vs HF<br>• HF vs HF |
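
The sequential design in the table above can be sketched as follows (an illustrative sketch, not the script's code; the stand-in model functions are hypothetical): run the first model over all inputs, release it, then run the second, so only one model occupies memory at a time.

```python
import numpy as np
from scipy.special import softmax, rel_entr

def sequential_kl(run_model1, run_model2, prompts):
    """Score with one model at a time so both never share memory, then
    average token-level KL(model1 || model2) over all prompts."""
    logits1 = [run_model1(p) for p in prompts]  # first pass; the real script would now free model 1
    logits2 = [run_model2(p) for p in prompts]  # second pass
    kls = [rel_entr(softmax(l1, axis=-1), softmax(l2, axis=-1)).sum(axis=-1).mean()
           for l1, l2 in zip(logits1, logits2)]
    return float(np.mean(kls))

# Hypothetical stand-ins for real model forward passes.
m1 = lambda prompt: np.ones((4, 8))        # uniform distribution per position
m2 = lambda prompt: 2.0 * np.ones((4, 8))  # also uniform after softmax
print(sequential_kl(m1, m2, ["a", "b"]))   # → 0.0
```

Because the two forward passes never overlap, peak memory is bounded by the larger of the two models rather than their sum.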
### Datasets Used

- **Wikitext-2** test split for consistent evaluation across all models
- Automatic dataset loading and preprocessing via Hugging Face `datasets`
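
The preprocessing step can be sketched as splitting the corpus into fixed-length evaluation windows (an illustrative sketch: the real script loads Wikitext-2 via the Hugging Face `datasets` library, and the window size and toy tokenizer here are assumptions):

```python
def make_windows(token_ids, window_size):
    """Split a token sequence into non-overlapping, full-length windows."""
    return [token_ids[i:i + window_size]
            for i in range(0, len(token_ids) - window_size + 1, window_size)]

# Toy "tokenization": whitespace split stands in for a real tokenizer.
corpus = "the quick brown fox jumps over the lazy dog " * 10
tokens = corpus.split()
windows = make_windows(tokens, 16)
print(len(tokens), len(windows))  # → 90 5 (90 tokens yield 5 full windows of 16)
```

Evaluating both models on identical windows is what makes their per-token distributions directly comparable.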
## Installation

### 1. Install Base Requirements

```bash
pip install -r requirements.txt
```

Note: For faster inference, install torch with CUDA support:
`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129`
### 2. Install ONNX Runtime GenAI Package

Install **one** of the following based on your hardware:

```bash
# For CUDA
pip install onnxruntime-genai-cuda

# For DirectML support
pip install onnxruntime-genai-directml

# For CPU
pip install onnxruntime-genai
```
## Usage Examples

### Quick Start

#### Compare HF vs GenAI Model

```bash
python compute_kl_divergence.py \
    --model1 "meta-llama/Llama-3.1-8B-Instruct" --model1_type hf \
    --model2 "G:\models\genai_model" --model2_type genai \
    --device cuda \
    --output results.json
```
#### Compare Two GenAI Models (Same EP)

```bash
python compute_kl_divergence.py \
    --model1 "G:\models\genai_fp16" --model1_type genai \
    --model2 "G:\models\genai_int4" --model2_type genai \
    --output fp16_vs_int4.json
```
### Advanced Options

#### Enable Debug Output

```bash
python compute_kl_divergence.py \
    --model1 "meta-llama/Llama-3.1-8B-Instruct" --model1_type hf \
    --model2 "G:\models\genai_model" --model2_type genai \
    --device cuda \
    --output results.json \
    --debug  # Enables verbose logging
```
## Configuration Parameters

### compute_kl_divergence.py

**Required Parameters:**

| Parameter | Description | Values |
|-----------|-------------|--------|
| `--model1` | Path to first model | Local path or HF Hub identifier |
| `--model1_type` | Type of first model | `hf`, `genai` |
| `--model2` | Path to second model | Local path or HF Hub identifier |
| `--model2_type` | Type of second model | `hf`, `genai` |
**Optional Parameters:**

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--device` | Device for HF model inference | `cuda` |
| `--output` | Output JSON file path | None (prints to console) |
| `--debug` | Enable verbose debug output | False |
**Model Path Formats:**

- **HF models**:
  - Hub identifier: `meta-llama/Llama-3.1-8B-Instruct`
  - Local path: `F:\shared\Llama-3.1-8B-Instruct`
- **GenAI models**:
  - Local path only: `G:\models\genai_model`
### Key Insights

- **Lower is better**: Smaller KL divergence = more similar outputs
- **Relative comparison**: Compare against a baseline (e.g., HF FP32)
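
The relative-comparison idea can be sketched as: score each candidate against the same baseline and rank by KL (a hedged sketch; the probabilities below are toy numbers, not real model outputs):

```python
import numpy as np
from scipy.special import rel_entr  # elementwise p * log(p / q)

def mean_kl(p, q):
    """Mean KL(p || q) across positions, for pre-computed probabilities."""
    return float(rel_entr(p, q).sum(axis=-1).mean())

baseline = np.array([[0.7, 0.2, 0.1]])           # e.g. HF FP32 output (toy values)
candidates = {
    "int8": np.array([[0.68, 0.21, 0.11]]),      # mild drift from baseline
    "int4": np.array([[0.5, 0.3, 0.2]]),         # larger drift
}
scores = {name: mean_kl(baseline, p) for name, p in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # → int8: lower KL against the baseline means closer outputs
```

An absolute KL value is hard to interpret on its own; ranking candidates against a common baseline is what makes the metric actionable.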
## Troubleshooting

### Common Issues and Solutions

#### 1. CUDA Out of Memory

**Error:**

```text
RuntimeError: CUDA out of memory
```

**Solutions:**

- Use the CPU for the HF model: `--device cpu`
- Close other applications using the GPU
- Try a smaller batch size (modify the code if needed)
- Ensure only one model loads at a time (the script should handle this)
#### 2. Execution Provider Mismatch
145+
146+
**Error:**
147+
148+
```text
149+
[INFO] Comparing two GenAI models (same execution provider)
150+
```
151+
152+
**Note:** This is informational. GenAI vs GenAI comparisons require same EP.
153+
154+
**Solution:** Ensure both models were created for the same execution provider.
