Skip to content

Commit fe4ea31

Browse files
committed
Add preset dataset unit tests and documentation
- Add test_dataset_presets.py with 20 test cases for 6 presets across 5 datasets - Add comprehensive testing guide and schema reference documentation Tests verify that transforms work correctly without end-to-end runs, enabling fast regression detection when transform code changes. Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
1 parent 41e8023 commit fe4ea31

File tree

4 files changed

+1250
-0
lines changed

4 files changed

+1250
-0
lines changed

DATASET_PRESET_TESTING.md

Lines changed: 311 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,311 @@
1+
# Dataset Preset Testing Documentation
2+
3+
## Overview
4+
5+
This guide explains the unit testing solution for preset datasets in the MLPerf Inference Endpoint system. The tests verify that dataset transforms work correctly without requiring end-to-end benchmark runs or external compute resources.
6+
7+
## What Was Added
8+
9+
### 1. **Test File: `tests/unit/dataset_manager/test_dataset_presets.py`**
10+
11+
Comprehensive unit tests covering all dataset presets:
12+
13+
- **CNNDailyMail**: Tests for `llama3_8b` and `llama3_8b_sglang` presets
14+
- **AIME25**: Tests for `gptoss` preset
15+
- **GPQA**: Tests for `gptoss` preset
16+
- **LiveCodeBench**: Tests for `gptoss` preset
17+
- **OpenOrca**: Tests for `llama2_70b` preset
18+
19+
Each preset gets three types of tests:
20+
1. **Instantiation test** - Verifies the preset can be created
21+
2. **Transform application test** - Verifies transforms apply without errors
22+
3. **Output validation test** - Verifies transforms produce expected output format
23+
24+
### 2. **Examples Module: `src/inference_endpoint/dataset_manager/examples.py`**
25+
26+
Practical examples showing how to:
27+
- Load predefined datasets with presets
28+
- Create custom datasets from Python data structures
29+
- Apply transforms programmatically
30+
- Test transforms without YAML configuration
31+
- Validate all presets in batch
32+
- Structure benchmarks without YAML
33+
34+
## Running the Tests
35+
36+
### Run all preset tests:
37+
```bash
38+
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
39+
```
40+
41+
### Run tests for a specific dataset:
42+
```bash
43+
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets -v
44+
```
45+
46+
### Run a specific test:
47+
```bash
48+
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets::test_llama3_8b_transforms_apply -v
49+
```
50+
51+
### Run with coverage:
52+
```bash
53+
pytest tests/unit/dataset_manager/test_dataset_presets.py --cov=src/inference_endpoint/dataset_manager --cov-report=html
54+
```
55+
56+
## Test Structure
57+
58+
Each test class uses pytest fixtures to provide minimal sample data:
59+
60+
```python
61+
@pytest.fixture
62+
def sample_cnn_data(self):
63+
"""Create minimal sample data matching CNN/DailyMail schema."""
64+
return pd.DataFrame({
65+
"article": ["..."],
66+
"highlights": ["..."],
67+
})
68+
```
69+
70+
This approach:
71+
- ✅ No external API calls or dataset downloads
72+
- ✅ Tests run in <1 second (no network I/O)
73+
- ✅ Minimal memory footprint
74+
- ✅ Tests can run in CI/CD pipelines
75+
- ✅ Simple to extend with new datasets
76+
77+
## Programmatic Dataset Usage (No YAML)
78+
79+
The examples module shows how to use datasets without YAML configuration:
80+
81+
### Quick validation of all presets:
82+
```python
83+
from inference_endpoint.dataset_manager.examples import example_validate_all_presets
84+
85+
example_validate_all_presets()
86+
```
87+
88+
### Load a dataset with preset programmatically:
89+
```python
90+
from inference_endpoint.dataset_manager.predefined.cnndailymail import CNNDailyMail
91+
92+
# Get transforms
93+
transforms = CNNDailyMail.PRESETS.llama3_8b_sglang()
94+
95+
# Load dataset
96+
dataset = CNNDailyMail.get_dataloader(transforms=transforms)
97+
98+
# Use in benchmark
99+
sample = dataset.load_sample(0)
100+
```
101+
102+
### Create and test custom dataset:
103+
```python
104+
from inference_endpoint.dataset_manager.dataset import Dataset
105+
from inference_endpoint.dataset_manager.transforms import apply_transforms
106+
import pandas as pd
107+
108+
# Create sample data
109+
data = pd.DataFrame({
110+
"question": ["What is AI?"],
111+
"answer": ["Artificial Intelligence"]
112+
})
113+
114+
# Get preset transforms
115+
from inference_endpoint.dataset_manager.predefined.aime25 import AIME25
116+
transforms = AIME25.PRESETS.gptoss()
117+
118+
# Apply transforms
119+
result = apply_transforms(data, transforms)
120+
121+
# Verify
122+
assert "prompt" in result.columns
123+
assert len(result) == 1
124+
```
125+
126+
## How Transform Tests Work
127+
128+
### Test Categories
129+
130+
1. **Instantiation Tests**
131+
- Verify preset functions can be called without errors
132+
- Ensure transforms are returned as a list
133+
- Quick smoke tests
134+
135+
2. **Application Tests**
136+
- Apply transforms to sample data
137+
- Verify output DataFrame has correct shape
138+
- Check that required output columns are created
139+
140+
3. **Validation Tests**
141+
- Verify transform output meets expected format
142+
- Check that data from source columns is properly embedded
143+
- Validate format-specific requirements (e.g., code delimiters, multiple choice format)
144+
145+
### Example Test Pattern
146+
147+
```python
148+
def test_preset_name_transforms_apply(self, sample_data):
149+
"""Test that transforms apply without errors."""
150+
# 1. Get the preset
151+
transforms = DatasetClass.PRESETS.preset_name()
152+
153+
# 2. Apply to sample data
154+
result = apply_transforms(sample_data, transforms)
155+
156+
# 3. Verify output
157+
assert result is not None
158+
assert len(result) == len(sample_data)
159+
assert "prompt" in result.columns # or other expected column
160+
```
161+
162+
## Extending the Tests
163+
164+
### Add a new dataset preset test:
165+
166+
1. **Create the test class** in `test_dataset_presets.py`:
167+
```python
168+
class TestNewDatasetPresets:
169+
"""Test NewDataset presets."""
170+
171+
@pytest.fixture
172+
def sample_data(self):
173+
"""Create sample data matching schema."""
174+
return pd.DataFrame({
175+
"column1": [...],
176+
"column2": [...],
177+
})
178+
179+
def test_preset_name_instantiation(self):
180+
"""Test preset can be instantiated."""
181+
transforms = NewDataset.PRESETS.preset_name()
182+
assert transforms is not None
183+
184+
def test_preset_name_transforms_apply(self, sample_data):
185+
"""Test transforms apply without errors."""
186+
transforms = NewDataset.PRESETS.preset_name()
187+
result = apply_transforms(sample_data, transforms)
188+
assert "prompt" in result.columns
189+
```
190+
191+
2. **Import the dataset class** at the top:
192+
```python
193+
from inference_endpoint.dataset_manager.predefined.new_dataset import NewDataset
194+
```
195+
196+
### Test when transforms change:
197+
198+
Since tests apply actual transforms to sample data, any change to a preset's transforms will automatically be caught:
199+
200+
```bash
201+
# Run tests before making changes to preset
202+
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
203+
204+
# Modify src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py
205+
# Tests will catch any breaking changes:
206+
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets -v
207+
```
208+
209+
## What These Tests Don't Cover
210+
211+
These are **unit tests** for transforms, not end-to-end benchmark tests:
212+
213+
- ❌ Network latency or throughput metrics
214+
- ❌ Model inference accuracy
215+
- ❌ Full dataset loading (only sample rows)
216+
- ❌ API endpoint responses
217+
- ❌ External service dependencies
218+
219+
These require separate integration tests or actual benchmark runs.
220+
221+
## Integration with CI/CD
222+
223+
Add to your CI pipeline:
224+
225+
```yaml
226+
# Example GitHub Actions or similar
227+
- name: Test Dataset Presets
228+
run: |
229+
pytest tests/unit/dataset_manager/test_dataset_presets.py \
230+
-v \
231+
--cov=src/inference_endpoint/dataset_manager \
232+
--cov-report=json
233+
```
234+
235+
## Key Benefits
236+
237+
✅ **Fast** - Tests run in <5 seconds with no external dependencies
238+
✅ **Reliable** - No flakiness from network calls or dataset availability
239+
✅ **Maintainable** - Clear test structure, easy to extend
240+
✅ **Coverage** - Catches transform regressions automatically
241+
✅ **No resources** - Works with no GPU/compute, only CPU
242+
✅ **Development friendly** - Run locally before committing
243+
244+
## Example Usage Scenarios
245+
246+
### Scenario 1: Verify transform changes don't break presets
247+
```bash
248+
# After modifying transforms.py:
249+
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
250+
```
251+
252+
### Scenario 2: Test new preset implementation
253+
```python
254+
# In your preset function:
255+
def new_preset() -> list[Transform]:
256+
return [Transform1(), Transform2()]
257+
258+
# Add unit test:
259+
def test_new_preset_transforms_apply(self, sample_data):
260+
transforms = DatasetClass.PRESETS.new_preset()
261+
result = apply_transforms(sample_data, transforms)
262+
assert "expected_column" in result.columns
263+
```
264+
265+
### Scenario 3: Validate dataset before full benchmark run
266+
```python
267+
from inference_endpoint.dataset_manager.examples import example_test_preset_transforms
268+
269+
# Quick validation without running full benchmark
270+
example_test_preset_transforms()
271+
```
272+
273+
## Troubleshooting
274+
275+
### Test import errors
276+
```bash
277+
# Ensure endpoints-repo is in PYTHONPATH
278+
export PYTHONPATH=/home/sdp/tattafos/endpoints-repo/src:$PYTHONPATH
279+
pytest tests/unit/dataset_manager/test_dataset_presets.py
280+
```
281+
282+
### Missing dataset dependencies
283+
Some presets may require optional tokenizers (e.g., Harmonize transform requires transformers).
284+
Run with:
285+
```bash
286+
pytest tests/unit/dataset_manager/test_dataset_presets.py -m "not slow" -v
287+
```
288+
289+
### Debugging a specific test
290+
```bash
291+
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestClass::test_method -vvs
292+
```
293+
294+
## Next Steps
295+
296+
1. **Run the tests** to verify your current setup:
297+
```bash
298+
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
299+
```
300+
301+
2. **Review the examples** to understand programmatic dataset usage:
302+
```bash
303+
python -m inference_endpoint.dataset_manager.examples
304+
```
305+
306+
3. **Add to pre-commit** to catch regressions automatically:
307+
```bash
308+
pre-commit run pytest
309+
```
310+
311+
4. **Extend tests** when adding new dataset presets or transforms

0 commit comments

Comments
 (0)