Commit a33d778

Cleanup
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
1 parent 0c4f2a3 commit a33d778

1 file changed

+57
-249
lines changed

DATASET_PRESET_TESTING.md

Lines changed: 57 additions & 249 deletions
@@ -1,287 +1,95 @@
1-
# Dataset Preset Testing Documentation
1+
# Dataset Preset Testing
22

3-
## Overview
3+
Unit tests for dataset preset transforms. These tests verify that presets correctly transform dataset columns without requiring end-to-end benchmark runs.
44

5-
This guide explains the unit testing solution for preset datasets in the MLPerf Inference Endpoint system. The tests verify that dataset transforms work correctly without requiring end-to-end benchmark runs or external compute resources.
5+
## Quick Start
66

7-
## What Was Added
8-
9-
### 1. **Test File: `tests/unit/dataset_manager/test_dataset_presets.py`**
10-
11-
Comprehensive unit tests covering all dataset presets:
12-
13-
- **CNNDailyMail**: Tests for `llama3_8b` and `llama3_8b_sglang` presets
14-
- **AIME25**: Tests for `gptoss` preset
15-
- **GPQA**: Tests for `gptoss` preset
16-
- **LiveCodeBench**: Tests for `gptoss` preset
17-
- **OpenOrca**: Tests for `llama2_70b` preset
18-
19-
Each preset gets three types of tests:
20-
1. **Instantiation test** - Verifies the preset can be created
21-
2. **Transform application test** - Verifies transforms apply without errors
22-
3. **Output validation test** - Verifies transforms produce expected output format
23-
24-
## Running the Tests
25-
26-
### Run all preset tests:
277
```bash
8+
# Run all preset tests
289
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
29-
```
3010

31-
### Run tests for a specific dataset:
32-
```bash
11+
# Run tests for a specific dataset
3312
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets -v
34-
```
3513

36-
### Run a specific test:
37-
```bash
38-
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets::test_llama3_8b_transforms_apply -v
39-
```
40-
41-
### Run with coverage:
42-
```bash
43-
pytest tests/unit/dataset_manager/test_dataset_presets.py --cov=src/inference_endpoint/dataset_manager --cov-report=html
44-
```
45-
46-
## Test Structure
47-
48-
Each test class uses pytest fixtures to provide minimal sample data:
49-
50-
```python
51-
@pytest.fixture
52-
def sample_cnn_data(self):
53-
"""Create minimal sample data matching CNN/DailyMail schema."""
54-
return pd.DataFrame({
55-
"article": ["..."],
56-
"highlights": ["..."],
57-
})
14+
# Exclude slow tests (Harmonize transform requires transformers)
15+
pytest tests/unit/dataset_manager/test_dataset_presets.py -m "not slow" -v
5816
```
5917

60-
This approach:
61-
- ✅ No external API calls or dataset downloads
62-
- ✅ Tests run in <1 second (no network I/O)
63-
- ✅ Minimal memory footprint
64-
- ✅ Tests can run in CI/CD pipelines
65-
- ✅ Simple to extend with new datasets
66-
67-
## Programmatic Dataset Usage (No YAML)
68-
69-
The schema reference documents how to use datasets without YAML configuration. See the `DATASET_SCHEMA_REFERENCE.md` for input/output column specifications.
18+
## Preset Coverage
7019

71-
### Load a dataset with preset programmatically:
72-
```python
73-
from inference_endpoint.dataset_manager.predefined.cnndailymail import CNNDailyMail
74-
75-
# Get transforms
76-
transforms = CNNDailyMail.PRESETS.llama3_8b_sglang()
20+
| Dataset | Presets | Tests |
21+
|---------|---------|-------|
22+
| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 6 |
23+
| AIME25 | `gptoss` | 3 |
24+
| GPQA | `gptoss` | 3 |
25+
| LiveCodeBench | `gptoss` | 3 |
26+
| OpenOrca | `llama2_70b` | 3 |
7727

78-
# Load dataset
79-
dataset = CNNDailyMail.get_dataloader(transforms=transforms)
28+
## Adding Tests for New Presets
8029

81-
# Use in benchmark
82-
sample = dataset.load_sample(0)
83-
```
30+
When adding a new dataset preset, add a test class to `tests/unit/dataset_manager/test_dataset_presets.py`:
8431

85-
### Create and test custom dataset:
8632
```python
87-
from inference_endpoint.dataset_manager.dataset import Dataset
88-
from inference_endpoint.dataset_manager.transforms import apply_transforms
8933
import pandas as pd
34+
import pytest
35+
from inference_endpoint.dataset_manager.transforms import apply_transforms
36+
from inference_endpoint.dataset_manager.predefined.my_dataset import MyDataset
9037

91-
# Create sample data
92-
data = pd.DataFrame({
93-
"question": ["What is AI?"],
94-
"answer": ["Artificial Intelligence"]
95-
})
96-
97-
# Get preset transforms
98-
from inference_endpoint.dataset_manager.predefined.aime25 import AIME25
99-
transforms = AIME25.PRESETS.gptoss()
100-
101-
# Apply transforms
102-
result = apply_transforms(data, transforms)
103-
104-
# Verify
105-
assert "prompt" in result.columns
106-
assert len(result) == 1
107-
```
108-
109-
## How Transform Tests Work
110-
111-
### Test Categories
112-
113-
1. **Instantiation Tests**
114-
- Verify preset functions can be called without errors
115-
- Ensure transforms are returned as a list
116-
- Quick smoke tests
117-
118-
2. **Application Tests**
119-
- Apply transforms to sample data
120-
- Verify output DataFrame has correct shape
121-
- Check that required output columns are created
122-
123-
3. **Validation Tests**
124-
- Verify transform output meets expected format
125-
- Check that data from source columns is properly embedded
126-
- Validate format-specific requirements (e.g., code delimiters, multiple choice format)
127-
128-
### Example Test Pattern
129-
130-
```python
131-
def test_preset_name_transforms_apply(self, sample_data):
132-
"""Test that transforms apply without errors."""
133-
# 1. Get the preset
134-
transforms = DatasetClass.PRESETS.preset_name()
135-
136-
# 2. Apply to sample data
137-
result = apply_transforms(sample_data, transforms)
138-
139-
# 3. Verify output
140-
assert result is not None
141-
assert len(result) == len(sample_data)
142-
assert "prompt" in result.columns # or other expected column
143-
```
144-
145-
## Extending the Tests
146-
147-
### Add a new dataset preset test:
148-
149-
1. **Create the test class** in `test_dataset_presets.py`:
150-
```python
151-
class TestNewDatasetPresets:
152-
"""Test NewDataset presets."""
15338

39+
class TestMyDatasetPresets:
15440
@pytest.fixture
15541
def sample_data(self):
156-
"""Create sample data matching schema."""
42+
"""Minimal sample data matching dataset schema."""
15743
return pd.DataFrame({
158-
"column1": [...],
159-
"column2": [...],
44+
"input_col1": ["value1"],
45+
"input_col2": ["value2"],
16046
})
16147

162-
def test_preset_name_instantiation(self):
163-
"""Test preset can be instantiated."""
164-
transforms = NewDataset.PRESETS.preset_name()
48+
def test_my_preset_instantiation(self):
49+
"""Verify preset can be created."""
50+
transforms = MyDataset.PRESETS.my_preset()
16551
assert transforms is not None
52+
assert len(transforms) > 0
16653

167-
def test_preset_name_transforms_apply(self, sample_data):
168-
"""Test transforms apply without errors."""
169-
transforms = NewDataset.PRESETS.preset_name()
54+
def test_my_preset_transforms_apply(self, sample_data):
55+
"""Verify transforms apply without errors."""
56+
transforms = MyDataset.PRESETS.my_preset()
17057
result = apply_transforms(sample_data, transforms)
171-
assert "prompt" in result.columns
172-
```
17358

174-
2. **Import the dataset class** at the top:
175-
```python
176-
from inference_endpoint.dataset_manager.predefined.new_dataset import NewDataset
177-
```
178-
179-
### Test when transforms change:
180-
181-
Since tests apply actual transforms to sample data, any change to a preset's transforms will automatically be caught:
182-
183-
```bash
184-
# Run tests before making changes to preset
185-
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
59+
assert result is not None
60+
assert len(result) == len(sample_data)
61+
assert "prompt" in result.columns # Expected output column
18662

187-
# Modify src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py
188-
# Tests will catch any breaking changes:
189-
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets -v
190-
```
191-
192-
## What These Tests Don't Cover
193-
194-
These are **unit tests** for transforms, not end-to-end benchmark tests:
195-
196-
- ❌ Network latency or throughput metrics
197-
- ❌ Model inference accuracy
198-
- ❌ Full dataset loading (only sample rows)
199-
- ❌ API endpoint responses
200-
- ❌ External service dependencies
201-
202-
These require separate integration tests or actual benchmark runs.
203-
204-
## Integration with CI/CD
205-
206-
Add to your CI pipeline:
63+
def test_my_preset_output_format(self, sample_data):
64+
"""Verify output has expected format."""
65+
transforms = MyDataset.PRESETS.my_preset()
66+
result = apply_transforms(sample_data, transforms)
20767

208-
```yaml
209-
# Example GitHub Actions or similar
210-
- name: Test Dataset Presets
211-
run: |
212-
pytest tests/unit/dataset_manager/test_dataset_presets.py \
213-
-v \
214-
--cov=src/inference_endpoint/dataset_manager \
215-
--cov-report=json
68+
# Validate format-specific expectations
69+
assert len(result["prompt"][0]) > 0
21670
```
21771

218-
## Key Benefits
219-
220-
✅ **Fast** - Tests run in <5 seconds with no external dependencies
221-
✅ **Reliable** - No flakiness from network calls or dataset availability
222-
✅ **Maintainable** - Clear test structure, easy to extend
223-
✅ **Coverage** - Catches transform regressions automatically
224-
✅ **No resources** - Works with no GPU/compute, only CPU
225-
✅ **Development friendly** - Run locally before committing
72+
If the preset uses the `Harmonize` transform (which requires the `transformers` library), mark the affected tests as slow:
22673

227-
## Example Usage Scenarios
228-
229-
### Scenario 1: Verify transform changes don't break presets
230-
```bash
231-
# After modifying transforms.py:
232-
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
233-
```
234-
235-
### Scenario 2: Test new preset implementation
23674
```python
237-
# In your preset function:
238-
def new_preset() -> list[Transform]:
239-
return [Transform1(), Transform2()]
240-
241-
# Add unit test:
242-
def test_new_preset_transforms_apply(self, sample_data):
243-
transforms = DatasetClass.PRESETS.new_preset()
244-
result = apply_transforms(sample_data, transforms)
245-
assert "expected_column" in result.columns
246-
```
247-
248-
### Scenario 3: Validate dataset before full benchmark run
249-
```bash
250-
# Quick validation using pytest
251-
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
252-
```
253-
254-
## Troubleshooting
255-
256-
### Test import errors
257-
```bash
258-
# Ensure src directory is in PYTHONPATH (from repo root)
259-
export PYTHONPATH=./src:$PYTHONPATH
260-
pytest tests/unit/dataset_manager/test_dataset_presets.py
261-
```
262-
263-
### Missing dataset dependencies
264-
Some presets may require optional tokenizers (e.g., Harmonize transform requires transformers).
265-
Run with:
266-
```bash
267-
pytest tests/unit/dataset_manager/test_dataset_presets.py -m "not slow" -v
268-
```
269-
270-
### Debugging a specific test
271-
```bash
272-
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestClass::test_method -vvs
75+
@pytest.mark.slow
76+
def test_my_preset_transforms_apply(self, sample_data):
77+
# Test that requires transformers library
78+
pass
27379
```
27480

275-
## Next Steps
81+
## Test Scope
27682

277-
1. **Run the tests** to verify your current setup:
278-
```bash
279-
pytest tests/unit/dataset_manager/test_dataset_presets.py -v
280-
```
83+
**Tests verify:**
84+
- Preset instantiation
85+
- Transform application without errors
86+
- Required output columns exist
87+
- Data is properly transformed
28188

282-
2. **Add to pre-commit** to catch regressions automatically:
283-
```bash
284-
pre-commit run pytest
285-
```
89+
**Tests do NOT verify:**
90+
- Model inference accuracy
91+
- API endpoint compatibility
92+
- Throughput/latency metrics
93+
- Full benchmark runs
28694

287-
3. **Extend tests** when adding new dataset presets or transforms
95+
See `src/inference_endpoint/dataset_manager/README.md` for dataset schema and preset creation details.
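
The three-test pattern the new README describes (instantiation, transform application, output format) can be sketched without the project installed. This is a minimal illustration, not the repository's implementation: `make_prompt`, `my_preset`, and the `apply_transforms` below are hypothetical stand-ins, and rows are plain dictionaries rather than a pandas `DataFrame`. The real `apply_transforms` and preset classes in `inference_endpoint.dataset_manager` may behave differently.

```python
# Hypothetical stand-ins for illustration only; the real apply_transforms and
# preset classes live in inference_endpoint.dataset_manager and may differ.


def make_prompt(row):
    """Hypothetical transform: derive a 'prompt' field from the input columns."""
    out = dict(row)  # copy so the sample data is not mutated
    out["prompt"] = f"{row['input_col1']} {row['input_col2']}"
    return out


def my_preset():
    """Hypothetical preset: returns an ordered list of transforms."""
    return [make_prompt]


def apply_transforms(rows, transforms):
    """Stand-in: apply each transform to every row, in order."""
    for transform in transforms:
        rows = [transform(row) for row in rows]
    return rows


# Minimal sample data matching the (assumed) dataset schema.
SAMPLE_ROWS = [{"input_col1": "value1", "input_col2": "value2"}]


def test_my_preset_instantiation():
    """The preset can be created and is non-empty."""
    transforms = my_preset()
    assert transforms is not None
    assert len(transforms) > 0


def test_my_preset_transforms_apply():
    """Transforms apply without errors and preserve the row count."""
    result = apply_transforms(SAMPLE_ROWS, my_preset())
    assert len(result) == len(SAMPLE_ROWS)
    assert "prompt" in result[0]


def test_my_preset_output_format():
    """The output embeds data from the source columns."""
    result = apply_transforms(SAMPLE_ROWS, my_preset())
    assert result[0]["prompt"] == "value1 value2"
```

Saved under a name such as `test_sketch.py` (hypothetical), the file would be collected by `pytest` like any other test module. Copying each row inside the transform keeps the sample data unchanged between tests, which is the same isolation that a per-test fixture provides in the README's class-based example.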
