| 1 | +--- |
| 2 | +title: "Test Set Versioning and New Test Set UI" |
| 3 | +slug: testset-versioning |
| 4 | +date: 2026-01-20 |
| 5 | +tags: [v0.74.0] |
| 6 | +description: "Track test set changes with versioning and link evaluations to specific versions. Plus a completely rebuilt test set UI that scales to hundreds of thousands of rows." |
| 7 | +--- |
| 8 | + |
| 9 | +# Test Set Versioning and New Test Set UI |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +When you compare evaluation results from last week to today, how do you know the test data didn't change? You don't. Until now. |
| 14 | + |
| 15 | +Test set versioning tracks every change to your test sets. Each edit, upload, or programmatic update creates a new version. Evaluations link to specific versions, so you can trust your comparisons. |
| 16 | + |
| 17 | +We also rebuilt the test set UI from scratch. It handles hundreds of thousands of rows without slowing down. Editing is faster, especially for chat messages and complex JSON data. |
| 18 | + |
| 19 | +<div style={{display: 'flex', justifyContent: 'center', marginTop: "20px", marginBottom: "20px", flexDirection: 'column', alignItems: 'center'}}> |
| 20 | + <iframe |
| 21 | + width="100%" |
| 22 | + height="500" |
| 23 | + src="https://www.youtube.com/embed/hh1OHhzak6Q" |
| 24 | + title="Test Set Versioning Demo" |
| 25 | + frameBorder="0" |
| 26 | + allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" |
| 27 | + allowFullScreen |
| 28 | + ></iframe> |
| 29 | +</div> |
| 30 | + |
| 31 | +## Test Set Versioning |
| 32 | + |
| 33 | +Every change to a test set creates a new version. You can see the version history, compare versions, and revert to previous versions. |
| 34 | + |
| 35 | +**What gets versioned:** |
| 36 | +- Adding, editing, or deleting test cases |
| 37 | +- Uploading new data (CSV, JSON) |
| 38 | +- Programmatic updates via SDK or API |
| 39 | +- Column changes |
| 40 | + |
| 41 | +**Evaluation linking:** |
| 42 | +When you run an evaluation, it links to the specific test set version used. This means: |
| 43 | +- You can compare evaluations knowing they used the same test data |
| 44 | +- If someone updates the test set, your historical evaluations still reference the original version |
| 45 | +- You can filter evaluations by test set version |
| 46 | + |
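One way to picture evaluation linking is as a small data model in which each evaluation stores a reference to the test set version it ran against. The sketch below is illustrative only: `VersionedTestSet`, `Evaluation`, `comparable`, and `filter_by_version` are hypothetical names and are not part of the Agenta SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedTestSet:
    # Hypothetical record: one immutable snapshot of a test set.
    name: str
    version: int


@dataclass(frozen=True)
class Evaluation:
    # Hypothetical record: an evaluation pinned to the version it ran on.
    name: str
    testset: VersionedTestSet
    score: float


def comparable(a: Evaluation, b: Evaluation) -> bool:
    """Two evaluations are directly comparable only if they used the same version."""
    return a.testset == b.testset


def filter_by_version(evals: list[Evaluation], testset: VersionedTestSet) -> list[Evaluation]:
    """Keep only the evaluations that ran against the given version."""
    return [e for e in evals if e.testset == testset]


v1 = VersionedTestSet("my-test-set", 1)
v2 = VersionedTestSet("my-test-set", 2)
evals = [
    Evaluation("baseline", v1, 0.81),
    Evaluation("new-prompt", v1, 0.86),
    Evaluation("after-edit", v2, 0.90),
]
```

Because the evaluation keeps a reference to an immutable snapshot, later edits to the test set produce `v2` without altering what `baseline` and `new-prompt` point to.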
| 47 | +**Programmatic versioning:** |
| 48 | +Upload test sets via the SDK or API. The system detects changes and creates new versions automatically. |
| 49 | + |
| 50 | +```python |
| 51 | +import agenta as ag |
| 52 | + |
| 53 | +# Upload a test set - creates a new version if content changed |
| 54 | +testset = ag.testsets.upload( |
| 55 | +    name="my-test-set", |
| 56 | +    data=test_cases,  # Your test case data, e.g. a list of dicts with one dict per test case |
| 57 | +) |
| 58 | + |
| 59 | +# The testset object includes version information |
| 60 | +print(f"Version: {testset.version}") |
| 61 | +``` |
| 62 | + |
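To make "creates a new version if content changed" concrete, here is a minimal sketch of content-based change detection. It is an assumption about how such detection could work, not a description of Agenta's implementation: `content_hash` and `VersionedStore` are hypothetical names, and the store lives only in memory.

```python
import hashlib
import json


def content_hash(rows: list[dict]) -> str:
    """Hash a canonical JSON encoding so key order does not affect the result."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionedStore:
    """Hypothetical in-memory store: appends a version only when content changes."""

    def __init__(self) -> None:
        self._hashes: dict[str, list[str]] = {}

    def upload(self, name: str, rows: list[dict]) -> int:
        """Return the 1-based version number that now holds `rows`."""
        history = self._hashes.setdefault(name, [])
        digest = content_hash(rows)
        if not history or history[-1] != digest:
            history.append(digest)  # content differs from the latest version
        return len(history)


store = VersionedStore()
first = store.upload("my-test-set", [{"input": "hi", "expected": "hello"}])
same = store.upload("my-test-set", [{"expected": "hello", "input": "hi"}])
changed = store.upload("my-test-set", [{"input": "hi", "expected": "hey"}])
```

In this sketch, re-uploading identical rows (even with keys in a different order) keeps the current version, while any change to a value creates the next one. Row order is treated as significant.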
| 63 | +## New Test Set UI |
| 64 | + |
| 65 | +The test set view has been completely rebuilt. It uses virtualized rendering, which draws only the rows currently in view, so it stays fast with large datasets. |
| 66 | + |
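The idea behind virtualized rendering can be sketched as a window calculation: given the scroll position, work out which rows intersect the viewport and render only those. The actual UI is not written in Python, and the sketch assumes a fixed row height; `visible_range` and its parameters are hypothetical names used for illustration.

```python
def visible_range(scroll_top: int, viewport_height: int, row_height: int,
                  total_rows: int, overscan: int = 5) -> tuple[int, int]:
    """Return the half-open [start, end) range of row indices to render."""
    first = scroll_top // row_height
    # Ceiling division: a partially visible row at the bottom still counts.
    last = -(-(scroll_top + viewport_height) // row_height)
    # Render a few extra rows on each side to avoid flicker while scrolling.
    start = max(0, first - overscan)
    end = min(total_rows, last + overscan)
    return start, end


# With 100,000 rows of 30 px each in a 600 px viewport:
top = visible_range(0, 600, 30, 100_000)
middle = visible_range(1_500_000, 600, 30, 100_000)
```

The number of rendered rows depends on the viewport size and the overscan, not on the size of the test set, which is why scrolling cost stays roughly constant as the data grows.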
| 67 | +**What's new:** |
| 68 | +- **Scale**: Handle 100,000+ rows without performance issues |
| 69 | +- **JSON support**: View and edit complex JSON directly. Toggle between raw JSON and formatted views |
| 70 | +- **String or JSON columns**: Choose how each column stores data. Use JSON for structured data like chat messages |
| 71 | + |
| 72 | +**Chat message editing:** |
| 73 | +Test cases with chat messages (like `[{"role": "user", "content": "..."}]`) now have a dedicated editor. Add, remove, or reorder messages. Edit content with proper formatting. |
| 74 | + |
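As an illustration of the chat message shape mentioned above, the following sketch checks whether a JSON cell holds a list of `{"role": ..., "content": ...}` objects. How the UI actually decides when to show the chat editor is not stated here; `is_chat_messages` is a hypothetical helper.

```python
import json


def is_chat_messages(cell: str) -> bool:
    """Return True if `cell` is JSON for a non-empty list of role/content objects."""
    try:
        value = json.loads(cell)
    except (TypeError, ValueError):
        return False  # not valid JSON (or not a string at all)
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(m, dict)
            and isinstance(m.get("role"), str)
            and "content" in m
            for m in value
        )
    )
```

A check like this could be run over the rows you plan to upload to confirm that a column meant to hold chat messages is consistently shaped.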
| 75 | +**Upload options:** |
| 76 | +- Upload CSV or JSON files |
| 77 | +- Create test sets in the UI |
| 78 | +- Create programmatically via SDK |
| 79 | +- Add spans from observability to test sets |
| 80 | + |
| 81 | +## Traceability |
| 82 | + |
| 83 | +Traces, test cases, test set versions, and evaluations are linked. When you view a trace in observability: |
| 84 | +- See which test case it came from |
| 85 | +- See which test set version |
| 86 | +- Filter traces by test case or test set |
| 87 | + |
| 88 | +When you view an evaluation: |
| 89 | +- See the exact test set version used |
| 90 | +- Compare only evaluations that used the same version |
| 91 | +- Navigate to the test set to see the data |
| 92 | + |
| 93 | +## Getting Started |
| 94 | + |
| 95 | +Test set versioning is automatic. Any change creates a new version. |
| 96 | + |
| 97 | +To use versioned test sets in evaluations: |
| 98 | +1. Create or upload a test set |
| 99 | +2. Make your edits (each save creates a version) |
| 100 | +3. Run an evaluation (it links to the current version) |
| 101 | +4. Later, compare evaluations knowing they used the same test data |
| 102 | + |
| 103 | +For programmatic access, check the [test sets documentation](/evaluation/evaluation-from-sdk/managing-testsets). |