# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.2] - 2025-10-27

### Added
- LLM-as-a-judge evaluations
- Heuristic metrics in the JSON, Similarity, String, and Aggregation categories

### Fixed
- Minor bug fixes

## [0.2.1] - 2025-10-09

### Python
#### Added
- Support for batch evaluation
- New evaluation templates for bias detection
- Improved error handling and logging

#### Fixed
- Issue in the context adherence evaluation
- Memory leak in long-running evaluations

## [0.1.5] - 2025-10-01

### TypeScript
#### Added
- Initial TypeScript SDK release
- Core evaluation functionality
- Support for all evaluation templates
- ESM and CommonJS module support

### Python
#### Added
- Initial Python SDK release
- 50+ evaluation templates across multiple categories
- Support for RAG, Safety, Function Calling, and Summarization evaluations
- Integration with the Future AGI platform
- Batch evaluation support

#### Features
- **RAG Evaluations**: `groundedness`, `context_adherence`, `answer_relevance`
- **Safety**: `content_moderation`, `prompt_injection`, `harmful_advice` detection
- **Function Calling**: JSON validation, schema validation
- **Summarization**: quality assessment, factual consistency
- **Behavioral**: tone analysis, helpfulness, politeness
- **Metrics**: ROUGE, embedding similarity, fuzzy matching

---

## Release Notes Format

### Types of Changes
- `Added` for new features
- `Changed` for changes in existing functionality
- `Deprecated` for soon-to-be removed features
- `Removed` for now removed features
- `Fixed` for any bug fixes
- `Security` in case of vulnerabilities

### Versioning
- **Major version (X.0.0)**: Breaking changes
- **Minor version (0.X.0)**: New features, backward compatible
- **Patch version (0.0.X)**: Bug fixes, backward compatible
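
The versioning rules above reduce to a small bump function. This is a minimal sketch, assuming versions are plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffix; `bump_version` and its `"breaking"` / `"feature"` / `"fix"` labels are hypothetical names for illustration, not part of this project's tooling.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a change type (hypothetical helper).

    Assumes a plain MAJOR.MINOR.PATCH string with no pre-release/build suffix.
    change: "breaking" -> major bump, "feature" -> minor bump, "fix" -> patch bump.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Breaking change: bump major, reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Backward-compatible feature: bump minor, reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Backward-compatible bug fix: bump patch only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("0.2.1", "fix"))       # 0.2.2
print(bump_version("0.2.2", "feature"))   # 0.3.0
print(bump_version("0.2.2", "breaking"))  # 1.0.0
```

Each bump resets the lower-order components to zero, which is what keeps a minor release like `0.3.0` distinguishable from a patch on top of `0.2.x`.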

---

[Unreleased]: https://github.com/future-agi/ai-evaluation/compare/v0.2.2...HEAD
[0.2.2]: https://github.com/future-agi/ai-evaluation/compare/v0.2.1...v0.2.2
[0.2.1]: https://github.com/future-agi/ai-evaluation/compare/v0.1.0...v0.2.1
[0.1.0]: https://github.com/future-agi/ai-evaluation/releases/tag/v0.1.0
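
The link definitions above follow the Keep a Changelog pattern. `[Unreleased]` compares the latest tag with `HEAD`, each version compares its tag with the previous release, and the oldest version links to its release tag. This is a minimal sketch of generating that footer, assuming every release is tagged `v<version>` and versions are listed newest first. `compare_links` is a hypothetical helper, not part of this repository.

```python
REPO = "https://github.com/future-agi/ai-evaluation"


def compare_links(versions: list[str], repo: str = REPO) -> list[str]:
    """Build Keep a Changelog link definitions (hypothetical helper).

    versions: released versions ordered newest first; must be non-empty.
    Assumes every release is tagged as "v<version>".
    """
    # The newest tag is compared against HEAD for unreleased work.
    links = [f"[Unreleased]: {repo}/compare/v{versions[0]}...HEAD"]
    for newer, older in zip(versions, versions[1:]):
        # Each version compares its tag against the previous release's tag.
        links.append(f"[{newer}]: {repo}/compare/v{older}...v{newer}")
    # The oldest release has nothing to compare against, so link its tag.
    oldest = versions[-1]
    links.append(f"[{oldest}]: {repo}/releases/tag/v{oldest}")
    return links


for line in compare_links(["0.2.2", "0.2.1", "0.1.0"]):
    print(line)
```

With the versions used in the footer above, this reproduces the four existing definitions. Generating the footer from the version headings keeps every heading paired with a definition. A bracketed heading without a matching definition renders as literal text.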