@@ -6,6 +6,33 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
66and this project adheres to [ Semantic Versioning] ( https://semver.org/spec/v2.0.0.html ) .
77
88
9+ ## [ 1.0.0] - 2026-02-27
10+
11+ ### Python
12+
13+ #### Added
14+ - ** ` evaluate() ` unified API** — single entrypoint for local, cloud, and LLM-as-Judge evaluations with automatic engine routing
15+ - ** Multimodal LLM Judge** — pass ` image_url ` , ` audio_url ` , ` input_image_url ` , ` output_image_url ` for vision/audio evaluation with Gemini, GPT-4o, etc.
16+ - ** Auto-generate grading criteria** — ` generate_prompt=True ` converts a short description into a detailed rubric via LLM
17+ - ** LLM augmentation** — ` augment=True ` runs local heuristic first, then LLM refines the score (faithfulness, hallucination_score, task_completion, etc.)
18+ - ** Feedback loop system** — submit corrections, retrieve as few-shot examples via ChromaDB, calibrate pass/fail thresholds
19+ - ** 72+ local metrics** — string checks, JSON validation, similarity, NLI-based hallucination detection, RAG evaluation, function calling, agent trajectory, structured output, security guardrails
20+ - ** OpenTelemetry integration** — ` enable_auto_enrichment() ` emits ` gen_ai.evaluation.* ` spans for Jaeger/Datadog/Grafana
21+ - ** Streaming evaluation** — token-by-token monitoring with configurable early stopping
22+ - ** 9 cookbooks** — local metrics, LLM judge, RAG evaluation, guardrails, streaming, autoeval, OTEL tracing, feedback loop, multimodal judge
23+
24+ #### Changed
25+ - ** Poetry to uv** — migrated build system for 10x faster dependency resolution
26+ - Widened LLM provider type signatures from ` Dict[str, str] ` to ` Dict[str, Any] ` for multimodal content parts
27+
28+ #### Fixed
29+ - 6 code security scanner bugs (Phase 2)
30+ - Guardrails ensemble scoring and scanner edge cases
31+ - NLI consolidation and empty-input handling in RAG metrics
32+ - K8s backend JSON log parsing
33+ - Temporal Docker healthcheck and DB config
34+ - Celery serialization for closures
35+
936## [ 0.2.2] - 2025-10-27
1037
1138- Introducing LLM As A Judge
@@ -68,7 +95,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
6895
6996---
7097
71- [ Unreleased ] : https://github.com/future-agi/ai-evaluation/compare/v0.2.2...HEAD
98+ [ Unreleased ] : https://github.com/future-agi/ai-evaluation/compare/v1.0.0...HEAD
99+ [ 1.0.0 ] : https://github.com/future-agi/ai-evaluation/compare/v0.2.2...v1.0.0
72100[ 0.2.2 ] : https://github.com/future-agi/ai-evaluation/compare/v0.2.1...v0.2.2
73101[ 0.2.1 ] : https://github.com/future-agi/ai-evaluation/compare/v0.1.0...v0.2.1
74102[ 0.1.0 ] : https://github.com/future-agi/ai-evaluation/releases/tag/v0.1.0
0 commit comments