Releases: etalab-ia/evalap
v0.4
[0.4] - 2025-05-21
🚀 Features
- Support thinking models in the judge
- Add nb_tool_call as an ops metric, add MCP_BRIDGE_URL, and formatting fixes
- Parquet dataset support, OCR metrics, and a notebook demo
- Add and handle the new with_vision and prelude_prompt attributes
- Compute the environmental impact of models for the response-generation part
- Add two new environmental metrics: energy_consumption and gwp_consumption
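In spirit, the two new metrics reduce to multiplying generation volume by per-model impact factors. A minimal sketch, assuming made-up factor values and helper names (this is illustrative only, not EvalAP's actual implementation):

```python
# Hypothetical per-token impact factors; real values would come from an
# environmental-impact model, not from these illustrative constants.
IMPACT_FACTORS = {
    "small-model": {"energy_kwh_per_token": 2e-7, "gwp_kgco2eq_per_token": 1e-7},
    "large-model": {"energy_kwh_per_token": 2e-6, "gwp_kgco2eq_per_token": 1e-6},
}

def environmental_metrics(model: str, output_tokens: int) -> dict:
    """Estimate energy_consumption (kWh) and gwp_consumption (kgCO2eq)
    for the response-generation part of a run."""
    factors = IMPACT_FACTORS[model]
    return {
        "energy_consumption": output_tokens * factors["energy_kwh_per_token"],
        "gwp_consumption": output_tokens * factors["gwp_kgco2eq_per_token"],
    }

print(environmental_metrics("small-model", 1000))
```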
🔧 Improvements
- [UI] Display the environmental-impact section in the OPS pane and in the experiment-set metric results.
v0.3.1
🚀 Features
- [API] Support Anthropic, OpenAI, Mistral, and Albert providers for judge models via the judge_model parameter in experiments (models are fetched from each provider's OpenAI-compatible v1/models endpoint)
- [SCRIPTS] Add convenient scripts to run experiments from an isolated environment (e.g. Cortex; see the tutorial)
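Since judge-model candidates are discovered from each provider's OpenAI-compatible v1/models endpoint, parsing the listing is straightforward. A minimal sketch (the response shape follows the OpenAI models-list format; the helper name is hypothetical):

```python
def list_judge_models(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-compatible GET /v1/models response."""
    return sorted(item["id"] for item in models_response.get("data", []))

# Example response shape, as returned by OpenAI-compatible servers:
response = {
    "object": "list",
    "data": [
        {"id": "gpt-4o", "object": "model"},
        {"id": "mistral-large-latest", "object": "model"},
    ],
}
print(list_judge_models(response))  # ['gpt-4o', 'mistral-large-latest']
```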
🔧 Improvements
- [UI] Add a special card for orphan experiments at the bottom of the experiments list.
- [UI] Order experiment sets newest first
- [UI] Remove the old, confusing Experiments menu in favor of the Experiment Sets menu alone (renamed simply Experiments)
v0.3
🚀 Features
- Integrated MCP support and multi-step LLM generations with MCP client bridge.
- Added experiment set with cross-validation parameters and demo notebooks.
- Integrated multiple RAG metrics for deep evaluation.
- Added a delete-experiment route for admin users.
- Introduced new retry and post routes with UI improvements.
- Added 'finished' and 'failure' experiment ratios to the overview.
- Added new tests to increase code coverage and addressed Pydantic warnings.
- Implemented loop limit and tool call step saving.
- Improved sorting and metrics highlighting in the experiment set score table.
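The multi-step generation with a loop limit and tool-call step saving can be sketched as below; the model client is a stub, and every name here is illustrative rather than EvalAP's actual code:

```python
def run_multi_step(model_step, execute_tool, max_steps: int = 5):
    """Run a tool-calling loop: ask the model, execute any requested tool,
    feed the result back, and stop at a final answer or the loop limit.
    Returns (final_answer_or_None, saved_tool_call_steps)."""
    messages: list[dict] = [{"role": "user", "content": "start"}]
    tool_steps = []  # every tool call is saved, mirroring the step-saving feature
    for _ in range(max_steps):  # loop limit guards against infinite tool loops
        reply = model_step(messages)
        if "tool_call" not in reply:
            return reply["content"], tool_steps  # model produced a final answer
        result = execute_tool(reply["tool_call"])
        tool_steps.append({"call": reply["tool_call"], "result": result})
        messages.append({"role": "tool", "content": result})
    return None, tool_steps  # loop limit hit without a final answer

# Stub model: requests one tool call, then answers once it sees a tool result.
def stub_model(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"content": "done"}
    return {"tool_call": {"name": "echo", "args": {"x": 1}}}

answer, steps = run_multi_step(stub_model, lambda call: f"ran {call['name']}")
print(answer, len(steps))  # done 1
```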
🐛 Bug Fixes
- Enhanced error handling for missing metric input and baseline demo notebook.
- Removed unnecessary attributes and improved schema validation.
- Fixed various UI bugs and improved experiment view.
- Improved notebook variable names and used public endpoints.
- Enhanced GitHub Actions CI and addressed Alembic issues.
- Corrected schema serialization and computation needs.
- Improved experiment status updates and endpoint terminology.
- Handled unknown model cases and improved dataset visibility.
- Fixed various typos and improved model sorting and ops board status.
- Improved schema validation and error detail return for API.
- Addressed issues with experiment view and retry functionality.
🛠️ Code Improvements
- Reorganized code structure (pip ready) and fixed import issues.
- Moved API components to clients and adjusted imports accordingly.
🔥 Hotfixes
- Addressed dataset and SQL float compatibility issues.
- Updated configuration files for supervisord and Alembic.
⚙️ Operations
- Added Docker and Streamlit configuration files.
- Fixed the supervisord path for deployment.