
Releases: etalab-ia/evalap

v0.4

22 May 14:00
5c12c86


[0.4] - 2025-05-21

πŸš€ Features

  • Support thinking models in the judge
  • Add nb_tool_call as an ops metric, add MCP_BRIDGE_URL, and formatting fixes
  • Parquet dataset support, OCR metrics, and a notebook demo
  • Add and handle the new with_vision and prelude_prompt attributes
  • Calculate the environmental impact of models for the response-generation step
  • Add two new environmental metrics: energy_consumption and gwp_consumption
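The new environmental metrics can be requested alongside the usual ones when defining an experiment. A minimal sketch of what such a payload might look like, assuming a JSON API; the endpoint shape and all field names other than the two metric names are illustrative, not EvalAP's actual schema:

```python
import json

# Hypothetical experiment payload -- field names besides the metric
# names are assumptions for illustration.
experiment = {
    "name": "env-impact-demo",
    "model": "albert-large",            # illustrative model name
    "dataset": "my-parquet-dataset",    # Parquet datasets are supported since v0.4
    # The two environmental metrics introduced in v0.4:
    "metrics": ["judge_precision", "energy_consumption", "gwp_consumption"],
}

payload = json.dumps(experiment)
print(payload)
```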

πŸ”§ Improvements

  • [UI] Display the environmental section in the OPS pane and in experiment-set metric results

v0.3.1

02 Apr 19:00


πŸš€ Features

  • [API] Support Anthropic, OpenAI, Mistral, and Albert providers for the judge_model parameter in experiments (models are fetched from the providers' OpenAI-compatible v1/models endpoints)
  • [SCRIPTS] Add convenient scripts to run experiments from an isolated environment (e.g. Cortex; see the tutorial)
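In practice, the judge_model parameter is just another field on the experiment definition, and the valid values are whatever the configured providers expose via their v1/models endpoints. A hedged sketch, where every name except judge_model and the endpoint path is an illustrative assumption:

```python
import json

# Hypothetical experiment using the judge_model parameter from v0.3.1.
# The provider/model identifier below is an illustrative example.
experiment = {
    "name": "judge-provider-demo",
    "dataset": "qa-baseline",
    "judge_model": "claude-3-5-sonnet",  # an Anthropic, OpenAI, Mistral, or Albert model
}

# Per the release note, the list of valid judge models is fetched from
# each provider's OpenAI-compatible endpoint:
models_endpoint = "/v1/models"

print(json.dumps(experiment), models_endpoint)
```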

πŸ”§ Improvements

  • [UI] Add a special card for orphan experiments at the bottom of the experiments list.
  • [UI] Order experiment sets newest first
  • [UI] Remove the old, confusing experiments menu in favor of the experiment sets menu alone (renamed simply "experiments")

v0.3

27 Mar 17:55


πŸš€ Features

  • Integrated MCP support and multi-step LLM generations with MCP client bridge.
  • Added experiment set with cross-validation parameters and demo notebooks.
  • Integrated multiple RAG metrics for deep evaluation.
  • Supported delete experiment route for admin users.
  • Introduced new retry and post routes with UI improvements.
  • Added 'finished' and 'failure' ratios for experiments in the overview.
  • Added new tests to increase code coverage and addressed Pydantic warnings.
  • Implemented loop limit and tool call step saving.
  • Improved sorting and metrics highlighting in the experiment set score table.
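The loop limit and tool-call step saving mentioned above guard multi-step MCP generations against runaway tool-call chains. A minimal sketch of the idea; the function and variable names are illustrative, not EvalAP's actual code:

```python
MAX_STEPS = 5  # loop limit: bound the number of tool-call iterations

def run_generation(llm_step, max_steps=MAX_STEPS):
    """Run a multi-step generation, saving each tool-call step.

    `llm_step(steps)` returns (answer, tool_call), with answer=None while
    the model keeps requesting tools. Illustrative sketch only.
    """
    steps = []  # saved tool-call steps
    for _ in range(max_steps):
        answer, tool_call = llm_step(steps)
        if tool_call is not None:
            steps.append(tool_call)
        if answer is not None:
            return answer, steps
    return None, steps  # loop limit reached without a final answer

# Fake model: requests two tool calls, then answers.
def fake_llm(steps):
    if len(steps) < 2:
        return None, {"tool": "search", "call": len(steps)}
    return "final answer", None

answer, steps = run_generation(fake_llm)
print(answer, len(steps))  # -> final answer 2
```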

πŸ› Bug Fixes

  • Enhanced error handling for missing metric inputs and in the baseline demo notebook.
  • Removed unnecessary attributes and improved schema validation.
  • Fixed various UI bugs and improved experiment view.
  • Improved notebook variable names and used public endpoints.
  • Enhanced GitHub Actions CI and addressed Alembic issues.
  • Corrected schema serialization and computation needs.
  • Improved experiment status updates and endpoint terminology.
  • Handled unknown model cases and improved dataset visibility.
  • Fixed various typos and improved model sorting and ops board status.
  • Improved schema validation and error detail return for API.
  • Addressed issues with experiment view and retry functionality.

πŸ› οΈ Code Improvements

  • Reorganized code structure (pip ready) and fixed import issues.
  • Moved API components to clients and adjusted imports accordingly.

πŸ”₯ Hotfixes

  • Addressed dataset and SQL float compatibility issues.
  • Updated configuration files for supervisord and Alembic.

βš™οΈ Operations

  • Added Docker and Streamlit configuration files.
  • Fixed supervisord path for deployment.