 # Changelog

-## Unreleased
+## 0.7.0

-- The elo-calculation logic has been heavily simplified
-- `Payoff` from the elo package has been renamed to `Match`
-- `PayoffMatrix` from the elo package has been renamed to `MatchOutcome` and is now pydantic (de)-serializable
-- `SingleChunkQa` now uses a logit_bias to promote not answering for German
-- `__init__`-parameters of all tasks are streamlined:
+### Breaking Changes
+- breaking change: FScores are now correctly exposed as FScores and no longer as RougeScores.
+- breaking change: HuggingFaceAggregationRepository and HuggingFaceDatasetRepository now consistently follow the same folder structure as FileDatasetRepository when creating datasets. This means that datasets are stored in a folder named datasets, with one sub-folder per dataset named after its dataset ID.
+- breaking change: Split run_repository into file_run_repository and in_memory_run_repository.
+- breaking change: Split evaluation_repository into argilla_evaluation_repository, file_evaluation_repository and in_memory_evaluation_repository.
+- breaking change: Split dataset_repository into file_dataset_repository and in_memory_dataset_repository.
+- breaking change: Split aggregation_repository into file_aggregation_repository and in_memory_aggregation_repository.
+- breaking change: Renamed evaluation/run.py to evaluation/run_evaluator.py.
+- breaking change: Split evaluation/domain and distributed it across the aggregation, evaluation, dataset and run packages.
+- breaking change: Split evaluation/argilla and distributed it across the aggregation and evaluation packages.
+- breaking change: Split evaluation into separate dataset, run, evaluation and aggregation packages.
+- breaking change: Split evaluation/hugging_face.py into dataset and aggregation repository files in the data_storage package.
+- breaking change: create_dataset now returns the new Dataset type instead of a dataset ID.
+- breaking change: Consistent naming for repository root directories when creating evaluations or aggregations:
+  - .../eval → .../evaluations and .../aggregation → .../aggregations.
+- breaking change: Core tasks no longer provide defaults for the applied models.
+- breaking change: Methods returning entities from repositories now return the results ordered by their IDs.
+- breaking change: Renamed crashed_during_eval_count to crashed_during_evaluation_count in AggregationOverview.
+- breaking change: Renamed create_evaluation_dataset to initialize_evaluation in EvaluationRepository.
+- breaking change: Renamed to_explanation_response to to_explanation_request in ExplainInput.
+- breaking change: Removed TextHighlight::text in favor of TextHighlight::start and TextHighlight::end.
+- breaking change: Removed `IntelligenceApp` and `IntelligenceStarterApp`.
+- breaking change: RetrieverBasedQa now uses MultiChunkQa instead of a generic task or SingleChunkQa.
+- breaking change: EvaluationRepository::failed_example_evaluations is no longer abstract.
+- breaking change: Elo calculation simplified:
+  - Payoff from the elo package has been removed.
+  - PayoffMatrix from the elo package has been renamed to MatchOutcome.
+  - SingleChunkQa uses logit_bias to promote not answering for German.
+- breaking change: Removed the ChunkOverlap task.
+- breaking change: Renamed Chunk to TextChunk.
+- breaking change: Renamed ChunkTask to Chunk.
+- breaking change: Renamed EchoTask to Echo.
+- breaking change: Renamed TextHighlightTask to TextHighlight.
+- breaking change: Renamed ChunkOverlapTask to ChunkOverlap.
+
+### New Features
+
+- Aggregation:
+  - feature: InstructComparisonArgillaAggregationLogic uses the full evaluation set instead of a sample for aggregation.
+
+- Documentation
+
+  - feature: Added How-To’s (linked in the README):
+    - how to define a task
+    - how to implement a task
+    - how to create a dataset
+    - how to run a task on a dataset
+    - how to perform aggregation
+    - how to evaluate runs
+  - feature: Restructured and cleaned up the README to make it more concise.
+  - feature: Added illustrations to Concepts.md.
+  - feature: Added a tutorial for adding a task to a FastAPI app (linked in the README).
+  - feature: Improved and added various docstrings.
+  - feature: Added a README section about the client URL.
+  - feature: Added the Python naming convention to the README.
+
+- Classify
+  - feature: PromptBasedClassify now supports changing the prompt instruction via the instruction parameter.
+  - feature: Added a default model for PromptBasedClassify.
+  - feature: Added a default task for PromptBasedClassify.
+
+- Evaluation
+  - feature: All repositories now raise a ValueError when trying to access an entry of a dataset that does not exist. If only the dataset itself is retrieved, None is returned.
+  - feature: `ArgillaEvaluationRepository` now handles failed evaluations.
+  - feature: Added SingleHuggingfaceDatasetRepository.
+  - feature: Added HighlightCoverageGrader.
+  - feature: Added LanguageMatchesGrader.
+
+  - feature: Added prettier default printing of repository entities by overriding the __str__ and __repr__ methods.
+
+  - feature: Added an abstract HuggingFace repository base class.
+
+  - feature: Refactored the HuggingFace repository.
+
+  - feature: Added HuggingFaceAggregationRepository.
+  - feature: Added a template method to individual repositories.
+  - feature: Added a Dataset model to the dataset repository. This allows storing a short descriptive name for each dataset for easier identification.
+  - feature: SingleChunkQa now internally uses the same model for TextHighlight by default.
+  - feature: MeanAccumulator tracks standard deviation and standard error.
+  - feature: EloCalculator now updates the ranking after each match.
+  - feature: Added data selection methods to repositories:
+    - AggregationRepository::aggregation_overviews
+    - EvaluationRepository::run_overviews
+    - EvaluationRepository::run_overview_ids
+    - EvaluationRepository::example_output
+    - EvaluationRepository::example_outputs
+    - EvaluationRepository::example_output_ids
+    - EvaluationRepository::example_trace
+    - EvaluationRepository::example_tracer
+    - RunRepository::run_overviews
+    - RunRepository::run_overview_ids
+    - RunRepository::example_output
+    - RunRepository::example_outputs
+    - RunRepository::example_output_ids
+    - RunRepository::example_trace
+    - RunRepository::example_tracer
+
+  - feature: Evaluator continues even if there are no successful outputs.
+
+- Q & A
+
+  - feature: Defined default parameters for LongContextQa and SingleChunkQa.
+  - feature: Defined a default task for RetrieverBasedQa.
+  - feature: Defined a default model for KeyWordExtract and MultiChunkQa.
+  - feature: Improved the focus of highlights in TextHighlight tasks.
+  - feature: Added filtering for TextHighlight tasks.
+  - feature: Introduced logit_bias to SingleChunkQa.
+
+- Summarize
+  - feature: Added RecursiveSummarizeInput.
+  - feature: Defined defaults for SteerableSingleChunkSummarize, SteerableLongContextSummarize and RecursiveSummarize.
+
+- Tracer
+  - feature: Added better trace viewer integration:
+    - Added trace storage to the trace viewer server
+    - Added the submit_to_tracer_viewer method to InMemoryTracer
+    - UI and navigation improvements for the trace viewer
+    - Added exception handling for tracers when writing log entries
+
+- Others
+
+  - feature: The following classes are now exposed:
+    - DocumentChunk
+    - MultipleChunkQaOutput
+    - Subanswer
+  - feature: Simplified internal imports.
+  - feature: Streamlined the __init__ parameters of all tasks:
   - Sub-tasks are typically exposed as `__init__`-parameters with sensible defaults.
-  - Defaults for non-trivial parameters like models or tasks are defined in `__init__` while the default parameter is `None`.
+  - Defaults for non-trivial parameters like models or tasks are defined in __init__ while the default parameter is None.
   - Instead of exposing parameters that are passed on to sub-tasks, the sub-tasks themselves are exposed.
-- `IntelligenceApp` and `IntelligenceStarterApp` have been removed.
+  - feature: Updated supported models.
+
+### Fixes

+- fix: Fixed exception handling in language detection of LanguageMatchesGrader.
+- fix: Fixed a bug that could lead to cut-off highlight ranges in TextHighlight tasks.
+- fix: Fixed list_ids methods to use path_to_str.
+- fix: Disallowed traces without an end in the trace viewer.
+- fix: ArgillaClient now correctly uses the provided API URL instead of a hard-coded localhost.

 ## 0.6.0

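The changelog notes that EloCalculator now updates the ranking after each match. A minimal sketch of what sequential Elo updates look like, assuming the textbook Elo formula (base 10, scale 400), a fixed K-factor of 20 and a starting rating of 1500. `MatchResult`, `expected_score` and `run_elo` are hypothetical names for illustration, not the EloCalculator API, and the library may use different constants.

```python
"""Sketch of sequential Elo updates: ratings change after every match.

Assumptions (not taken from the library): standard logistic Elo with
base 10 and scale 400, K-factor 20, initial rating 1500. All names here
are hypothetical.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    """Hypothetical record of one pairwise comparison."""

    player_a: str
    player_b: str
    score_a: float  # 1.0 = player_a wins, 0.5 = draw, 0.0 = player_b wins


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def run_elo(
    matches: list[MatchResult], k: float = 20.0, initial: float = 1500.0
) -> dict[str, float]:
    ratings: dict[str, float] = {}
    for match in matches:
        rating_a = ratings.setdefault(match.player_a, initial)
        rating_b = ratings.setdefault(match.player_b, initial)
        expected_a = expected_score(rating_a, rating_b)
        expected_b = 1.0 - expected_a
        # Write both new ratings back immediately, so the next match
        # is scored against the already-updated ratings.
        ratings[match.player_a] = rating_a + k * (match.score_a - expected_a)
        ratings[match.player_b] = rating_b + k * ((1.0 - match.score_a) - expected_b)
    return ratings


if __name__ == "__main__":
    history = [
        MatchResult("model_x", "model_y", 1.0),
        MatchResult("model_y", "model_z", 0.5),
    ]
    print(run_elo(history))
```

Because each update starts from the ratings produced by earlier matches, the final ratings depend on the order of the matches; that is the trade-off of updating after each match instead of computing all ratings in one batch.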
|
|
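The MeanAccumulator entry (tracking standard deviation and standard error) can be illustrated with Welford's online algorithm, which updates the mean and the sum of squared deviations in a single pass without storing the values. This is a minimal sketch: `RunningMeanSketch` and its methods are hypothetical, and it assumes the sample standard deviation (dividing by n - 1) and a standard error of std / sqrt(n); the library's MeanAccumulator may define these differently.

```python
"""Sketch of a running mean that also tracks standard deviation and
standard error, using Welford's online algorithm.

Assumptions (not taken from the library): hypothetical class and method
names, sample standard deviation (n - 1), standard error = std / sqrt(n).
"""

import math


class RunningMeanSketch:
    def __init__(self) -> None:
        self._n = 0
        self._mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, value: float) -> None:
        # Welford update: shift the mean, then accumulate the product of
        # the deviations from the old and the new mean.
        self._n += 1
        delta = value - self._mean
        self._mean += delta / self._n
        self._m2 += delta * (value - self._mean)

    def mean(self) -> float:
        return self._mean

    def standard_deviation(self) -> float:
        # The sample standard deviation is undefined for fewer than two
        # values; this sketch reports 0.0 in that case.
        if self._n < 2:
            return 0.0
        return math.sqrt(self._m2 / (self._n - 1))

    def standard_error(self) -> float:
        if self._n < 2:
            return 0.0
        return self.standard_deviation() / math.sqrt(self._n)


if __name__ == "__main__":
    acc = RunningMeanSketch()
    for score in [0.2, 0.4, 0.4, 0.9]:
        acc.add(score)
    print(acc.mean(), acc.standard_deviation(), acc.standard_error())
```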