Commit 601328e

doc(changelog): Update CHANGELOG.md with release of version 0.7.0
TASK: IL-412
1 parent a3599a0 commit 601328e

1 file changed: 136 additions, 8 deletions

CHANGELOG.md

# Changelog

## 0.7.0

### Breaking Changes
- breaking change: FScores are now correctly exposed as FScores and no longer as RougeScores.
- breaking change: HuggingFaceAggregationRepository and HuggingFaceDatasetRepository now consistently follow the same folder structure as FileDatasetRepository when creating datasets. This means that datasets are stored in a folder named datasets, with additional sub-folders named after the respective dataset IDs.
- breaking change: Split run_repository into file_run_repository and in_memory_run_repository.
- breaking change: Split evaluation_repository into argilla_evaluation_repository, file_evaluation_repository and in_memory_evaluation_repository.
- breaking change: Split dataset_repository into file_dataset_repository and in_memory_dataset_repository.
- breaking change: Split aggregation_repository into file_aggregation_repository and in_memory_aggregation_repository.
- breaking change: Renamed evaluation/run.py to evaluation/run_evaluator.py.
- breaking change: Split evaluation/domain and distributed it across the aggregation, evaluation, dataset and run packages.
- breaking change: Split evaluation/argilla and distributed it across the aggregation and evaluation packages.
- breaking change: Split evaluation into separate dataset, run, evaluation and aggregation packages.
- breaking change: Split evaluation/hugging_face.py into dataset and aggregation repository files in the data_storage package.
- breaking change: create_dataset now returns the new Dataset type instead of a dataset ID.
- breaking change: Consistent naming for repository root directories when creating evaluations or aggregations:
  - .../eval → .../evaluations and .../aggregation → .../aggregations.
- breaking change: Core tasks no longer provide defaults for the applied models.
- breaking change: Methods returning entities from repositories now return the results ordered by their IDs.
- breaking change: Renamed crashed_during_eval_count to crashed_during_evaluation_count in AggregationOverview.
- breaking change: Renamed create_evaluation_dataset to initialize_evaluation in EvaluationRepository.
- breaking change: Renamed to_explanation_response to to_explanation_request in ExplainInput.
- breaking change: Removed TextHighlight::text in favor of TextHighlight::start and TextHighlight::end.
- breaking change: Removed `IntelligenceApp` and `IntelligenceStarterApp`.
- breaking change: RetrieverBasedQa now uses MultiChunkQa instead of a generic task or SingleChunkQa.
- breaking change: EvaluationRepository::failed_example_evaluations is no longer abstract.
- breaking change: Simplified the Elo calculation:
  - Payoff from the elo package has been removed.
  - PayoffMatrix from the elo package has been renamed to MatchOutcome.
  - SingleChunkQa uses logit_bias to promote not answering for German.
- breaking change: Removed the ChunkOverlap task.
- breaking change: Renamed Chunk to TextChunk.
- breaking change: Renamed ChunkTask to Chunk.
- breaking change: Renamed EchoTask to Echo.
- breaking change: Renamed TextHighlightTask to TextHighlight.
- breaking change: Renamed ChunkOverlapTask to ChunkOverlap.
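
The Elo entries in this release (the simplified calculation above and, under New Features, EloCalculator updating the ranking after each match) can be pictured with the minimal sketch below of a sequential Elo update. It is an illustration under stated assumptions, not the elo package's code: `Outcome`, `Game`, `calculate_elo`, the K-factor of 20 and the starting rating of 1500 are all hypothetical choices.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome type: the score player A earns in a game.
    A_WINS = 1.0
    DRAW = 0.5
    B_WINS = 0.0


@dataclass(frozen=True)
class Game:
    # Hypothetical record of one pairwise comparison between two players.
    player_a: str
    player_b: str
    outcome: Outcome


def expected_score(rating_a: float, rating_b: float) -> float:
    # Standard Elo expectation that player A beats player B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def calculate_elo(
    games: Iterable[Game], k: float = 20.0, initial: float = 1500.0
) -> dict[str, float]:
    ratings: dict[str, float] = {}
    for game in games:
        rating_a = ratings.setdefault(game.player_a, initial)
        rating_b = ratings.setdefault(game.player_b, initial)
        delta = k * (game.outcome.value - expected_score(rating_a, rating_b))
        # Ratings are updated right away, so the next game sees the new values.
        ratings[game.player_a] = rating_a + delta
        ratings[game.player_b] = rating_b - delta
    return ratings


print(calculate_elo([Game("model-a", "model-b", Outcome.A_WINS)]))
# {'model-a': 1510.0, 'model-b': 1490.0}
```

Because every game updates the ratings before the next one is read, reordering the games can change the final ratings; in this sketch, that order dependence is the trade-off of updating after each match.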

### New Features

- Aggregation
  - feature: InstructComparisonArgillaAggregationLogic uses the full evaluation set instead of a sample for aggregation.

- Documentation
  - feature: Added How-To’s (linked in the README):
    - how to define a task
    - how to implement a task
    - how to create a dataset
    - how to run a task on a dataset
    - how to perform aggregation
    - how to evaluate runs
  - feature: Restructured and cleaned up the README to make it more concise.
  - feature: Added illustrations to Concepts.md.
  - feature: Added a tutorial on adding a task to a FastAPI app (linked in the README).
  - feature: Improved and added various docstrings.
  - feature: Added a README section about the client URL.
  - feature: Added Python naming conventions to the README.

- Classify
  - feature: PromptBasedClassify now supports changing the prompt instruction via the instruction parameter.
  - feature: Added a default model for PromptBasedClassify.
  - feature: Added a default task for PromptBasedClassify.

- Evaluation
  - feature: All repositories now raise a ValueError when trying to access an entry of a dataset that does not exist. If only the dataset itself is retrieved, they return None.
  - feature: `ArgillaEvaluationRepository` now handles failed evaluations.
  - feature: Added SingleHuggingfaceDatasetRepository.
  - feature: Added HighlightCoverageGrader.
  - feature: Added LanguageMatchesGrader.
  - feature: Added prettier default printing of repository entities by overriding the `__str__` and `__repr__` methods.
  - feature: Added an abstract HuggingFace repository base class.
  - feature: Refactored the HuggingFace repository.
  - feature: Added HuggingFaceAggregationRepository.
  - feature: Added a template method to the individual repositories.
  - feature: Added a Dataset model to the dataset repository. This allows storing a short descriptive name for a dataset for easier identification.
  - feature: SingleChunkQa now internally uses the same model in TextHighlight by default.
  - feature: MeanAccumulator tracks the standard deviation and standard error.
  - feature: EloCalculator now updates the ranking after each match.
  - feature: Added data selection methods to repositories:
    - AggregationRepository::aggregation_overviews
    - EvaluationRepository::run_overviews
    - EvaluationRepository::run_overview_ids
    - EvaluationRepository::example_output
    - EvaluationRepository::example_outputs
    - EvaluationRepository::example_output_ids
    - EvaluationRepository::example_trace
    - EvaluationRepository::example_tracer
    - RunRepository::run_overviews
    - RunRepository::run_overview_ids
    - RunRepository::example_output
    - RunRepository::example_outputs
    - RunRepository::example_output_ids
    - RunRepository::example_trace
    - RunRepository::example_tracer
  - feature: Evaluator continues in case there are no successful outputs.

- Q & A
  - feature: Defined default parameters for LongContextQa and SingleChunkQa.
  - feature: Defined a default task for RetrieverBasedQa.
  - feature: Defined default models for KeyWordExtract and MultiChunkQa.
  - feature: Improved the focus of highlights in TextHighlight tasks.
  - feature: Added filtering for TextHighlight tasks.
  - feature: Introduced logit_bias to SingleChunkQa.

- Summarize
  - feature: Added RecursiveSummarizeInput.
  - feature: Defined defaults for SteerableSingleChunkSummarize, SteerableLongContextSummarize and RecursiveSummarize.

- Tracer
  - feature: Added better trace viewer integration:
    - Added trace storage to the trace viewer server
    - Added a submit_to_tracer_viewer method to InMemoryTracer
    - UI and navigation improvements for the trace viewer
    - Added exception handling for tracers when writing log entries

- Others
  - feature: The following classes are now exposed:
    - DocumentChunk
    - MultipleChunkQaOutput
    - Subanswer
  - feature: Simplified internal imports.
  - feature: Streamlined the `__init__` parameters of all tasks:
    - Sub-tasks are typically exposed as `__init__` parameters with sensible defaults.
    - Defaults for non-trivial parameters like models or tasks are defined in `__init__`, while the parameter's default value is `None`.
    - Instead of exposing parameters that are passed on to sub-tasks, the sub-tasks themselves are exposed.
  - feature: Updated supported models.
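
The MeanAccumulator entry under Evaluation says it now tracks the standard deviation and standard error. One common way to get both in a single pass is Welford's online algorithm, sketched here as a hedged illustration only: the class name `RunningStats`, its method names, and the use of the sample (n - 1) standard deviation are assumptions, not the library's API or implementation.

```python
import math


class RunningStats:
    """Hypothetical accumulator: mean, sample standard deviation, standard error."""

    def __init__(self) -> None:
        self._count = 0
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> None:
        # Welford's update: one pass, constant memory, numerically stable.
        self._count += 1
        delta = value - self._mean
        self._mean += delta / self._count
        self._m2 += delta * (value - self._mean)

    def mean(self) -> float:
        return self._mean

    def standard_deviation(self) -> float:
        # Sample standard deviation; this sketch returns 0.0 for fewer than two values.
        if self._count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self._count - 1))

    def standard_error(self) -> float:
        # Standard error of the mean: s / sqrt(n).
        if self._count < 2:
            return 0.0
        return self.standard_deviation() / math.sqrt(self._count)


stats = RunningStats()
for score in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(score)
print(stats.mean(), stats.standard_deviation(), stats.standard_error())
```

The standard error here is the sample standard deviation divided by the square root of the number of values, the usual standard error of the mean; this changelog does not say whether MeanAccumulator uses the sample or the population variant.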

### Fixes

- fix: Fixed exception handling in language detection of LanguageMatchesGrader.
- fix: Fixed a bug that could lead to cut-off highlight ranges in TextHighlight tasks.
- fix: Fixed list_ids methods to use path_to_str.
- fix: Disallowed traces without an end in the trace viewer.
- fix: ArgillaClient now correctly uses the provided API URL instead of a hard-coded localhost.

## 0.6.0