Releases: instructlab/eval
v0.2.0
What's Changed
- Changing few_shots default to 5 by @danmcp in #92
- Don't sleep on last retry attempt by @booxter in #84
- github: add action to free runner disk space for tox installs by @nathan-weinberg in #93
- Remove remaining print()s from the library by @booxter in #86
- Fix e2e by removing old option by @danmcp in #102
- Default to merge_system_user_message if mistral model detected by @danmcp in #100
- Don't retry on connection failure by @danmcp in #103
- Add optional auto tuning for max_workers by @danmcp in #101
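The two retry changes in this release (#84 and #103) can be illustrated with a minimal sketch. It is not the library's code: `call_with_retries`, `ConnectionFailure`, and the `sleep` parameter are hypothetical names chosen for the example. It only shows the general pattern of skipping the sleep after the final attempt and giving up immediately on a connection failure.

```python
import time


class ConnectionFailure(Exception):
    """Hypothetical stand-in for a client's connection error."""


def call_with_retries(fn, max_retries=3, delay=1.0, sleep=time.sleep):
    """Call fn() up to max_retries times (illustrative helper, not the eval API)."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn()
        except ConnectionFailure:
            # A connection failure is unlikely to fix itself: do not retry.
            raise
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt follows; the last failure
            # is re-raised right away instead of waiting first.
            if attempt < max_retries - 1:
                sleep(delay)
    raise last_error
```

With `max_retries=3`, a call that always fails is attempted three times but sleeps only twice.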
Full Changelog: v0.1.1...v0.2.0
v0.1.1
What's Changed
- Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 by @dependabot in #70
- Revert "Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0" by @alinaryan in #72
- feat: add new InvalidModelError and handling by @nathan-weinberg in #79
- small docs update for clarity by @makelinux in #81
- fix: use the context correctly in mt_bench_branch by @bcrochet in #90
- fix: catch KeyError in mt_bench_branch by @bcrochet in #89
- fix: mt_bench_branch should ignore knowledge in generate by @bcrochet in #88
New Contributors
- @makelinux made their first contribution in #81
- @bcrochet made their first contribution in #90
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- Fixing up test case after api changes to add error_rate by @danmcp in #63
- Inherit logging from caller rather than from vLLM by @danmcp in #66
- Update batch size description and allow for str by @danmcp in #67
- Don't set basicConfig from libraries by @danmcp in #69
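The logging changes in this release (#66 and #69) follow a common Python convention: a library logs through a named logger and leaves handler and level configuration to the calling application. The sketch below illustrates that convention only. The logger name `my_eval_lib` and the function `run_evaluation` are hypothetical and are not taken from the project.

```python
import logging

# Library side: obtain a named logger and never call logging.basicConfig().
# Records propagate to whatever handlers the calling application installed.
logger = logging.getLogger("my_eval_lib")  # hypothetical logger name


def run_evaluation():
    """Hypothetical library entry point that only emits log records."""
    logger.info("starting evaluation")
    return "done"
```

An application that wants to see these records would configure logging itself, for example with `logging.basicConfig(level=logging.INFO)`, before calling into the library.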
Full Changelog: v0.0.9...v0.1.0
v0.0.9
v0.0.8
v0.0.7
What's Changed
- Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
- Add missing license identifiers by @danmcp in #56
- Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
- Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55
Full Changelog: v0.0.6...v0.0.7
v0.0.6
What's Changed
- Add MMLU tests by @nathan-weinberg in #27
- feat: Include error rate in judgment results by @danmcp in #49
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's Changed
- Remove task categories from mmlu tasklist by @alinaryan in #46
- Add entry points for evaluator classes by @tiran in #45
- e2e: Only run one job at a time for a given PR by @russellb in #47
Full Changelog: v0.0.4...v0.0.5
v0.0.4
What's Changed
- e2e: Fix permissions error by @russellb in #34
- Include qna_file in mt_bench_branch results by @danmcp in #33
- Include task scores with mmlu results + adjust default api retries by @danmcp in #37
- Bump lm-eval minimum version to 0.4.3 by @nathan-weinberg in #44
- Allow first_n option for gen answers, fix return values with max_workers=1, and only print API errors on final failure by @danmcp in #41
- Read question_id as a string to preserve precision by @danmcp in #42
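The motivation for reading `question_id` as a string (#42) can be sketched as follows. The release notes do not say how the precision was lost, so this example assumes the usual cause: a large integer identifier being coerced to a 64-bit float somewhere in the loading path. The record and the helper `survives_float_coercion` are made up for illustration.

```python
import json

# Hypothetical record whose identifier is larger than 2**53, the point
# beyond which a 64-bit float can no longer represent every integer.
raw = '{"question_id": "12345678901234567891"}'
record = json.loads(raw)


def survives_float_coercion(qid: str) -> bool:
    """Return True if the id is unchanged after a round trip through float."""
    return int(float(int(qid))) == int(qid)


# Kept as a string, the identifier is preserved exactly.
question_id = record["question_id"]
lossy = not survives_float_coercion(question_id)
```

Here `lossy` is `True`: the float round trip changes the identifier, while the string form stays intact.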
Full Changelog: v0.0.3...v0.0.4
v0.0.3
What's Changed
- Add e2e job to CI by @nathan-weinberg in #14
- Add list of default MMLU tasks as a constant by @nathan-weinberg in #24
Full Changelog: v0.0.2...v0.0.3