Releases · instructlab/eval

23 Aug 01:57

danmcp

v0.2.0

ec709c7

v0.2.0

What's Changed

Changing few_shots default to 5 by @danmcp in #92
Don't sleep on last retry attempt by @booxter in #84
github: add action to free runner disk space for tox installs by @nathan-weinberg in #93
Remove remaining print()s from the library by @booxter in #86
Fix e2e by removing old option by @danmcp in #102
Default to merge_system_user_message if mistral model detected by @danmcp in #100
Don't retry on connection failure by @danmcp in #103
Add optional auto tuning for max_workers by @danmcp in #101
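
The two retry changes above (#84 and #103) can be pictured with a small sketch. This is not the library's code: `call_with_retries`, its parameters, and the choice of `ConnectionError` as the non-retryable error are hypothetical, chosen only to illustrate a loop that skips the sleep after the final attempt and gives up immediately on a connection failure.

```python
import time


def call_with_retries(fn, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Hypothetical helper illustrating the retry behavior described above."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            # Assumption: a connection failure will not fix itself on retry,
            # so it is re-raised at once instead of consuming more attempts.
            raise
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt follows; sleeping after the
            # last attempt would just delay the error.
            if attempt < max_attempts:
                sleep(delay)
    raise last_error
```

With `max_attempts=3`, a call that keeps failing sleeps twice rather than three times, and a call that raises `ConnectionError` runs only once.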

New Contributors

@booxter made their first contribution in #84

Full Changelog: v0.1.1...v0.2.0

Contributors

booxter, danmcp, and nathan-weinberg


01 Aug 20:52

nathan-weinberg

v0.1.1

d272c80

v0.1.1

What's Changed

Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 by @dependabot in #70
Revert "Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0" by @alinaryan in #72
feat: add new InvalidModelError and handling by @nathan-weinberg in #79
small docs update for clarity by @makelinux in #81
fix: use the context correctly in mt_bench_branch by @bcrochet in #90
fix: catch KeyError in mt_bench_branch by @bcrochet in #89
fix: mt_bench_branch should ignore knowledge in generate by @bcrochet in #88

New Contributors

@makelinux made their first contribution in #81
@bcrochet made their first contribution in #90

Full Changelog: v0.1.0...v0.1.1

Contributors

bcrochet, makelinux, and 3 other contributors


15 Jul 16:27

nathan-weinberg

v0.1.0

ae6097f

v0.1.0

What's Changed

Fixing up test case after api changes to add error_rate by @danmcp in #63
Inherit logging from caller rather than from vLLM by @danmcp in #66
Update batch size description and allow for str by @danmcp in #67
Don't set basicConfig from libraries by @danmcp in #69
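
The logging changes above (#66 and #69) follow a common Python convention: a library emits log records through its own named logger and leaves handler and level configuration to the calling application. A minimal sketch of that convention, assuming a hypothetical logger name `example_eval_lib` and hypothetical functions `run_evaluation` and `configure_app_logging` (none of these come from the library):

```python
import logging

# Library side: get a named logger and never call logging.basicConfig().
# The NullHandler keeps the library quiet if the application configures nothing.
logger = logging.getLogger("example_eval_lib")  # hypothetical logger name
logger.addHandler(logging.NullHandler())


def run_evaluation():
    """Hypothetical library function that only emits log records."""
    logger.info("starting evaluation")
    return "done"


# Application side: the caller decides how and where records are shown.
def configure_app_logging(level=logging.INFO):
    """Hypothetical application-level setup, called once by the caller."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
```

Because the library logger propagates to the root logger, records from `run_evaluation()` appear in whatever format the application chose once it has called `configure_app_logging()`.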

Full Changelog: v0.0.9...v0.1.0

Contributors

danmcp


12 Jul 14:48

danmcp

v0.0.9

5257e23

v0.0.9

What's Changed

[mmlu] Allow optionally setting a PyTorch device by @alinaryan in #62
Error handling with sdg_path not found and invalid by @danmcp in #61
Rename sdg_path to tasks_dir by @danmcp in #64

Full Changelog: v0.0.8...v0.0.9

Contributors

danmcp and alinaryan


09 Jul 16:37

alinaryan

v0.0.8

2a27715

v0.0.8

What's Changed

fix: Add specific error handling around git repo input by @danmcp in #52
Removing a linting ignore by @danmcp in #58

Full Changelog: v0.0.7...v0.0.8

Contributors

danmcp


08 Jul 23:16

danmcp

v0.0.7

450acaf

v0.0.7

What's Changed

Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
Add missing license identifiers by @danmcp in #56
Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55

Full Changelog: v0.0.6...v0.0.7

Contributors

danmcp and dependabot


08 Jul 13:43

danmcp

v0.0.6

6c537c5

v0.0.6

What's Changed

Add MMLU tests by @nathan-weinberg in #27
feat: Include error rate in judgment results by @danmcp in #49

Full Changelog: v0.0.5...v0.0.6

Contributors

danmcp and nathan-weinberg


03 Jul 15:09

nathan-weinberg

v0.0.5

4c89cc3

v0.0.5

What's Changed

Remove task categories from mmlu tasklist by @alinaryan in #46
Add entry points for evaluator classes by @tiran in #45
e2e: Only run one job at a time for a given PR by @russellb in #47

New Contributors

@tiran made their first contribution in #45

Full Changelog: v0.0.4...v0.0.5

Contributors

russellb, tiran, and alinaryan


01 Jul 15:35

alinaryan

v0.0.4

7642cab

v0.0.4

What's Changed

e2e: Fix permissions error by @russellb in #34
Include qna_file in mt_bench_branch results by @danmcp in #33
Include task scores with mmlu results + adjust default api retries by @danmcp in #37
Bump lm-eval minimum version to 0.4.3 by @nathan-weinberg in #44
Allow first_n option for gen answers, fix return values with max_workers=1 and only print api errors on final failure by @danmcp in #41
Read question_id as a string to preserve precision by @danmcp in #42
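
The `question_id` change above (#42) is easier to follow with a concrete case. The sketch below is an illustration, not the library's code: it assumes the precision loss comes from a large numeric ID passing through a 64-bit float, and the sample record and variable names are hypothetical.

```python
import json

# Hypothetical record with an ID just above 2**53, the point past which
# a 64-bit float can no longer represent every integer.
line = '{"question_id": 9007199254740993, "turns": ["hi"]}'

# Lossy path: the ID passes through a float and comes back changed.
lossy_id = int(float(json.loads(line)["question_id"]))

# Preserving path: integer literals are kept as strings while parsing.
record = json.loads(line, parse_int=str)
preserved_id = record["question_id"]

print(lossy_id)      # 9007199254740992
print(preserved_id)  # 9007199254740993
```

Treating the ID as an opaque string also avoids any later numeric conversion changing it.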

New Contributors

@russellb made their first contribution in #34

Full Changelog: v0.0.3...v0.0.4

Contributors

russellb, danmcp, and nathan-weinberg


27 Jun 18:42

nathan-weinberg

v0.0.3

f79ce58

v0.0.3

What's Changed

Add e2e job to CI by @nathan-weinberg in #14
Add list of default MMLU tasks as a constant by @nathan-weinberg in #24

Full Changelog: v0.0.2...v0.0.3

Contributors

nathan-weinberg

