Releases: instructlab/eval

v0.2.0

23 Aug 01:57
ec709c7

What's Changed

  • Changing few_shots default to 5 by @danmcp in #92
  • Don't sleep on last retry attempt by @booxter in #84
  • github: add action to free runner disk space for tox installs by @nathan-weinberg in #93
  • Remove remaining print()s from the library by @booxter in #86
  • Fix e2e by removing old option by @danmcp in #102
  • Default to merge_system_user_message if mistral model detected by @danmcp in #100
  • Don't retry on connection failure by @danmcp in #103
  • Add optional auto tuning for max_workers by @danmcp in #101
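
The two retry changes in this release (#84 and #103) can be illustrated with a minimal sketch. This is not the library's code: `call_with_retries`, its parameters, and the use of Python's built-in `ConnectionError` as a stand-in for the client's connection error are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            # Assumption: a failed connection will not recover, so fail fast.
            raise
        except Exception:
            if attempt == max_attempts:
                # Last attempt: re-raise immediately instead of sleeping first.
                raise
            sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

With `max_attempts=3`, a call that always fails sleeps twice, not three times, and a connection failure is raised after a single attempt.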

Full Changelog: v0.1.1...v0.2.0

v0.1.1

01 Aug 20:52
d272c80

What's Changed

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 by @dependabot in #70
  • Revert "Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0" by @alinaryan in #72
  • feat: add new InvalidModelError and handling by @nathan-weinberg in #79
  • small docs update for clarity by @makelinux in #81
  • fix: use the context correctly in mt_bench_branch by @bcrochet in #90
  • fix: catch KeyError in mt_bench_branch by @bcrochet in #89
  • fix: mt_bench_branch should ignore knowledge in generate by @bcrochet in #88

Full Changelog: v0.1.0...v0.1.1

v0.1.0

15 Jul 16:27
ae6097f

What's Changed

  • Fixing up test case after api changes to add error_rate by @danmcp in #63
  • Inherit logging from caller rather than from vLLM by @danmcp in #66
  • Update batch size description and allow for str by @danmcp in #67
  • Don't set basicConfig from libraries by @danmcp in #69
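
The logging changes above (#66 and #69) follow a common Python convention for libraries. The sketch below shows that convention in general terms; the logger name `example_eval_lib` and the function `run_evaluation` are hypothetical and are not taken from the project.

```python
import logging

# Library module: use a named logger and never call logging.basicConfig() here.
logger = logging.getLogger("example_eval_lib")  # hypothetical package name
# A NullHandler keeps the library silent unless the caller configures logging.
logger.addHandler(logging.NullHandler())


def run_evaluation(model_name: str) -> str:
    """Hypothetical library function that logs instead of printing."""
    logger.info("Evaluating %s", model_name)
    return model_name
```

The calling application then decides level, format, and destination, for example with `logging.basicConfig(level=logging.INFO)` in its own entry point. Records from the library propagate to whatever handlers the caller installed.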

Full Changelog: v0.0.9...v0.1.0

v0.0.9

12 Jul 14:48
5257e23

What's Changed

  • [mmlu] Allow optionally setting a PyTorch device by @alinaryan in #62
  • Error handling with sdg_path not found and invalid by @danmcp in #61
  • Rename sdg_path to tasks_dir by @danmcp in #64

Full Changelog: v0.0.8...v0.0.9

v0.0.8

09 Jul 16:37
2a27715

What's Changed

  • fix: Add specific error handling around git repo input by @danmcp in #52
  • Removing a linting ignore by @danmcp in #58

Full Changelog: v0.0.7...v0.0.8

v0.0.7

08 Jul 23:16
450acaf

What's Changed

  • Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
  • Add missing license identifiers by @danmcp in #56
  • Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
  • Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55

Full Changelog: v0.0.6...v0.0.7

v0.0.6

08 Jul 13:43
6c537c5

What's Changed

Full Changelog: v0.0.5...v0.0.6

v0.0.5

03 Jul 15:09
4c89cc3

What's Changed

  • Remove task categories from mmlu tasklist by @alinaryan in #46
  • Add entry points for evaluator classes by @tiran in #45
  • e2e: Only run one job at a time for a given PR by @russellb in #47

New Contributors

  • @tiran made their first contribution in #45

Full Changelog: v0.0.4...v0.0.5

v0.0.4

01 Jul 15:35
7642cab

What's Changed

  • e2e: Fix permissions error by @russellb in #34
  • Include qna_file in mt_bench_branch results by @danmcp in #33
  • Include task scores with mmlu results + adjust default api retries by @danmcp in #37
  • Bump lm-eval minimum version to 0.4.3 by @nathan-weinberg in #44
  • Allow first_n option for gen answers, fix return values with max_workers=1 and only print api errors on final failure by @danmcp in #41
  • Read question_id as a string to preserve precision by @danmcp in #42
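
The precision issue behind #42 can be shown with a small sketch. The release note does not say where the precision was lost, so this assumes the IDs were being coerced to 64-bit floats, which some tabular loaders do. The sample IDs and field values are made up.

```python
import json

# Two distinct hypothetical IDs that differ only in the last digit.
lines = [
    '{"question_id": 9007199254740992, "text": "first"}',
    '{"question_id": 9007199254740993, "text": "second"}',
]

records = [json.loads(line) for line in lines]

# Assumed failure mode: coercion to a 64-bit float makes the two IDs collide.
as_floats = {float(r["question_id"]) for r in records}

# Keeping the IDs as strings preserves every digit.
as_strings = {str(r["question_id"]) for r in records}

print(len(as_floats))   # 1
print(len(as_strings))  # 2
```

A 64-bit float represents integers exactly only up to 2**53, so larger IDs are safer as strings when they serve purely as keys.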

Full Changelog: v0.0.3...v0.0.4

v0.0.3

27 Jun 18:42
f79ce58

What's Changed

Full Changelog: v0.0.2...v0.0.3