Releases: instructlab/eval

Leaderboard v0.6.0

16 Apr 05:42
cea8acd

This release of the InstructLab/eval library provides support for the Leaderboardv2 benchmark.

To use the new leaderboard evaluator, install it with pip install "instructlab-eval[leaderboard]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern), then import LeaderboardV2Evaluator from instructlab.eval.leaderboard:

from instructlab.eval.leaderboard import LeaderboardV2Evaluator

evaluator = LeaderboardV2Evaluator(model_path="meta-llama/Llama-3.1-8B-Instruct", num_gpus=8)
result = evaluator.run()
print(f"Results for meta-llama/Llama-3.1-8B-Instruct: {result['overall_score']}")
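Because the evaluator ships behind an optional extra, a script may want to check for it before use. The following is a minimal sketch, not part of the library: leaderboard_available is a hypothetical helper, and it assumes that a missing extra shows up as an ImportError when instructlab.eval.leaderboard is imported.

```python
import importlib


def leaderboard_available() -> bool:
    """Return True if the leaderboard evaluator module can be imported.

    Hypothetical helper: assumes a missing optional extra surfaces as an
    ImportError when the module is imported.
    """
    try:
        importlib.import_module("instructlab.eval.leaderboard")
    except ImportError:
        return False
    return True


if not leaderboard_available():
    print('Leaderboard support not found; try: pip install "instructlab-eval[leaderboard]"')
```

A check like this lets a script fail early with an install hint rather than with a traceback partway through setup.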

This new evaluator supports running in one of two ways:

  • Running locally: tasks are split between vLLM and HF Transformers so that the evaluation runs in an optimized fashion.
  • Running remotely: you provide an OpenAI client, and the evaluator simply makes its calls through it.

What's Changed

Here's a comprehensive outline of all the changes made:

  • ci: Add OpenAI keys into CI by @alimaredia in #221
  • build(deps): bump sarisia/actions-status-discord from 1.15.1 to 1.15.3 by @dependabot in #220
  • build(deps): bump hynek/build-and-inspect-python-package from 2.11.0 to 2.12.0 by @dependabot in #217
  • build(deps): bump rhysd/actionlint from 1.7.4 to 1.7.7 in /.github/workflows by @dependabot in #216
  • build(deps): bump step-security/harden-runner from 2.10.3 to 2.10.4 by @dependabot in #215
  • build(deps): bump DavidAnson/markdownlint-cli2-action from 18.0.0 to 19.1.0 by @dependabot in #213
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.45.0 to 0.46.0 by @dependabot in #207
  • ci: Don't require secrets in medium e2e test by @danmcp in #226
  • build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in #225
  • build(deps): bump machulav/ec2-github-runner from 2.3.7 to 2.3.8 by @dependabot in #224
  • build(deps): bump aws-actions/configure-aws-credentials from 4.0.2 to 4.0.3 by @dependabot in #223
  • build(deps): bump pypa/gh-action-pypi-publish from 1.12.3 to 1.12.4 by @dependabot in #222
  • build(deps): bump aws-actions/configure-aws-credentials from 4.0.3 to 4.1.0 by @dependabot in #228
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.46.0 to 0.47.0 by @dependabot in #229
  • build(deps): bump step-security/harden-runner from 2.10.4 to 2.11.0 by @dependabot in #230
  • build(deps): bump actions/cache from 4.2.0 to 4.2.1 by @dependabot in #231
  • build(deps): bump actions/cache from 4.2.1 to 4.2.2 by @dependabot in #233
  • build(deps): bump actions/download-artifact from 4.1.8 to 4.1.9 by @dependabot in #232
  • build(deps): bump actions/setup-python from 5.4.0 to 5.5.0 by @dependabot in #239
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.47.0 to 0.48.0 by @dependabot in #240
  • build(deps): bump step-security/harden-runner from 2.11.0 to 2.11.1 by @dependabot in #241
  • build(deps): bump actions/download-artifact from 4.1.9 to 4.2.1 by @dependabot in #237
  • build(deps): bump actions/cache from 4.2.2 to 4.2.3 by @dependabot in #236
  • Implement leaderboard as a benchmark by @RobotSail in #234

Full Changelog: v0.5.1...v0.6.0

v0.5.1

21 Jan 16:24
bdece44

What's Changed

  • chore: Change default temporary write directory in all e2e CI jobs from tmpfs to /home/tmp by @courtneypacheco in #210
  • build(deps): bump step-security/harden-runner from 2.10.2 to 2.10.3 by @dependabot in #209
  • Bump ragas version by @alimaredia in #212

Full Changelog: v0.5.0...v0.5.1

v0.5.0

09 Jan 23:23
e31d19b

What's Changed

Full Changelog: v0.4.2...v0.5.0

v0.4.2

13 Dec 22:29
c086116

What's Changed

  • build(deps): bump DavidAnson/markdownlint-cli2-action from 17.0.0 to 18.0.0 by @dependabot in #180
  • Adjust to slack-github-action 2.0 api changes by @danmcp in #182
  • Don't fail fast for unit and functional tests by @danmcp in #183
  • Add make judge single test by @danmcp in #184
  • Add reorg answer file test by @danmcp in #185
  • Add disk check after tests run by @danmcp in #190
  • Move AWS_REGION from using secret to var by @danmcp in #191
  • build(deps): bump actions/cache from 4.1.2 to 4.2.0 by @dependabot in #192
  • build(deps): bump step-security/harden-runner from 2.10.1 to 2.10.2 by @dependabot in #186
  • Allows MMLU to have the system_prompt provided to it by @RobotSail in #197

Full Changelog: v0.4.1...v0.4.2

v0.4.1

14 Nov 22:19
4bde0b3

What's Changed

  • Handle no valid eval results for mt_bench by @danmcp in #179

Full Changelog: v0.4.0...v0.4.1

v0.4.0

12 Nov 23:44
8e32704

What's Changed

  • build(deps): bump rhysd/actionlint from 1.7.2 to 1.7.3 in /.github/workflows by @dependabot in #142
  • Add missing comment for error_rate return by @danmcp in #141
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #147
  • build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #146
  • build(deps-dev): update pre-commit requirement from <4.0,>=3.0.4 to >=3.0.4,<5.0 by @dependabot in #145
  • build(deps): bump pypa/gh-action-pypi-publish from 1.10.2 to 1.10.3 by @dependabot in #144
  • chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #152
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #154
  • Give nice error for empty taxonomy by @danmcp in #151
  • ci: change small E2E CI job to medium by @nathan-weinberg in #155
  • ci: add large-size E2E CI job by @nathan-weinberg in #157
  • ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #159
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #160
  • build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #161
  • ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #162
  • build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #158
  • build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #164
  • feat: use custom http_client by @leseb in #163
  • build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #166
  • build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #167
  • Add facilities for unit and functional tests by @danmcp in #165
  • build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #168
  • build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #170
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #171
  • build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #175
  • Add check data unit tests by @danmcp in #169
  • Undo commit of unit cov and add to gitignore by @danmcp in #172
  • Remove functional test output and add to .gitignore by @danmcp in #173
  • Add model adapter unit tests by @danmcp in #174

Full Changelog: v0.3.1...v0.4.0

v0.3.1

01 Oct 01:45
c05af4d

What's Changed

  • Remove task logic with lm_eval 0.4.4 for agg_score by @danmcp in #143

Full Changelog: v0.3.0...v0.3.1

v0.3.0

28 Sep 01:07
40cc370

What's Changed

Note: This release contains two changes which aren't backwards compatible:

  • Remove max_workers and serving_gpus from constructor by @danmcp in #140
  • return overall_score from MTBenchBranch.judge_answers() by @alimaredia in #138
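Callers that must work across the constructor change could filter their keyword arguments against whatever the installed constructor accepts. This is a rough sketch of that general technique only: OldStyleEvaluator, NewStyleEvaluator, and supported_kwargs are hypothetical stand-ins, not classes or functions from this library, and the release note does not say where the removed settings should be supplied instead.

```python
import inspect


def supported_kwargs(cls, kwargs):
    """Keep only the keyword arguments that cls.__init__ accepts."""
    params = inspect.signature(cls.__init__).parameters
    # A constructor that takes **kwargs accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class OldStyleEvaluator:
    """Hypothetical stand-in for a pre-0.3.0 style constructor."""

    def __init__(self, model_name, max_workers=None, serving_gpus=None):
        self.model_name = model_name
        self.max_workers = max_workers
        self.serving_gpus = serving_gpus


class NewStyleEvaluator:
    """Hypothetical stand-in for a constructor without the removed arguments."""

    def __init__(self, model_name):
        self.model_name = model_name


requested = {"model_name": "my-model", "max_workers": 4, "serving_gpus": 1}
evaluator = NewStyleEvaluator(**supported_kwargs(NewStyleEvaluator, requested))
```

Silently dropping arguments can hide configuration mistakes, so it may be worth logging whichever keys were filtered out.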

Full Changelog: v0.2.1...v0.3.0

v0.2.1

23 Sep 14:10
53d6abf

What's Changed

  • update README by @sallyom in #108
  • Use single answer file and model list (backport #110) by @mergify in #112
  • mergify: add mergify configuration by @nathan-weinberg in #114
  • Bump step-security/harden-runner from 2.8.1 to 2.9.1 by @dependabot in #94
  • ci: move E2E runner from github to AWS by @nathan-weinberg in #118
  • docs: add initial release strategy doc and CHANGELOG by @nathan-weinberg in #91
  • CI: Fix working directories to be relative by @danmcp in #120
  • Bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in #119
  • Bump actions/checkout from 4.1.6 to 4.1.7 by @dependabot in #116
  • build(deps): bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.0 by @dependabot in #122
  • ci: add AWS tags to show github ref and PR num for all jobs by @nathan-weinberg in #123
  • Bump rojopolis/spellcheck-github-actions from 0.38.0 to 0.41.0 by @dependabot in #96
  • build(deps): bump pypa/gh-action-pypi-publish from 1.10.0 to 1.10.1 by @dependabot in #124
  • build(deps): bump hynek/build-and-inspect-python-package from 2.6.0 to 2.9.0 by @dependabot in #125
  • build(deps): bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0 by @dependabot in #126
  • build(deps): bump step-security/harden-runner from 2.9.1 to 2.10.1 by @dependabot in #127
  • Add comment to make it clear how the code is working by @danmcp in #105
  • Allow for external serving to be used with mmlu by @danmcp in #99
  • Better path and string handling by @danmcp in #106
  • Improve logging by @danmcp in #111
  • Cleanup usage of load model answers by @danmcp in #115
  • add option to pass 'api_key' to gen_answers, judge_answers by @sallyom in #128
  • e2e: only run PR job if certain files are changed by @nathan-weinberg in #131
  • Allow max_workers to be passed in after evaluator is created by @danmcp in #107
  • Remove fastchat dependency by @danmcp in #98

Full Changelog: v0.2.0...v0.2.1

v0.1.2

27 Aug 23:30
ff54038

What's Changed

  • Use single answer file and model list by @danmcp in #110

Full Changelog: v0.1.1...v0.1.2
