`ilab model evaluate` command and eval library usage by cdoern · Pull Request #1369 · instructlab/instructlab

cdoern · Jun 14, 2024

added support for ilab model evaluate which allows users to run MMLU Bench MT Bench, MMLU Branch, and MT Branch Benchmarks
Add an _evaluate class to config.yaml so that users can get sane evaluation defaults that they can see and modify. These funnel directly into the evaluation flags as training now does.
a sample evaluation class looks like:

evaluate:
  branch: null
  mmlu:
    batch_size: 5
    few_shots: 2
  mmlu_branch:
    sdg_path: generated
  model_name: null
  mt:
    judge_model: prometheus-eval/prometheus-8x7b-v2.0
    max_workers: 40
    output_dir: eval_data
  mt_branch:
    taxonomy_path: taxonomy

cdoern · Jun 14, 2024

this is a placeholder for now, lmk when the library is somewhere I can access and import (with the actual code)

nathan-weinberg · Jun 17, 2024

@cdoern you can install directory from test.pypy.org for testing if you wish: https://test.pypi.org/project/instructlab-eval

once we have a 0.0.1 release we'll publish that to production PyPI

src/instructlab/model/evaluate.py

mergify · Jun 20, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

src/instructlab/model/evaluate.py

src/instructlab/configuration.py

Signed-off-by: Charlie Doern <cdoern@redhat.com>

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

Signed-off-by: Dan McPherson <dmcphers@redhat.com>

danmcp

I am approving with a few notes:

The error handling still needs a decent bit of work both here and the eval library. -> Replace stack traces with nice messages.
This version is very chatty in the cli. Will largely be addressed with https://github.com/instructlab/instructlab/compare/main...danmcp:instructlab:evalandvllm?expand=1. Those commit(s) also address the vllm serving hack still in place in this PR.
More test cases are needed
Config defaulting probably still needs some work with the models selected

src/instructlab/model/evaluate.py

alinaryan

Thanks for the strong work on this!! Just have a few comments/q's:

src/instructlab/configuration.py

src/instructlab/model/evaluate.py

cdoern · Jun 30, 2024

made follow up issues!

cdoern · Jun 30, 2024

Given that tomorrow morning (7/1) is a deadline. This PR needs to be merged before then to add some form of evaluation support.

That being said, there is a stale change request on this PR, some pending reviews that haven't come in yet, etc.

We will be dismissing those in favor of deferring to follow up issues but if any of the reviewers have immediate follow up concerns please feel free to reach out!!!!

russellb · Jun 30, 2024

I filed #1540 as a follow-up to get this tested in e2e CI

cdoern requested review from alimaredia, alinaryan, danmcp and nathan-weinberg June 14, 2024 20:04

mergify bot added the ci-failure PR has at least one CI failure label Jun 14, 2024

mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jun 18, 2024

russellb self-requested a review June 19, 2024 00:26

leseb suggested changes Jun 19, 2024

View reviewed changes

src/instructlab/model/evaluate.py Outdated Show resolved Hide resolved

src/instructlab/model/evaluate.py Outdated Show resolved Hide resolved

src/instructlab/model/evaluate.py Show resolved Hide resolved

src/instructlab/model/evaluate.py Outdated Show resolved Hide resolved

mergify bot added the needs-rebase This Pull Request needs to be rebased label Jun 20, 2024

cdoern force-pushed the eval branch from 49c39fa to 1b08443 Compare June 23, 2024 19:22

mergify bot removed the ci-failure PR has at least one CI failure label Jun 23, 2024

cdoern force-pushed the eval branch from 1b08443 to 6271d1b Compare June 23, 2024 19:26

mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased ci-failure PR has at least one CI failure labels Jun 23, 2024

danmcp suggested changes Jun 24, 2024

View reviewed changes

mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jun 25, 2024

danmcp reviewed Jun 25, 2024

View reviewed changes

src/instructlab/configuration.py Outdated Show resolved Hide resolved

cdoern and others added 4 commits June 29, 2024 09:55

Aligned naming scheme for model variables

58e6004

Signed-off-by: Charlie Doern <cdoern@redhat.com>

Fix linting

606da24

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

Addition alignment and lint fixes

05bfb5c

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

Fix overriden judge_model variable

b7a166a

Signed-off-by: Dan McPherson <dmcphers@redhat.com>

danmcp dismissed nathan-weinberg’s stale review via b7a166a June 29, 2024 13:56

danmcp force-pushed the eval branch from 096a95f to b7a166a Compare June 29, 2024 13:56

mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased one-approval PR has one approval from a maintainer labels Jun 29, 2024

Set evaluate taxonomy and model in config from init

df738f5

Signed-off-by: Dan McPherson <dmcphers@redhat.com>

danmcp force-pushed the eval branch from 31ebdf9 to df738f5 Compare June 29, 2024 15:09

mergify bot removed the ci-failure PR has at least one CI failure label Jun 29, 2024

danmcp approved these changes Jun 29, 2024

View reviewed changes

src/instructlab/model/evaluate.py Show resolved Hide resolved

src/instructlab/model/evaluate.py Show resolved Hide resolved

mergify bot added the one-approval PR has one approval from a maintainer label Jun 29, 2024

alinaryan reviewed Jun 30, 2024

View reviewed changes

alinaryan approved these changes Jun 30, 2024

View reviewed changes

mergify bot removed the one-approval PR has one approval from a maintainer label Jun 30, 2024

alinaryan assigned cdoern Jun 30, 2024

alinaryan mentioned this pull request Jun 30, 2024

refer to the mmlu task constant #1534

Closed

This was referenced Jun 30, 2024

improve error handling and user feedback for ilab model evaluate #1535

Closed

Add test cases for ilab model evaluate outside of the config #1536

Closed

Ensure config defaulting for ilab model evaluate works in all scenarios #1537

Closed

cdoern mentioned this pull request Jun 30, 2024

Expand documentation and comments on new configuration classes #1538

Closed

alinaryan merged commit e3169a5 into instructlab:main Jun 30, 2024

danmcp mentioned this pull request Jun 30, 2024

Use new vllm/llama-cpp backends for evaluate #1539

Merged

3 tasks

russellb mentioned this pull request Jun 30, 2024

Add model evaluate command to e2e CI #1540

Closed

ktam3 added this to the 0.18.0 milestone Jul 15, 2024

Search code, repositories, users, issues, pull requests...

Comments

Conversation

cdoern commented Jun 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdoern commented Jun 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nathan-weinberg commented Jun 17, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mergify bot commented Jun 20, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danmcp left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

alinaryan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cdoern commented Jun 30, 2024

Uh oh!

cdoern commented Jun 30, 2024

Uh oh!

russellb commented Jun 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

cdoern commented Jun 14, 2024 •

edited

Loading

cdoern commented Jun 14, 2024 •

edited

Loading