Remove task logic with lm_eval 0.4.4 for agg_score #143
Merged
mergify[bot] merged 1 commit into instructlab:main on Oct 1, 2024
Conversation
lm_eval used to return an extra entry corresponding to the set of tasks requested, e.g. mmlu_pr. As of 0.4.4 the entries are the same whether or not the tasks are custom, and that extra entry has been removed. The agg score therefore now needs to be calculated from the individual task scores returned, which also lets the logic be shared with MMLUEvaluator.

Without this change, the overall_score for mmlu_branch is returned as 0.0 with lm_eval 0.4.4.

Signed-off-by: Dan McPherson <dmcphers@redhat.com>
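For context, here is a minimal sketch of the kind of aggregation this change implies, not the repo's actual code. It assumes lm_eval 0.4.4's `simple_evaluate` output shape, where `results["results"]` maps each task name to a dict of metrics keyed like `"acc,none"`; the function name and the default metric key are illustrative assumptions.

```python
def agg_task_scores(results: dict, metric: str = "acc,none") -> float:
    """Average a metric across the individual per-task entries.

    With lm_eval 0.4.4 there is no longer a synthetic group entry
    (e.g. "mmlu_pr") to read the aggregate score from, so it must be
    computed from the per-task scores themselves.
    """
    # results["results"] maps task name -> {metric_name: value, ...}
    task_results = results["results"]
    scores = [task[metric] for task in task_results.values() if metric in task]
    if not scores:
        # No task reported the requested metric; caller decides how
        # to treat this (0.0 mirrors the failure mode described above).
        return 0.0
    return sum(scores) / len(scores)
```

Because the aggregate is computed from the per-task entries rather than read from a group entry, the same helper works whether the tasks are custom or built-in.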
alimaredia approved these changes on Oct 1, 2024
nathan-weinberg approved these changes on Oct 1, 2024