Allow arbitrary training args to be overridden by derekhiggins · Pull Request #1008 · instructlab/instructlab

derekhiggins · Apr 25, 2024

Adding this as a hidden argument to allow experimentation on various devices. Eventually, once we know what's needed, we can add something more permanent.

Fixes #1007

With this PR and #1012, running ilab end to end, including training, works on Colab:

!ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'
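A minimal sketch of what the override in the command above amounts to, assuming the JSON object is simply laid on top of a dictionary of default training arguments. The default values and the `apply_overrides` helper are hypothetical and are not taken from the PR's code.

```python
import json

# Hypothetical defaults for illustration only; the real values live in the
# project's training code and may differ.
DEFAULT_TRAINING_ARGS = {
    "bf16": True,
    "gradient_checkpointing": False,
    "gradient_accumulation_steps": 1,
}


def apply_overrides(defaults: dict, override_json: str) -> dict:
    """Return a copy of `defaults` with the keys of a JSON object applied on top."""
    overrides = json.loads(override_json)
    if not isinstance(overrides, dict):
        raise ValueError("override training args must be a JSON object")
    merged = dict(defaults)   # copy so the defaults stay untouched
    merged.update(overrides)  # user-supplied values win
    return merged


# Same JSON string as in the command above.
merged_args = apply_overrides(
    DEFAULT_TRAINING_ARGS,
    '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}',
)
print(merged_args)
```

Copying before updating keeps the defaults reusable across runs; whether the actual patch copies or mutates in place is not shown in this thread.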

tiran · Apr 26, 2024

src/instructlab/train/linux_train.py

+    training_args["fp16"] = use_fp16
+    training_args["bf16"] = not use_fp16


The bf16 issue is addressed in #993

derekhiggins · Apr 29, 2024

Sounds good, but this PR isn't really intended to deal with any specific training option. The point is to let advanced users override any of them without needing to change the code.

mergify · Apr 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

maxamillion · Apr 30, 2024

I tested this patch and the following worked for me using my RTX A4000 GPU with 16G of VRAM:

$ ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'

TY!

mergify · May 6, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · May 7, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

tyll · May 8, 2024

Due to the complexity of the data, it seems this is better suited to be added to config.yaml instead of passing it on the command line.

leseb

Can we have functional test coverage for this?

src/instructlab/lab.py

derekhiggins · May 23, 2024

> Due to the complexity of the data, it seems this is better suited to be added to config.yaml instead of passing it on the command line.

I've added an example of how to use this with a JSON file, e.g. --override-training-args "$(< override_train_args.json)". Would this be enough? There is currently no training section in config.yaml, and I'm not sure this is a good reason to add one.

> Can we have functional test coverage for this?

If this is merged, I'll update the e2e tests, which should cover it (e.g. #1111).
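For illustration, here is a small sketch of the JSON-file route mentioned above: it writes an `override_train_args.json` file and reads it back as a dictionary, which is roughly what the shell's `$(< override_train_args.json)` hands to the option as a string. The `load_override_file` helper and the file contents are assumptions chosen to match the example command earlier in the thread.

```python
import json
import tempfile
from pathlib import Path


def load_override_file(path: Path) -> dict:
    """Read a JSON file and return its top-level object as a dict."""
    overrides = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(overrides, dict):
        raise ValueError(f"{path} must contain a JSON object")
    return overrides


with tempfile.TemporaryDirectory() as tmp:
    override_path = Path(tmp) / "override_train_args.json"
    # Example contents, mirroring the overrides used earlier in this thread.
    override_path.write_text(
        json.dumps(
            {
                "bf16": False,
                "gradient_checkpointing": True,
                "gradient_accumulation_steps": 8,
            },
            indent=2,
        ),
        encoding="utf-8",
    )
    loaded = load_override_file(override_path)

print(loaded)
```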

mergify · Jun 5, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

JamesKunstle · Jun 21, 2024

src/instructlab/lab.py

+    try:
+        override_training_args_dict = json.loads(override_training_args)
+    except json.decoder.JSONDecodeError as e:
+        ctx.fail("Parsing override trainign args: " + str(e))


"trainign" nit on spelling.

I think the command should also fail (the CLI exits) if the input is malformed, rather than proceeding and making the user Ctrl-C and reload.
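A standard-library sketch of the fail-fast behaviour discussed here, assuming the goal is to stop with a non-zero exit code before any training starts. In the diff above, `ctx.fail` plays this role; the `OverrideArgsError` class, the `main` wrapper, and the exit code `2` are hypothetical choices for this example.

```python
import json
import sys


class OverrideArgsError(ValueError):
    """Raised when the override string cannot be used (hypothetical name)."""


def parse_override_training_args(raw: str) -> dict:
    """Parse the override string, raising OverrideArgsError on bad input."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OverrideArgsError(f"Parsing override training args: {exc}") from exc
    if not isinstance(parsed, dict):
        raise OverrideArgsError("Override training args must be a JSON object")
    return parsed


def main(raw: str) -> int:
    """Return a process exit code: 0 on success, 2 on malformed input."""
    try:
        overrides = parse_override_training_args(raw)
    except OverrideArgsError as exc:
        # Report the problem and stop before any expensive work begins.
        print(f"Error: {exc}", file=sys.stderr)
        return 2
    print(f"Using overrides: {overrides}")
    return 0
```

A caller would typically end with `sys.exit(main(raw_value))`, so malformed input terminates the process immediately.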

JamesKunstle

This functionality is super desirable. In the churn of designing the CLI in the context of other pillars of the project (SDG, evaluation, publishing), it's become clear that we need more widespread and type-checked configuration for everything. @cdoern's inbound "profiles" PR accounts for this, taking a first step toward application-wide default and override configuration support.

@derekhiggins your PR is very much appreciated, and we'd love your input on @cdoern's work as well.
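To make "type-checked configuration" concrete, here is one possible sketch that checks override values against the types of known defaults. It is not the design of the "profiles" work, which is not shown in this thread; `HYPOTHETICAL_DEFAULTS` and `validate_overrides` are invented for illustration. Rejecting unknown keys is a deliberate difference from the free-form override proposed in this PR.

```python
# Hypothetical defaults; names and values are for illustration only.
HYPOTHETICAL_DEFAULTS = {
    "bf16": True,
    "gradient_checkpointing": False,
    "gradient_accumulation_steps": 1,
}


def validate_overrides(defaults: dict, overrides: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key, value in overrides.items():
        if key not in defaults:
            problems.append(f"{key}: unknown option")
            continue
        expected = type(defaults[key])
        # bool is a subclass of int in Python, so compare exact types
        # instead of using isinstance().
        if type(value) is not expected:
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


print(validate_overrides(HYPOTHETICAL_DEFAULTS, {"bf16": "no", "typo_option": 1}))
```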

russellb · Jun 24, 2024

> This functionality is super desirable. In the churn of designing the CLI in the context of other pillars of the project (SDG, evaluation, publishing), it's become clear that we need more widespread and type-checked configuration for everything. @cdoern's inbound "profiles" PR accounts for this, taking a first step toward application-wide default and override configuration support.
>
> @derekhiggins your PR is very much appreciated, and we'd love your input on @cdoern's work as well.

@JamesKunstle can you provide a link (or links) to the work you're referring to and requesting feedback on?

derekhiggins · Aug 22, 2024

Closing this. A lot has changed since it was created, and it's probably no longer relevant.

derekhiggins force-pushed the override_args branch from 5387d3b to a262c1e Compare April 25, 2024 23:13

github-actions bot added the testing Relates to testing label Apr 25, 2024

derekhiggins mentioned this pull request Apr 26, 2024

#1006 - Optimize CPU training on Linux #1010

Closed

tiran reviewed Apr 26, 2024

mergify bot added the needs-rebase This Pull Request needs to be rebased label Apr 29, 2024

derekhiggins force-pushed the override_args branch from a262c1e to 845ff0e Compare April 29, 2024 21:02

github-actions bot removed the testing Relates to testing label Apr 29, 2024

mergify bot removed the needs-rebase This Pull Request needs to be rebased label Apr 29, 2024

derekhiggins force-pushed the override_args branch from 845ff0e to 4617a7d Compare April 29, 2024 21:40

derekhiggins force-pushed the override_args branch from 4617a7d to e80d6d3 Compare April 30, 2024 21:38

mergify bot added the testing Relates to testing label Apr 30, 2024

derekhiggins mentioned this pull request Apr 30, 2024

lab train errors out on 16GB M1 Mac #380

Closed

mergify bot added the needs-rebase This Pull Request needs to be rebased label May 6, 2024

mergify bot added needs-rebase This Pull Request needs to be rebased and removed needs-rebase This Pull Request needs to be rebased labels May 6, 2024

This was referenced May 20, 2024

Introduce InstructLab Profiles Managed via ilab profile... to Run Key Commands at Different Fidelity Levels instructlab/dev-docs#52

Closed

Run e2e training test without 4 bit quant #1111

Closed

leseb reviewed May 23, 2024

derekhiggins force-pushed the override_args branch from e80d6d3 to 1a5cc23 Compare May 23, 2024 20:41

mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased labels May 23, 2024

Allow arbitrary training args to be overridden

4f776f8

Adding as a hidden argument to allow experimentation on various devices. Eventually once we know what's needed we can add something more permanent.

Fixes instructlab#1007

Signed-off-by: Derek Higgins <derekh@redhat.com>

derekhiggins force-pushed the override_args branch from 1a5cc23 to 4f776f8 Compare May 23, 2024 21:07

mergify bot removed the ci-failure PR has at least one CI failure label May 23, 2024

leseb approved these changes May 30, 2024

leseb requested a review from tiran May 30, 2024 14:15

mergify bot added the one-approval PR has one approval from a maintainer label May 30, 2024

nathan-weinberg requested a review from a team June 4, 2024 14:23

russellb added the e2e-trigger label Jun 4, 2024

mergify bot removed the e2e-trigger label Jun 4, 2024

mergify bot added the needs-rebase This Pull Request needs to be rebased label Jun 5, 2024

JamesKunstle reviewed Jun 21, 2024

JamesKunstle self-requested a review June 21, 2024 15:35

JamesKunstle reviewed Jun 21, 2024

derekhiggins closed this Aug 22, 2024
