Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Comments

Close side panel

Allow arbitary trainging args to be overridden#1008

Closed
derekhiggins wants to merge 1 commit intoinstructlab:maininstructlab/instructlab:mainfrom
derekhiggins:override_argsderekhiggins/instructlab:override_argsCopy head branch name to clipboard
Closed

Allow arbitary trainging args to be overridden#1008
derekhiggins wants to merge 1 commit intoinstructlab:maininstructlab/instructlab:mainfrom
derekhiggins:override_argsderekhiggins/instructlab:override_argsCopy head branch name to clipboard

Conversation

@derekhiggins
Copy link
Contributor

@derekhiggins derekhiggins commented Apr 25, 2024

Adding as a hidden argument to allow experimentation on various devices. Eventually once we know whats needed we can add something more permanent.

Fixes #1007

With this PR and #1012 , running ilab e2e including training works on colab

!ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'

Comment on lines 243 to 244
training_args["fp16"] = use_fp16
training_args["bf16"] = not use_fp16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bf16 issue is addressed in #993

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but this PR isn't really intended to deal with any specific training options, the point is to allow the advanced user to override any of them without needing to make changes to the code.

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Apr 29, 2024
@mergify
Copy link
Contributor

mergify bot commented Apr 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@github-actions github-actions bot removed the testing Relates to testing label Apr 29, 2024
@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Apr 29, 2024
@maxamillion
Copy link
Contributor

I tested this patch and the following worked for me using my RTX A4000 GPU with 16G of VRAM:

$ ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'

TY!

@mergify mergify bot added the testing Relates to testing label Apr 30, 2024
@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label May 6, 2024
@mergify
Copy link
Contributor

mergify bot commented May 6, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added needs-rebase This Pull Request needs to be rebased and removed needs-rebase This Pull Request needs to be rebased labels May 6, 2024
@mergify
Copy link
Contributor

mergify bot commented May 7, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@tyll
Copy link

tyll commented May 8, 2024

Due to the complexity of the data, it seems this is better suited to be added to config.yaml instead of passing it on the command line.

Copy link
Contributor

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have functional test coverage for this?

src/instructlab/lab.py Show resolved Hide resolved
@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased labels May 23, 2024
@derekhiggins
Copy link
Contributor Author

derekhiggins commented May 23, 2024

Due to the complexity of the data, it seems this is better suited to be added to config.yaml instead of passing it on the command line.

I've added a example of how to use this from a json file e.g. --override-training-args "$(< override_train_args.json)"
would this be enough? As there is no training section currently in the config.yaml and I'm not sure this is a good reason to add one?

Can we have functional test coverage for this?

If this is merged I'll update the e2e tests which should cover it (e.g. #1111 )

Adding as a hidden argument to allow experimentation
on various devices. Eventually once we know whats needed
we can add something more permanent.

Fixes instructlab#1007

Signed-off-by: Derek Higgins <derekh@redhat.com>
@mergify mergify bot removed the ci-failure PR has at least one CI failure label May 23, 2024
@leseb leseb requested a review from tiran May 30, 2024 14:15
@mergify mergify bot added the one-approval PR has one approval from a maintainer label May 30, 2024
@nathan-weinberg nathan-weinberg requested a review from a team June 4, 2024 14:23
@mergify mergify bot removed the e2e-trigger label Jun 4, 2024
@mergify
Copy link
Contributor

mergify bot commented Jun 5, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Jun 5, 2024
try:
override_training_args_dict = json.loads(override_training_args)
except json.decoder.JSONDecodeError as e:
ctx.fail("Parsing override trainign args: " + str(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"trainign" nit on spelling.

I think the command fail (CLI exits) if the input is malformed, too, rather than proceeding and making the user ctl-c and reload.

@JamesKunstle JamesKunstle self-requested a review June 21, 2024 15:35
Copy link
Contributor

@JamesKunstle JamesKunstle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality is super desirable. In the churn of designing the CLI in the context of other pillars of the project (SDG, evaluation, publishing), it's become clear that we need more wide-spread and type-checked configuration for everything. @cdoern's inbound "profiles" PR accounts for this, taking a first step toward application-wide default and override configuration support.

@derekhiggins your PR is very very appreciated, we'd love your input on @cdoern's work as well.

@russellb
Copy link
Contributor

This functionality is super desirable. In the churn of designing the CLI in the context of other pillars of the project (SDG, evaluation, publishing), it's become clear that we need more wide-spread and type-checked configuration for everything. @cdoern's inbound "profiles" PR accounts for this, taking a first step toward application-wide default and override configuration support.

@derekhiggins your PR is very very appreciated, we'd love your input on @cdoern's work as well.

@JamesKunstle can you provide a link (or links) to the work you're referring to and requesting feedback on?

@derekhiggins
Copy link
Contributor Author

Closing this, a lot has changed since it was created and its probably no longer relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase This Pull Request needs to be rebased one-approval PR has one approval from a maintainer testing Relates to testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Training options only allow for well known/tested HW

7 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.