Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Comments

Close side panel

add e2e testing for quantized backend training#1494

Closed
cdoern wants to merge 3 commits intoinstructlab:maininstructlab/instructlab:mainfrom
cdoern:trainingcdoern/instructlab:trainingCopy head branch name to clipboard
Closed

add e2e testing for quantized backend training#1494
cdoern wants to merge 3 commits intoinstructlab:maininstructlab/instructlab:mainfrom
cdoern:trainingcdoern/instructlab:trainingCopy head branch name to clipboard

Conversation

@cdoern
Copy link
Contributor

@cdoern cdoern commented Jun 27, 2024

adds another training test which runs after the --legacy=true test

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@mergify mergify bot added the ci-failure PR has at least one CI failure label Jun 27, 2024
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jun 27, 2024
@mergify mergify bot added the ci-failure PR has at least one CI failure label Jun 27, 2024
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jun 27, 2024
@cdoern cdoern force-pushed the training branch 2 times, most recently from 4648f8c to e0723c0 Compare June 27, 2024 17:48
@ktam3 ktam3 linked an issue Jun 27, 2024 that may be closed by this pull request
@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Jun 28, 2024
@mergify
Copy link
Contributor

mergify bot commented Jun 28, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@russellb
Copy link
Contributor

It looks like e2e actually failed, but it says it passed, not sure why

https://github.com/instructlab/instructlab/actions/runs/9702039128/job/26776964340?pr=1494

@cdoern
Copy link
Contributor Author

cdoern commented Jun 29, 2024

huh yeah I noticed that @russellb I will see if I can get it passing today

@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased labels Jun 29, 2024
@russellb russellb linked an issue Jun 29, 2024 that may be closed by this pull request
@mergify
Copy link
Contributor

mergify bot commented Jun 30, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Jun 30, 2024
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jun 30, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased ci-failure PR has at least one CI failure labels Jun 30, 2024
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jun 30, 2024
@nathan-weinberg nathan-weinberg added the testing Relates to testing label Jul 1, 2024
@mergify
Copy link
Contributor

mergify bot commented Jul 1, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

cdoern added 2 commits July 1, 2024 17:14
adds a jsonl file for backend training so we don't need to worry about generation, uses LoRA

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern
Copy link
Contributor Author

cdoern commented Jul 1, 2024

switched to merlinite, lets see if that gets around ampere limitation. If not @Maxusmusti has a fix in training library to disable flash attn

@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased labels Jul 1, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jul 2, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jul 2, 2024
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@mergify mergify bot added needs-rebase This Pull Request needs to be rebased and removed ci-failure PR has at least one CI failure labels Jul 2, 2024
@mergify
Copy link
Contributor

mergify bot commented Jul 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@cdoern
Copy link
Contributor Author

cdoern commented Jul 8, 2024

wondering if we should close this in favor of just using the A10s in #1557 @russellb @Maxusmusti WDYT? Is there any chance with lora I could get this running on the smaller instances?

@Maxusmusti
Copy link
Contributor

@cdoern what GPU is being used in these instances?

@russellb
Copy link
Contributor

russellb commented Jul 8, 2024

@cdoern yeah, focusing new training on the larger instances makes sense to me.

I'm going to propose a workflow that uses 4x A10Gs. I think that would be a great place to introduce this coverage.

@cdoern
Copy link
Contributor Author

cdoern commented Jul 12, 2024

closing in favor of #1557 which merged. If we need this version we can reopen another PR

@cdoern cdoern closed this Jul 12, 2024
@ktam3 ktam3 added this to the 0.18.0 milestone Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase This Pull Request needs to be rebased testing Relates to testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Epic] RHEL AI backend commands Use new training code in e2e CI job

5 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.