add e2e testing for quantized backend training #1494
cdoern wants to merge 3 commits into instructlab:main
Conversation
Force-pushed from 4648f8c to e0723c0.
This pull request has merge conflicts that must be resolved before it can be merged.
It looks like e2e actually failed, but it says it passed; not sure why: https://github.com/instructlab/instructlab/actions/runs/9702039128/job/26776964340?pr=1494
Huh, yeah, I noticed that @russellb. I will see if I can get it passing today.
This pull request has merge conflicts that must be resolved before it can be merged.
adds a jsonl file for backend training so we don't need to worry about generation, uses LoRA
Signed-off-by: Charlie Doern <cdoern@redhat.com>
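For context, a rough sketch of what such a JSONL training file could contain. The `system`/`user`/`assistant` field names are an assumption modeled on common instruction-tuning layouts, not taken from this PR:

```python
import json

# Hypothetical sample records; the field names are an assumption, not
# the actual schema used by this PR's test fixture.
samples = [
    {
        "system": "You are a helpful assistant.",
        "user": "What is 2 + 2?",
        "assistant": "2 + 2 equals 4.",
    },
]

# Write one JSON object per line, the defining property of JSONL.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in samples:
        f.write(json.dumps(record) + "\n")
```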
Switched to merlinite; let's see if that gets around the Ampere limitation. If not, @Maxusmusti has a fix in the training library to disable flash attention.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
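For background on the Ampere limitation mentioned above: FlashAttention 2 requires a GPU with compute capability 8.0 (Ampere) or newer. A minimal sketch of how a trainer could gate it, assuming the Hugging Face transformers `attn_implementation` parameter; this is an illustration, not the actual fix in the training library:

```python
import torch
from transformers import AutoModelForCausalLM

def pick_attn_implementation() -> str:
    # FlashAttention 2 needs an Ampere (sm_80) or newer GPU; fall back
    # to PyTorch's scaled-dot-product attention everywhere else.
    if torch.cuda.is_available():
        major, _minor = torch.cuda.get_device_capability()
        if major >= 8:
            return "flash_attention_2"
    return "sdpa"

# Model name is for illustration only.
model = AutoModelForCausalLM.from_pretrained(
    "instructlab/merlinite-7b-lab",
    torch_dtype=torch.float16,
    attn_implementation=pick_attn_implementation(),
)
```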
This pull request has merge conflicts that must be resolved before it can be merged.
Wondering if we should close this in favor of just using the A10s in #1557. @russellb @Maxusmusti WDYT? Is there any chance that, with LoRA, I could get this running on the smaller instances?
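For reference on why LoRA might fit on smaller instances: it freezes the base weights and trains only small adapter matrices, and pairing it with 4-bit quantization shrinks the resident base model too. A minimal sketch using peft and bitsandbytes; the model name, rank, and target modules are illustrative assumptions, not values from this PR:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization keeps the frozen base weights small on the GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Hypothetical model name; rank/alpha/targets are illustrative defaults.
model = AutoModelForCausalLM.from_pretrained(
    "instructlab/merlinite-7b-lab",
    quantization_config=bnb_config,
)
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base model
```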
@cdoern what GPU is being used in these instances? |
@cdoern yeah, focusing new training on the larger instances makes sense to me. I'm going to propose a workflow that uses 4x A10Gs. I think that would be a great place to introduce this coverage. |
Closing in favor of #1557, which merged. If we need this version we can open a new PR.
Adds another training test which runs after the `--legacy=true` test.
Checklist:
- conventional commits