Make SDG batch size configurable via system profile (backport #3157) #3207
mergify[bot] merged 1 commit into instructlab/instructlab:release-v0.24 from instructlab/instructlab:mergify/bp/release-v0.24/pr-3157
Signed-off-by: Nikhil Palaskar <npalaska@redhat.com>
(cherry picked from commit 4a309ce)
There is a failure in the large E2E job on … I'm going to run the large E2E job on this branch to ensure there are no conflicts on this particular release branch.
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
e2e workflow failed on this PR: View run, please investigate.
That failure was due to a Hugging Face download error. Retrying.
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
e2e workflow failed on this PR: View run, please investigate.
The same Hugging Face error occurred. It appears to be a server-side error, so I'm going to retrigger the job once more.
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
e2e workflow failed on this PR: View run, please investigate.
We're tracking the Hugging Face download error in #3215.
@courtneypacheco, I've merged this change to … I remember you mentioned you added the …
I discussed this with @courtneypacheco. She's OK with removing …
We've shipped this fix in https://github.com/instructlab/instructlab/releases/tag/v0.24.2
Currently, the batch size for SDG (synthetic data generation) can only be set via the CLI, and a single batch size is not optimal across all hardware profiles. Hardware configurations differ in capability, so a fixed batch size can under-utilize or over-utilize resources during SDG.
To get efficient performance on each kind of hardware, the batch size should be set independently in each system profile.
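To illustrate the idea, here is a minimal Python sketch of how a per-profile SDG batch size might be resolved alongside the existing CLI option. Everything in it is hypothetical: the profile names, the `generate` → `batch_size` key layout, the numeric values, the `resolve_sdg_batch_size` helper, and the precedence order (an explicit CLI value first, then the system profile, then a default) are assumptions for illustration, not InstructLab's actual implementation or profile schema.

```python
from typing import Optional

# Hypothetical per-hardware system profiles. Profile names, the
# "generate" -> "batch_size" layout, and the values are illustrative
# assumptions only, not InstructLab's real profile contents.
SYSTEM_PROFILES: dict[str, dict] = {
    "nvidia_l40s_x4": {"generate": {"batch_size": 256}},
    "nvidia_a100_x8": {"generate": {"batch_size": 512}},
    "apple_m3_max": {"generate": {"batch_size": 8}},
}

# Hypothetical fallback used when neither the CLI nor the profile sets a value.
DEFAULT_BATCH_SIZE = 32


def resolve_sdg_batch_size(profile_name: str, cli_batch_size: Optional[int] = None) -> int:
    """Pick the SDG batch size: explicit CLI value, then system profile, then default."""
    if cli_batch_size is not None:
        # An explicit CLI value overrides whatever the profile says.
        if cli_batch_size <= 0:
            raise ValueError("batch size must be a positive integer")
        return cli_batch_size
    # Unknown profiles, or profiles without a batch size, fall through to the default.
    profile = SYSTEM_PROFILES.get(profile_name, {})
    batch_size = profile.get("generate", {}).get("batch_size")
    if batch_size is None:
        return DEFAULT_BATCH_SIZE
    return batch_size


print(resolve_sdg_batch_size("nvidia_l40s_x4"))      # 256 (from the profile)
print(resolve_sdg_batch_size("nvidia_l40s_x4", 64))  # 64 (CLI override)
print(resolve_sdg_batch_size("unknown_profile"))     # 32 (default)
```

In this sketch, letting an explicit CLI value win keeps the existing CLI option usable for one-off tuning, while each system profile supplies a hardware-appropriate default.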
This is an automatic backport of pull request #3157 done by Mergify.