#1006 - Optimize CPU training on Linux#1010

Closed
jsight wants to merge 2 commits into instructlab:main from jsight:issue_1006_cpu_optimization

Conversation


jsight commented Apr 25, 2024

  • Timeout for chat mode increased to 300 seconds. Without this, it fails nearly silently after 30 seconds, which causes issues for CPU chat.
  • Return the filename when converting the final resulting file and use it for copying. This matters when using f32, since llama_to_gguf produces a filename different from the expected (f16) one.
  • Determine the available system memory and use it to choose the batch size in CPU mode. This enables more parallelization on smaller systems.
  • Disable fp16 and bf16 in CPU mode. This also enables parallelization. Without it, training is unusable on my i7 with 64 GB of RAM; with it, training takes under 30 minutes for one epoch of ~26 iterations.
  • Turn off "auto" for dtype when loading the model. This greatly improved inference performance on my i7 as well: it went from single-threaded to fully utilizing the CPU.
  • Updated the logging to be 1-based instead of zero-based :)
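The memory-based batch-size heuristic described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `choose_batch_size` and the per-sample memory estimate are invented for the example, and the PR obtains the available-memory figure via `psutil.virtual_memory().available`.

```python
def choose_batch_size(
    available_gb: float,
    gb_per_sample: float = 2.0,  # assumed memory cost per batch element
    max_batch: int = 8,
) -> int:
    """Derive a CPU batch size from available system memory.

    In the PR, available memory would come from
    psutil.virtual_memory().available (converted to GB); here it is a
    plain parameter so the sketch stays self-contained.
    """
    # Scale the batch size with free memory, clamped to [1, max_batch].
    return max(1, min(max_batch, int(available_gb // gb_per_sample)))
```

For example, a machine with 64 GB free would get the full batch of 8, while a machine with only 5 GB free would be clamped down to a batch of 2.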


Signed-off-by: Jess Sightler <jesse.sightler@gmail.com>

jsight commented Apr 25, 2024

Resolves: #1006

Comment on lines +263 to +264
# CPU performance is very bad with fp16 or bf16, so disable both
use_fp16 = use_bf16 = False
Contributor
The best option likely depends on the specific CPU used here. Xeon CPUs with AVX512 and ipex may actually get better performance leaving bf16 enabled. For the sake of better CPU defaults, this is probably fine. But this is another example where the work @derekhiggins is doing around exposing all of these training arguments would be useful, so that users could ultimately tweak these for their specific system until or unless we get smart enough to make optimal decisions in most cases.
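A "smart enough" default along the lines this comment suggests could, on Linux, check the CPU's feature flags before forcing bf16 off. This is an illustrative sketch only (not part of the PR); `cpu_supports_bf16` is an invented name, and reading `/proc/cpuinfo` is a Linux-specific best-effort check.

```python
from pathlib import Path


def cpu_supports_bf16() -> bool:
    """Best-effort check for AVX512 BF16 support via /proc/cpuinfo (Linux only)."""
    try:
        flags = Path("/proc/cpuinfo").read_text()
    except OSError:
        # Not Linux, or /proc unavailable: fall back to the safe default.
        return False
    return "avx512_bf16" in flags


# Keep bf16 only where the CPU can actually accelerate it;
# fp16 stays disabled on CPU either way.
use_bf16 = cpu_supports_bf16()
use_fp16 = False
```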

Contributor

I'll test this to compare on my HW, but it will be early next week before I get a chance.

Here is the PR I've been testing to allow arbitrary training options
#1008

Author

Agreed. IMO, it'd be nice to have something like "profiles" and have it automatically select a reasonable default if none is specified.
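The "profiles" idea floated here could look something like the following. Everything in this sketch is hypothetical: the profile names, the settings in each profile, and the `select_profile` helper are invented to illustrate the shape of the idea, not anything in this PR or the project.

```python
from typing import Optional

# Hypothetical hardware profiles; names and values are illustrative only.
PROFILES = {
    "cpu-default": {"use_fp16": False, "use_bf16": False, "dtype": "float32"},
    "xeon-avx512": {"use_fp16": False, "use_bf16": True, "dtype": "bfloat16"},
    "gpu": {"use_fp16": True, "use_bf16": False, "dtype": "auto"},
}


def select_profile(name: Optional[str] = None) -> dict:
    """Return the requested profile, falling back to a safe CPU default."""
    return PROFILES.get(name, PROFILES["cpu-default"])
```

Unknown or unspecified profile names fall back to the conservative CPU default, which matches the "reasonable default if none is specified" behavior suggested above.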

Contributor

I have been facing a similar problem for Intel Gaudi (Habana Labs HPUs and SynapseAI) and created an abstraction layer for Torch device properties. I just ripped the code out of my experimental branch, cleaned it up, and published it as draft PR #1015.

# SPDX-License-Identifier: Apache-2.0

# Standard
import psutil # to determine system memory
Contributor

It's a third-party import.

Please add the requirement to requirements.txt, too. Right now it's a transitive dependency from peft.

Author

Thanks, I have added it to requirements.txt.

Signed-off-by: Jess Sightler <jesse.sightler@gmail.com>
@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label May 2, 2024
Contributor

mergify bot commented May 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @jsight please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


jsight commented May 3, 2024

Closing... will follow up in #1109


jsight commented May 3, 2024

#1109

@jsight jsight closed this May 3, 2024