
Auto-detect bf16 support for CUDA #993

Closed

tiran wants to merge 4 commits into instructlab:main from tiran:cuda_bf16

tiran (Contributor) commented Apr 25, 2024

Changes

Which issue is resolved by this Pull Request:
See #647

Description of your changes:

bf16 (bfloat16) is not available with CUDA versions older than 11.0 or on devices with a CUDA compute capability below 8.0. linux_train now detects and reports bf16 support. When bf16 is unavailable, training on CUDA falls back to fp16 (half-precision float).
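A minimal sketch of the detection rule described above, assuming only the two thresholds stated in the PR (CUDA >= 11.0, compute capability >= 8.0). The helper names `supports_bf16` and `pick_cuda_dtype` are hypothetical and are not the PR's actual code. On a real machine the inputs could come from `torch.version.cuda` (a string, or `None` on CPU-only builds) and `torch.cuda.get_device_capability()`; PyTorch also offers `torch.cuda.is_bf16_supported()`, which may be the simpler choice in practice.

```python
from typing import Optional, Tuple


def supports_bf16(cuda_version: Optional[str], capability: Tuple[int, int]) -> bool:
    """Return True if both the CUDA toolkit and the device can use bf16.

    Thresholds follow the PR description: CUDA >= 11.0 and compute capability >= 8.0.
    """
    if cuda_version is None:
        # torch.version.cuda is None on CPU-only PyTorch builds
        return False
    major = int(cuda_version.split(".")[0])
    # Tuples compare element-wise, so (8, 6) >= (8, 0) and (7, 5) < (8, 0)
    return major >= 11 and tuple(capability) >= (8, 0)


def pick_cuda_dtype(cuda_version: Optional[str], capability: Tuple[int, int]) -> str:
    """Prefer bfloat16; fall back to float16 (half precision) when unsupported."""
    return "bfloat16" if supports_bf16(cuda_version, capability) else "float16"


# Hypothetical example inputs
print(pick_cuda_dtype("12.1", (8, 6)))  # bfloat16 (compute capability 8.6)
print(pick_cuda_dtype("12.1", (7, 5)))  # float16 (compute capability 7.5)
print(pick_cuda_dtype("10.2", (8, 0)))  # float16 (CUDA older than 11.0)
```

Keeping the check as a pure function of the version string and capability tuple makes it testable without a GPU.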


also closes #1006

tiran (Contributor Author) commented May 2, 2024

@Mergifyio rebase

mergify bot commented May 2, 2024

rebase

❌ Unable to rebase: user tiran is unknown.


Please make sure tiran has logged in Mergify dashboard.

tiran (Contributor Author) commented May 2, 2024

@Mergifyio rebase

mergify bot commented May 2, 2024

rebase

❌ Base branch update has failed


tiran does not have write access to the forked repository.

tiran force-pushed the cuda_bf16 branch 7 times, most recently from d646d59 to ecc1a38 (May 6, 2024 04:28)

mergify bot commented May 6, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added and then removed the needs-rebase label May 6, 2024

mergify bot commented May 7, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot removed the needs-rebase label May 7, 2024
tiran force-pushed the cuda_bf16 branch 2 times, most recently from 8e3d568 to 7999e89 (May 7, 2024 16:41)
mergify bot added the testing label May 7, 2024
tiran force-pushed the cuda_bf16 branch 2 times, most recently from 5f01310 to 0519a1d (May 7, 2024 18:05)

leseb (Contributor) commented May 23, 2024

@tiran what's the status on this? Thanks!

tiran (Contributor Author) commented May 23, 2024

@leseb I have rebased the PR. Let's see if tests are now passing.

On a test system with 64 GB RAM, this memory calculation came out as
62, not 64. Check for 60 instead of 64.

Obviously this is not very scientific, as we're making very rough
assumptions about what is required. It would be better to enhance the
code further to actually calculate a memory requirement based on the
model instead of just hard-coding a rough guess.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
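As a rough illustration of what "a memory requirement based on the model" could start from, here is a sketch that estimates only the memory needed to hold the weights for a given parameter count and dtype. It is a lower bound under stated assumptions: gradients, optimizer state, activations, and framework overhead are not counted, and the 7B parameter count and the `weight_memory_gib` helper are hypothetical, not taken from the PR.

```python
# Bytes per parameter for common dtypes (weights only).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}
GIB = 1024**3  # bytes per GiB


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed just to hold the model weights, in GiB.

    Training needs more than this: gradients, optimizer state,
    and activations are not included.
    """
    return num_params * BYTES_PER_PARAM[dtype] / GIB


params = 7_000_000_000  # hypothetical 7B-parameter model
print(f"float32:  {weight_memory_gib(params, 'float32'):.1f} GiB")   # float32:  26.1 GiB
print(f"bfloat16: {weight_memory_gib(params, 'bfloat16'):.1f} GiB")  # bfloat16: 13.0 GiB
```

Halving the bytes per parameter is why bf16/fp16 roughly halves weight memory compared with float32 (dtype=None).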
mergify bot added the ci-failure label (PR has at least one CI failure) Jun 5, 2024

russellb (Contributor) commented Jun 5, 2024

I spoke with @leseb on Slack and we determined that the memory check came out to 62 on his 64 GB system, so I've changed the rough check in the code to < 60 instead of < 64. I'd like to see whether that now gets him a boost, as his system should work with dtype=None (using float32).

leseb (Contributor) commented Jun 6, 2024

Here are the results I've been waiting for :) on the same system described in #993 (comment):

Previously it took 1h28min to barely reach 29% of the training; now the whole training took 1 min 19 s:

LINUX_TRAIN.PY: TRAINING
{'train_runtime': 79.7499, 'train_samples_per_second': 0.075, 'train_steps_per_second': 0.075, 'train_loss': 1.6997551918029785, 'epoch': 1.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [01:19<00:00, 13.29s/it]
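As a sanity check on the log, the throughput figures can be recomputed from `train_runtime` (in seconds) and the step count shown in the progress bar. This is plain arithmetic on the logged values; the variable names are illustrative.

```python
train_runtime = 79.7499  # seconds, from the Trainer metrics in the log
steps = 6                # "6/6" in the progress bar; 1 sample per step in this run

minutes, seconds = divmod(train_runtime, 60)
print(f"elapsed: {int(minutes)} min {int(seconds)} s")  # elapsed: 1 min 19 s
print(f"steps/s: {steps / train_runtime:.3f}")          # steps/s: 0.075
print(f"s/step:  {train_runtime / steps:.2f}")          # s/step:  13.29
```

The results match the `01:19` elapsed time and `13.29s/it` in the progress bar, as well as `train_steps_per_second: 0.075`.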

```python
torch_dtype = "auto" if device.type == "cuda" else None
if device.type == "cpu":
    total_memory = psutil.virtual_memory().total / (1024**3)
    if total_memory < 60:
        ...  # branch body not shown in this review excerpt
```
Contributor review comment:

Suggested change
```diff
-    if total_memory < 60:
+    if total_memory < 62:
```

A system with 64 GB of RAM reports:

```python
>>> import psutil
>>> mem = psutil.virtual_memory()
>>> mem
svmem(total=67228049408, available=31099351040, percent=53.7, used=35383861248, free=468701184, active=27983499264, inactive=37159084032, buffers=1079336960, cached=30296150016, shared=2109440, slab=1340628992)
```

Converting 67228049408 bytes to GiB gives 67228049408 / 1024**3 ≈ 62.6 GiB.
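A short sketch of the conversion above and of how the reported total compares against the candidate thresholds discussed in this thread (64, 62, 60). The `total_memory_gib` helper is hypothetical; on a live system the byte count would come from `psutil.virtual_memory().total`, as in the REPL session above.

```python
GIB = 1024**3  # bytes per GiB


def total_memory_gib(total_bytes: int) -> float:
    """Convert a byte count to GiB, as the PR's check does."""
    return total_bytes / GIB


reported_total = 67228049408  # psutil.virtual_memory().total on the 64 GB machine above
print(f"{total_memory_gib(reported_total):.1f} GiB")  # 62.6 GiB

# The PR branches on `if total_memory < threshold:`; show which thresholds this machine falls under.
for threshold in (64, 62, 60):
    print(threshold, total_memory_gib(reported_total) < threshold)
# 64 True
# 62 False
# 60 False
```

So a machine sold as 64 GB still lands below a `< 64` cutoff, while both `< 62` and `< 60` treat it as having enough memory.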

```python
# There's more going on here and needs deeper exploration to find
# the right parameters to be checking for choosing the best
# configuration.
# Anecdotally, 64 GB seems to be enough, but this calculation
```
Contributor review comment:

A system with 64 GB of RAM reports ~62.6 GiB, so we base our calculation on 62.

Contributor reply:

Since it's such a rough guess, 60 still seems fine? We need to actually do some math at some point ...

Contributor reply:

I'll share my math in a few :) stay tuned!

Contributor reply:

[Screenshot, 2024-06-07 14:23:04 (image not available)]

Some more numbers:

  • Training takes ~30 GB of RAM. There is a very small chance that this could work on a very minimal Linux installation, meaning one where only system-critical services run and nothing else.
  • Inference takes ~35 GB of RAM.

In short, a system with 48 GB of RAM should be able to run both training and inference, although 48 GB configurations are not very common.
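A minimal budget check using the rough figures from the comment above (~30 GB for training, ~35 GB for inference), assuming the two phases run one at a time so only the larger one matters at peak. The `os_reserve_gb` allowance and the `fits_in_ram` helper are hypothetical; the 8 GB reserve in the example is an illustrative guess, not a measured value.

```python
# Rough peak RAM per phase, in GB, as reported in the comment above.
PHASE_RAM_GB = {"training": 30, "inference": 35}


def fits_in_ram(total_gb: float, os_reserve_gb: float) -> dict:
    """Check each phase (run one at a time) against total RAM minus an OS reserve.

    os_reserve_gb is an assumed allowance for the OS and other services.
    """
    available = total_gb - os_reserve_gb
    return {phase: need <= available for phase, need in PHASE_RAM_GB.items()}


print(fits_in_ram(48, os_reserve_gb=8))  # {'training': True, 'inference': True}
print(fits_in_ram(32, os_reserve_gb=8))  # {'training': False, 'inference': False}
```

Shrinking `os_reserve_gb` toward zero models the "very minimal Linux installation" case mentioned above.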

tiran marked this pull request as draft July 9, 2024 08:41
mergify bot added the needs-rebase label Jul 9, 2024

mergify bot commented Jul 9, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

github-actions bot commented Oct 8, 2024

This pull request has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

github-actions bot added the stale label Oct 8, 2024
mergify bot added the dependencies label Oct 8, 2024
github-actions bot removed the stale label Oct 9, 2024
mergify bot removed the needs-rebase label Jan 6, 2025

mergify bot commented Jan 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jan 6, 2025
mergify bot removed the needs-rebase label Feb 16, 2025

mergify bot commented Feb 16, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Feb 16, 2025

courtneypacheco (Contributor) commented:

Hi @tiran! Are you still working on this PR? We're looking to do some housekeeping and close out stale PRs, including drafts.

If we don't hear back within 7 days, we will close this PR, but please know that you are more than welcome to reopen it if you'd like! Thank you!

mergify bot removed the needs-rebase label Mar 27, 2025

mergify bot commented Mar 27, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Mar 27, 2025
mergify bot removed the needs-rebase label Apr 28, 2025

mergify bot commented Apr 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Apr 28, 2025

github-actions bot commented:
This pull request has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.


Labels: ci-failure, dependencies, needs-rebase, stale, testing


Development

Successfully merging this pull request may close these issues.

Optimize CPU training on Linux

7 participants
