CLI flags/configuration and examples for Linux GPU training #647

@bbrowning

Description

The exact settings that need to change to train successfully on a Linux GPU can vary quite a lot by system: AMD cards vs Nvidia cards, memory available on the card, age of the card, etc.

A non-exhaustive list of the settings I've found so far that a user may want to tweak, either to get training working at all or to trade off overall training speed against resources used:

  • device(s) to use
  • fp16 vs bf16 precision
  • quantization, specifically using 4-bit BitsAndBytes vs not
  • gradient accumulation steps, or gradient accumulation disabled entirely
  • gradient checkpointing enabled/disabled
  • per-device training batch size
  • distributed training across multiple GPUs/CPUs (may be out of scope for just config, as that's more work to set up)
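To make the list above concrete, here is a minimal sketch of how these knobs could be surfaced as CLI flags. Everything in it is an assumption for illustration: the flag names, the defaults, the `TrainSettings` container, and the `train-sketch` program name are hypothetical and are not the existing `lab train` options.

```python
import argparse
from dataclasses import dataclass


@dataclass
class TrainSettings:
    """Hypothetical container for the training knobs listed above."""
    device: str
    precision: str
    load_in_4bit: bool
    gradient_accumulation_steps: int
    gradient_checkpointing: bool
    per_device_train_batch_size: int

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer step on a single device.
        return self.per_device_train_batch_size * self.gradient_accumulation_steps


def build_parser() -> argparse.ArgumentParser:
    # All flag names and defaults below are illustrative assumptions.
    parser = argparse.ArgumentParser(prog="train-sketch")
    parser.add_argument("--device", default="cuda")
    parser.add_argument("--precision", choices=["fp16", "bf16"], default="fp16")
    parser.add_argument("--load-in-4bit", action="store_true")
    parser.add_argument("--gradient-accumulation-steps", type=int, default=1)
    parser.add_argument(
        "--gradient-checkpointing",
        action=argparse.BooleanOptionalAction,  # adds --no-gradient-checkpointing
        default=False,
    )
    parser.add_argument("--per-device-train-batch-size", type=int, default=1)
    return parser


def parse_settings(argv: list[str]) -> TrainSettings:
    parser = build_parser()
    ns = parser.parse_args(argv)
    if ns.gradient_accumulation_steps < 1:
        parser.error("--gradient-accumulation-steps must be >= 1 (1 disables accumulation)")
    if ns.per_device_train_batch_size < 1:
        parser.error("--per-device-train-batch-size must be >= 1")
    return TrainSettings(**vars(ns))


if __name__ == "__main__":
    # A low-memory style invocation: small batch, accumulation, checkpointing on.
    print(parse_settings([
        "--precision", "bf16",
        "--load-in-4bit",
        "--gradient-checkpointing",
        "--gradient-accumulation-steps", "8",
    ]))
```

Keeping the parsed values in one plain dataclass would also make it straightforward to load the same settings from a configuration file later, if flags alone turn out to be too unwieldy.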

Some of these are configurable by CLI flags today. Can we expose all the needed parameters via CLI flags? Do we need configuration files? Which of these are also applicable to other lab commands, such as serve, generate, test, convert?

However we expose the necessary configuration, a list of example configuration/flags to use for different setups would be nice. Show people the things to tweak to lower memory usage at the expense of speed. Perhaps some guidance on the options needed to reduce GPU memory required under popular thresholds, like 8GB, 16GB, 24GB.
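As a rough starting point for that kind of guidance, the sketch below estimates only the memory needed to hold model weights at a given precision and checks it against the 8GB, 16GB, and 24GB thresholds mentioned above. It is a lower bound under stated assumptions: it ignores gradients, optimizer state, activations, and quantization overhead, it treats 1 GB as 10^9 bytes, and the 7-billion-parameter model size is a hypothetical example rather than anything taken from this project.

```python
# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "4bit": 0.5}

# GPU memory thresholds named in the issue, in GB.
THRESHOLDS_GB = (8, 16, 24)


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Lower-bound memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def thresholds_that_fit(num_params: int, precision: str) -> list[int]:
    """Thresholds large enough to hold the weights; training needs more than this."""
    needed = weight_memory_gb(num_params, precision)
    return [t for t in THRESHOLDS_GB if needed <= t]


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size, for illustration only
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, precision)
        print(f"{precision}: {gb:.1f} GB of weights, fits under {thresholds_that_fit(params, precision)}")
```

Even this crude estimate shows why 4-bit quantization matters for smaller cards: the same weights take a quarter of the space they need in fp16 or bf16, before any of the training-time overhead is counted.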

My assumption here is that a goal would be to give people enough knobs to turn that they can get training going on their machine without having to change the actual Python code. Perhaps others disagree with that assumption? All opinions are welcome!

Metadata


    Labels

    enhancement, linux, stale
