Add support for llama.cpp's --tensor-split parameter #460

shouyiwang · Jul 9, 2023

llama-cpp-python currently does not support llama.cpp's --tensor-split parameter. When a large model runs across two GPUs, it is split evenly between them by default. This causes problems when the GPUs have different amounts of VRAM: the smaller card can run out of memory (OOM). Supporting --tensor-split fixes this by letting users specify the proportion of the model placed on each GPU.
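To illustrate the kind of proportions a user might pass, here is a minimal sketch that derives a split from per-GPU VRAM sizes. The helper name `split_from_vram` is hypothetical and not part of this PR, and the `tensor_split` keyword shown in the trailing comment is an assumption about how the option is exposed on the Python side.

```python
from typing import List, Sequence


def split_from_vram(vram_gib: Sequence[float]) -> List[float]:
    """Return per-GPU proportions that sum to 1, weighted by VRAM size.

    Hypothetical helper for illustration only; it is not part of
    llama-cpp-python or llama.cpp.
    """
    if not vram_gib:
        raise ValueError("at least one GPU is required")
    if any(v <= 0 for v in vram_gib):
        raise ValueError("VRAM sizes must be positive")
    total = float(sum(vram_gib))
    return [v / total for v in vram_gib]


# Example: a 24 GiB card paired with an 8 GiB card.
proportions = split_from_vram([24, 8])
print(proportions)

# Assumed usage (keyword name is an assumption, model path is a placeholder):
# llm = Llama(model_path="model.bin", n_gpu_layers=..., tensor_split=proportions)
```

Weighting purely by total VRAM is a simplification. In practice the first GPU may also hold scratch buffers, so a user may want to give it a somewhat smaller share.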

I'm not sure that importing ctypes into llama.py is the most appropriate approach, but I haven't found a way to handle this in llama_cpp.py instead. Any advice or suggestions would be greatly appreciated.
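For context on why ctypes comes into play: a Python list of floats has to be converted into a fixed-size C float array before it can be handed to the native library. The following is a rough sketch of that conversion, not the code in this PR. `MAX_DEVICES` and `to_c_float_array` are hypothetical names, and the value 16 is a placeholder for whatever device limit the native library defines.

```python
import ctypes
from typing import Sequence

# Placeholder for the native library's device limit (assumed value).
MAX_DEVICES = 16


def to_c_float_array(split: Sequence[float], max_devices: int = MAX_DEVICES):
    """Convert a Python sequence into a zero-padded C float array.

    Hypothetical helper for illustration only.
    """
    if len(split) > max_devices:
        raise ValueError(
            f"got {len(split)} values, but at most {max_devices} devices are supported"
        )
    float_array_type = ctypes.c_float * max_devices
    # ctypes zero-initializes any elements not covered by the arguments.
    return float_array_type(*split)


c_split = to_c_float_array([0.75, 0.25], max_devices=4)
print(list(c_split))  # [0.75, 0.25, 0.0, 0.0]
```

One point to keep in mind with this pattern: the caller must hold a Python reference to the array for as long as the native side may read it, otherwise the memory can be freed while a C struct still points at it.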

Tested thoroughly with text-generation-webui. I'll submit a PR there after this PR gets merged. Thx!

shouyiwang · Jul 14, 2023

Hi @abetlen,
I just wanted to kindly draw your attention to this PR that I submitted 5 days ago. It would be great if you could review it when you have some time. I am available to make any necessary changes or answer any questions you might have.

Thank you for your time and consideration.

abetlen · Jul 14, 2023

@shouyiwang thank you for the contribution, lgtm

shouyiwang · Jul 15, 2023

@abetlen Thank you so much!!

Shouyi Wang added 2 commits July 9, 2023 23:00

Add tensor split

9f21f54

Resolve merge conflicts

579f526

abetlen merged commit 82b11c8 into abetlen:main Jul 14, 2023
