Use proper backend for CLIP #1175


Merged
merged 1 commit into abetlen:main on Feb 11, 2024

Conversation

iamlemec
Contributor

When using LLaVa, CLIP does not load on the default backend (i.e. CUDA/Metal when they are available). This arises because the code in clip.cpp conditions on GGML_USE_XXX rather than LLAMA_USE_XXX. The main llama.cpp CMake file enables the former when the latter is present, but we need to do it manually for LLaVA since we're bypassing the top level.
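For reference, a minimal sketch of how the mismatch shows up from the Python side (assuming a llama-cpp-python build that exposes the llama_supports_gpu_offload binding; the model path is a placeholder):

import llama_cpp
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The core llama library reports whether it was built with a GPU backend
# (CUDA/Metal).
print("llama GPU offload:", llama_cpp.llama_supports_gpu_offload())

# With verbose=True the CLIP loader prints which backend it picked. Before
# this fix it could report the CPU backend even when the line above prints
# True, because clip.cpp conditions on GGML_USE_XXX rather than LLAMA_USE_XXX.
chat_handler = Llava15ChatHandler(
    clip_model_path="path/to/llava/mmproj.bin",  # placeholder path
    verbose=True,
)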

@abetlen
Owner

abetlen commented Feb 11, 2024

@iamlemec that's great, thank you!

abetlen merged commit 19b55ad into abetlen:main on Feb 11, 2024
@eisneim

eisneim commented Mar 13, 2024

With llama-cpp-python 0.2.56, how do I enable CLIP offload to the GPU? My 3090 can do 50 tokens/s, but the total time is far too slow (92s), much slower than my MacBook M3 Max (6s).
I've tried: CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAVA_BUILD=on" pip install llama-cpp-python but it does not work.

@iamlemec
Contributor Author

You should try to see whether it is in fact using the GPU. Assuming you're running something like the LLaVa example in the README, you can load the CLIP model in verbose mode with:

chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin", verbose=True)

If you're using the CUDA backend, the output should include the line: "clip_model_load: CLIP using CUDA backend".

But yeah, it would be surprising if a 3090 was that much slower than an M3.
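If CLIP is on CUDA but things are still slow, it's worth confirming the LLM layers themselves are offloaded. A sketch along the lines of the README's LLaVA example (paths are placeholders; n_gpu_layers=-1 offloads all layers on a CUDA/Metal build):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(
    clip_model_path="path/to/llava/mmproj.bin",  # placeholder path
    verbose=True,  # prints which backend CLIP is using
)
llm = Llama(
    model_path="path/to/llava/llama-model.gguf",  # placeholder path
    chat_handler=chat_handler,
    n_ctx=2048,       # room for the image embedding plus the prompt
    n_gpu_layers=-1,  # offload all LLM layers to the GPU backend
    verbose=True,     # the load logs report how many layers were offloaded
)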

@eisneim

eisneim commented Mar 13, 2024

@iamlemec thanks!

@eisneim

eisneim commented Mar 13, 2024

I found the issue! When I set the context size to 4096, the generation never ends. It keeps decoding until the context is full, which is why it took so long to finish, but this doesn't happen on the M3 Max, strangely.

@iamlemec
Contributor Author

Hmm, that's curious. Does the output from CUDA look reasonable? Also, what are you setting max_tokens to?

@eisneim

eisneim commented Mar 14, 2024

I did not set max_tokens, I just set n_ctx=4096. On the MacBook it works fine, but with CUDA it keeps generating and never ends.
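Since create_chat_completion generates until an end token or the context limit when max_tokens is unset, one way to bound a runaway generation is to cap it explicitly. A sketch, assuming an llm built as in the README's LLaVA example (the image URL and prompt are placeholders):

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},  # placeholder
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ],
    max_tokens=512,  # hard upper bound on generated tokens per call
)
print(response["choices"][0]["message"]["content"])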
