Support for multi-modal models #813

Closed
@rlancemartin

Description


I see llama.cpp is working on multi-modal models like LLaVA:
ggml-org/llama.cpp#3436

The model files are here (SHA-256 checksums):

2ab9be51b7dc737136b38093316a4d3577d1fb96281f1589adac7841f5b81c43  ../models/ggml-model-q5_k.gguf
b7c8ff0f58fca47d28ba92c4443adf8653f3349282cb8d9e6911f22d9b3814fe  ../models/mmproj-model-f16.gguf

Testing:

$ mkdir build && cd build && cmake ..
$ cmake --build .
$ ./bin/llava -m ../models/ggml-model-q5_k.gguf --mmproj ../models/mmproj-model-f16.gguf --image ~/Desktop/Papers/figure-3-1.jpg

The llava example appears to add some new params:

--mmproj MMPROJ_FILE  path to a multimodal projector file for LLaVA. see examples/llava/README.md
--image IMAGE_FILE    path to an image file. use with multimodal models

It would be awesome if we could support this in llama-cpp-python.
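
For reference, a minimal sketch of what support could look like from the Python side, assuming the binding simply mirrors the two new CLI flags. The `mmproj_path` and `image_path` parameters below are hypothetical; nothing like them exists in llama-cpp-python yet, this is just the shape an API could take:

```python
from llama_cpp import Llama

# Hypothetical sketch: `mmproj_path` would mirror the llava example's
# --mmproj flag by loading the multimodal projector alongside the model.
llm = Llama(
    model_path="../models/ggml-model-q5_k.gguf",
    mmproj_path="../models/mmproj-model-f16.gguf",  # hypothetical parameter
)

# Hypothetical `image_path` parameter mirroring the --image flag,
# passed per request rather than at load time.
output = llm.create_completion(
    "Describe the figure in this image.",
    image_path="figure-3-1.jpg",  # hypothetical parameter
    max_tokens=256,
)
print(output["choices"][0]["text"])
```

Taking the projector as a constructor argument would keep the interface close to the existing `Llama` API, with images supplied per completion call.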

