There is a working bert.cpp implementation. We should try to implement this in llama.cpp and update the embedding example to use it.

The implementation should mostly follow what we did to integrate Falcon.
Here are the main steps:
- Update `gguf.py` with BERT arch KV pairs and tensors (a sketch follows this list)
- Python convert script using `gguf.py` to generate F16 model (see the second sketch below)
- add tokenizer implementation in `llama.cpp` (the WordPiece algorithm is sketched below)
- add function to build BERT graph
- add any new ops in `ggml` if needed
- add CUDA offloading
- add tokenizer tests