Commit 2cc6c9a

docs: Update README, add FAQ

1 parent: 7f3704b
1 file changed

‎README.md

+30 −7 (30 additions, 7 deletions)

@@ -1,4 +1,5 @@
 # 🦙 Python Bindings for [`llama.cpp`](https://github.com/ggerganov/llama.cpp)
+---
 
 [![Documentation Status](https://readthedocs.org/projects/llama-cpp-python/badge/?version=latest)](https://llama-cpp-python.readthedocs.io/en/latest/?badge=latest)
 [![Tests](https://github.com/abetlen/llama-cpp-python/actions/workflows/test.yaml/badge.svg?branch=main)](https://github.com/abetlen/llama-cpp-python/actions/workflows/test.yaml)
@@ -23,7 +24,8 @@ Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest
 
 
 
-## Installation from PyPI
+## Installation
+---
 
 Install from PyPI (requires a c compiler):
 
@@ -107,6 +109,7 @@ See the above instructions and set `CMAKE_ARGS` to the BLAS backend you want to
 Detailed MacOS Metal GPU install documentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/)
 
 ## High-level API
+---
 
 [API Reference](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#high-level-api)
 
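For context on the "High-level API" section touched above: the high-level interface is driven through the `Llama` class, roughly as in the minimal sketch below. The model path, prompt, and sampling arguments are placeholders for illustration, not part of this commit.

```python
from llama_cpp import Llama

# Load a GGUF model from disk (path is a placeholder).
llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)

# Plain text completion; max_tokens and stop sequences are illustrative values.
output = llm(
    "Q: Name the planets in the solar system? A: ",
    max_tokens=32,
    stop=["Q:", "\n"],
    echo=True,
)
print(output["choices"][0]["text"])
```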
@@ -269,7 +272,8 @@ llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)
 ```
 
 
-## Web Server
+## OpenAI Compatible Web Server
+---
 
 `llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.
 This allows you to use llama.cpp compatible models with any OpenAI compatible client (language libraries, services, etc).
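Because the server aims to be a drop-in replacement for the OpenAI API, any OpenAI-compatible client can point at it once it is running (for example via `python3 -m llama_cpp.server --model models/7B/llama-model.gguf`). A rough sketch using the `openai` Python client: the v1.x client interface is assumed, the base URL presumes the server's default port 8000, and the API key is a dummy value since the local server does not check it.

```python
from openai import OpenAI

# Point the client at the local llama-cpp-python server instead of api.openai.com.
# The local server does not validate the API key, but the client requires a value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="local-model",  # informational only for the local server
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the planets in the solar system."},
    ],
)
print(response.choices[0].message.content)
```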
@@ -302,13 +306,14 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf --chat_format cha
 That will format the prompt according to how model expects it. You can find the prompt format in the model card.
 For possible options, see [llama_cpp/llama_chat_format.py](llama_cpp/llama_chat_format.py) and look for lines starting with "@register_chat_format".
 
-### Web Server Examples
+### Web Server Features
 
 - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
 - [Function Calling support](https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling)
 - [Vision API support](https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models)
 
 ## Docker image
+---
 
 A Docker image is available on [GHCR](https://ghcr.io/abetlen/llama-cpp-python). To run the server:
 
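The `--chat_format` flag shown in the hunk header above also has a counterpart in the high-level API; a minimal sketch, assuming a chatml-style model (the model path and messages are placeholders):

```python
from llama_cpp import Llama

# chat_format selects one of the formats registered in llama_cpp/llama_chat_format.py.
llm = Llama(model_path="./models/7B/llama-model.gguf", chat_format="chatml")

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about llamas."},
    ],
)
print(result["choices"][0]["message"]["content"])
```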
@@ -318,6 +323,7 @@ docker run --rm -it -p 8000:8000 -v /path/to/models:/models -e MODEL=/models/lla
 [Docker on termux (requires root)](https://gist.github.com/FreddieOliveira/efe850df7ff3951cb62d74bd770dce27) is currently the only known way to run this on phones, see [termux support issue](https://github.com/abetlen/llama-cpp-python/issues/389)
 
 ## Low-level API
+---
 
 [API Reference](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#low-level-api)
 
@@ -344,12 +350,14 @@ Below is a short example demonstrating how to use the low-level API to tokenize
 Check out the [examples folder](examples/low_level_api) for more examples of using the low-level API.
 
 
-# Documentation
+## Documentation
+---
 
 Documentation is available via [https://llama-cpp-python.readthedocs.io/](https://llama-cpp-python.readthedocs.io/).
 If you find any issues with the documentation, please open an issue or submit a PR.
 
-# Development
+## Development
+---
 
 This package is under active development and I welcome any contributions.
 
@@ -375,7 +383,21 @@ pip install -e .[all]
 make clean
 ```
 
-# How does this compare to other Python bindings of `llama.cpp`?
+## FAQ
+---
+
+### Are there pre-built binaries / binary wheels available?
+
+The recommended installation method is to install from source as described above.
+The reason for this is that `llama.cpp` is built with compiler optimizations that are specific to your system.
+Using pre-built binaries would require disabling these optimizations or supporting a large number of pre-built binaries for each platform.
+
+That being said there are some pre-built binaries available through the Releases as well as some community provided wheels.
+
+In the future, I would like to provide pre-built binaries and wheels for common platforms and I'm happy to accept any useful contributions in this area.
+This is currently being tracked in #741
+
+### How does this compare to other Python bindings of `llama.cpp`?
 
 I originally wrote this package for my own use with two goals in mind:
 
@@ -384,6 +406,7 @@ I originally wrote this package for my own use with two goals in mind:
 
 Any contributions and changes to this package will be made with these goals in mind.
 
-# License
+## License
+---
 
 This project is licensed under the terms of the MIT license.
