Commit 71adef4

Add server docs

1 parent 66dda36 commit 71adef4

1 file changed: docs/server.md
+77 lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# OpenAI Compatible Server

`llama-cpp-python` offers an OpenAI API compatible web server.

This web server can be used to serve local models and easily connect them to existing clients.

## Setup

### Installation

The server can be installed by running the following command:

```bash
pip install llama-cpp-python[server]
```

### Running the server

The server can then be started by running the following command:

```bash
python3 -m llama_cpp.server --model <model_path>
```
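
Once the server is running, you can check it from another terminal. A minimal sketch, assuming the server is listening on the default `localhost:8000` (adjust the address if you changed the host or port options):

```bash
# List the models served by the local OpenAI-compatible endpoint
# (assumes the default bind address of localhost:8000)
curl http://localhost:8000/v1/models
```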

### Server options

For a full list of options, run:

```bash
python3 -m llama_cpp.server --help
```

NOTE: All server options are also available as environment variables. For example, `--model` can be set by setting the `MODEL` environment variable.
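
For example, the start-up command above could equivalently be written as:

```bash
# Equivalent to: python3 -m llama_cpp.server --model <model_path>
MODEL=<model_path> python3 -m llama_cpp.server
```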

## Guides

### Multi-modal Models

`llama-cpp-python` supports the llava1.5 family of multi-modal models which allow the language model to read information from both text and images.

You'll first need to download one of the available multi-modal models in GGUF format:

- [llava1.5 7b](https://huggingface.co/mys/ggml_llava-v1.5-7b)
- [llava1.5 13b](https://huggingface.co/mys/ggml_llava-v1.5-13b)

Then when you run the server you'll need to also specify the path to the clip model used for image embedding:

```bash
python3 -m llama_cpp.server --model <model_path> --clip-model-path <clip_model_path>
```

Then you can just use the OpenAI API as normal:

```python
from openai import OpenAI

client = OpenAI(base_url="http://<host>:<port>/v1", api_key="sk-xxx")
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<image_url>"
                    },
                },
                {"type": "text", "text": "What does the image say"},
            ],
        }
    ],
)
print(response)
```
