Commit 522aecb

docs: add server config docs
1 parent 6473796 commit 522aecb

2 files changed: +102 -2 lines changed

‎docs/server.md

95 additions & 1 deletion
@@ -32,6 +32,12 @@ python3 -m llama_cpp.server --help

NOTE: All server options are also available as environment variables. For example, `--model` can be set by setting the `MODEL` environment variable.

Check out the server config reference below for more information on the available options.
CLI arguments and environment variables are available for all of the fields defined in [`ServerSettings`](#llama_cpp.server.settings.ServerSettings) and [`ModelSettings`](#llama_cpp.server.settings.ModelSettings).

Additionally, the server supports configuration via a config file; check out the [configuration section](#configuration-and-multi-model-support) for more information and examples.

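As an aside (not part of this commit), a minimal sketch of the CLI-flag/environment-variable equivalence described in the note above, launching the server from Python with options supplied through the environment; the model path is a placeholder, and `HOST` assumes that settings fields map to upper-cased environment variable names the same way `MODEL` maps to `--model`:

```python
# Sketch: launch llama_cpp.server with options passed as environment variables
# instead of CLI flags. The model path is a placeholder; HOST is assumed to map
# to ServerSettings.host the same way MODEL maps to --model.
import os
import subprocess

env = dict(os.environ)
env["MODEL"] = "models/your-model.gguf"  # equivalent to --model
env["HOST"] = "0.0.0.0"                  # equivalent to --host

subprocess.run(["python3", "-m", "llama_cpp.server"], env=env, check=True)
```
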
## Guides
### Code Completion
@@ -121,4 +127,92 @@ response = client.chat.completions.create(
    ],
)
print(response)
```

## Configuration and Multi-Model Support

The server supports configuration via a JSON config file that can be passed using the `--config_file` parameter or the `CONFIG_FILE` environment variable.

```bash
python3 -m llama_cpp.server --config_file <config_file>
```

Config files support all of the server and model options supported by the CLI and environment variables; however, instead of only a single model, a config file can specify multiple models.

The server supports routing requests to multiple models based on the `model` parameter in the request, which is matched against the `model_alias` in the config file.

At the moment only a single model is loaded into memory at a time; the server will automatically load and unload models as needed.

```json
{
    "host": "0.0.0.0",
    "port": 8080,
    "models": [
        {
            "model": "models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
            "model_alias": "gpt-3.5-turbo",
            "chat_format": "chatml",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
            "model_alias": "gpt-4",
            "chat_format": "chatml",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/ggml_llava-v1.5-7b/ggml-model-q4_k.gguf",
            "model_alias": "gpt-4-vision-preview",
            "chat_format": "llava-1-5",
            "clip_model_path": "models/ggml_llava-v1.5-7b/mmproj-model-f16.gguf",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/mistral-7b-v0.1-GGUF/ggml-model-Q4_K.gguf",
            "model_alias": "text-davinci-003",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        },
        {
            "model": "models/replit-code-v1_5-3b-GGUF/replit-code-v1_5-3b.Q4_0.gguf",
            "model_alias": "copilot-codex",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 1024,
            "n_ctx": 9216
        }
    ]
}
```

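For illustration (not part of the diff), a sketch of how a client could route requests against a config like the one above by setting the OpenAI-compatible `model` field to a `model_alias`; it follows the OpenAI Python client pattern used earlier in this document, and the `base_url` and `api_key` values are assumptions:

```python
# Sketch: select models served by a single llama_cpp.server instance by passing
# a model_alias from the config file as the `model` parameter.
# base_url assumes the server from the config above (port 8080); api_key is a dummy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

# Routed to the entry aliased "gpt-3.5-turbo" (OpenHermes, chatml chat format).
chat = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)

# Routed to the entry aliased "copilot-codex" (replit-code), as a text completion.
completion = client.completions.create(
    model="copilot-codex",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(completion.choices[0].text)
```
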
The config file format is defined by the [`ConfigFileSettings`](#llama_cpp.server.settings.ConfigFileSettings) class.
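As a further illustration (a sketch under stated assumptions, not part of this commit), a config file could be validated programmatically against that class before starting the server; the `config.json` path is hypothetical:

```python
# Sketch: validate a server config file against ConfigFileSettings.
# "config.json" is a hypothetical path; pydantic raises a validation error
# if the file does not match the schema.
import json

from llama_cpp.server.settings import ConfigFileSettings

with open("config.json") as f:
    config = ConfigFileSettings(**json.load(f))

print(config.host, config.port)
for model in config.models:
    print(model.model_alias, "->", model.model)
```
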
## Server Options Reference

::: llama_cpp.server.settings.ConfigFileSettings
    options:
        show_if_no_docstring: true

::: llama_cpp.server.settings.ServerSettings
    options:
        show_if_no_docstring: true

::: llama_cpp.server.settings.ModelSettings
    options:
        show_if_no_docstring: true

‎llama_cpp/server/settings.py

7 additions & 1 deletion
@@ -13,6 +13,8 @@


class ModelSettings(BaseSettings):
    """Model settings used to load a Llama model."""

    model: str = Field(
        description="The path to the model to use for generating completions."
    )
@@ -131,6 +133,8 @@ class ModelSettings(BaseSettings):


class ServerSettings(BaseSettings):
    """Server settings used to configure the FastAPI and Uvicorn server."""

    # Uvicorn Settings
    host: str = Field(default="localhost", description="Listen address")
    port: int = Field(default=8000, description="Listen port")
@@ -156,6 +160,8 @@ class Settings(ServerSettings, ModelSettings):


class ConfigFileSettings(ServerSettings):
    """Configuration file format settings."""

    models: List[ModelSettings] = Field(
-        default=[], description="Model configs, overwrites default config"
+        default=[], description="Model configs"
    )

0 commit comments
