Commit 4d574bd

feat(server): Add support for pulling models from Huggingface Hub (abetlen#1222)

* Basic support for hf pull on server
* Add hf_model_repo_id setting
* Update README
1 parent b3e358d

3 files changed: +24 −2 lines

README.md (+6 −0)

````diff
@@ -577,6 +577,12 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf --chat_format cha
 That will format the prompt according to how the model expects it. You can find the prompt format in the model card.
 For possible options, see [llama_cpp/llama_chat_format.py](llama_cpp/llama_chat_format.py) and look for lines starting with "@register_chat_format".
 
+If you have `huggingface-hub` installed, you can also use the `--hf_model_repo_id` flag to load a model from the Hugging Face Hub.
+
+```bash
+python3 -m llama_cpp.server --hf_model_repo_id Qwen/Qwen1.5-0.5B-Chat-GGUF --model '*q8_0.gguf'
+```
+
 ### Web Server Features
 
 - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
````
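
For reference, the new flag maps onto `llama_cpp.Llama.from_pretrained` (see the model.py change below), so the same pull can be done directly from Python. A minimal sketch, assuming `llama-cpp-python` and `huggingface-hub` are installed, using the repo id and filename glob from the README example above:

```python
from llama_cpp import Llama

# Resolves the glob against the repo's file list, downloads (and
# caches) the matching GGUF file, then loads it just as
# Llama(model_path=...) would.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q8_0.gguf",
)
```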

llama_cpp/server/model.py (+13 −2)

```diff
@@ -120,9 +120,20 @@ def load_llama_from_model_settings(settings: ModelSettings) -> llama_cpp.Llama:
                         kv_overrides[key] = float(value)
                     else:
                         raise ValueError(f"Unknown value type {value_type}")
+
+        import functools
 
-        _model = llama_cpp.Llama(
-            model_path=settings.model,
+        kwargs = {}
+
+        if settings.hf_model_repo_id is not None:
+            create_fn = functools.partial(llama_cpp.Llama.from_pretrained, repo_id=settings.hf_model_repo_id, filename=settings.model)
+        else:
+            create_fn = llama_cpp.Llama
+            kwargs["model_path"] = settings.model
+
+
+        _model = create_fn(
+            **kwargs,
             # Model Params
             n_gpu_layers=settings.n_gpu_layers,
             main_gpu=settings.main_gpu,
```
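
Both branches produce a callable with the same keyword interface: the Hub path binds its extra arguments (`repo_id`, `filename`) up front with `functools.partial`, while the local path routes `model_path` through `kwargs`, so the long list of shared constructor parameters is written once at the single `create_fn(...)` call site. A self-contained sketch of the pattern; `make_local` and `make_remote` are hypothetical stand-ins for `llama_cpp.Llama` and `Llama.from_pretrained`:

```python
import functools
from typing import Callable, Optional

def make_local(*, model_path: str, n_gpu_layers: int = 0) -> str:
    """Stand-in for llama_cpp.Llama: loads from a local path."""
    return f"local:{model_path} (n_gpu_layers={n_gpu_layers})"

def make_remote(*, repo_id: str, filename: str, n_gpu_layers: int = 0) -> str:
    """Stand-in for Llama.from_pretrained: pulls from the Hub first."""
    return f"remote:{repo_id}/{filename} (n_gpu_layers={n_gpu_layers})"

def load(model: str, repo_id: Optional[str] = None) -> str:
    kwargs = {}
    if repo_id is not None:
        # Bind the Hub-only arguments now; `model` acts as a filename glob.
        create_fn: Callable[..., str] = functools.partial(
            make_remote, repo_id=repo_id, filename=model
        )
    else:
        create_fn = make_local
        kwargs["model_path"] = model
    # Shared parameters appear once, at a single call site.
    return create_fn(**kwargs, n_gpu_layers=8)

print(load("models/7B/llama-model.gguf"))
print(load("*q8_0.gguf", repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF"))
```

The payoff is that the many shared `Llama` constructor arguments that follow (`n_gpu_layers`, `main_gpu`, and so on) do not have to be duplicated across two call sites.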

llama_cpp/server/settings.py (+5 −0)

```diff
@@ -143,6 +143,11 @@ class ModelSettings(BaseSettings):
         default=None,
         description="The model name or path to a pretrained HuggingFace tokenizer model. Same as you would pass to AutoTokenizer.from_pretrained().",
     )
+    # Loading from HuggingFace Model Hub
+    hf_model_repo_id: Optional[str] = Field(
+        default=None,
+        description="The model repo id to use when loading a model from the Hugging Face Hub.",
+    )
     # Speculative Decoding
     draft_model: Optional[str] = Field(
         default=None,
```
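
Since `ModelSettings` is a pydantic `BaseSettings` subclass, the new field should also be settable outside the CLI, e.g. from the environment. A sketch under that assumption (default pydantic-settings behavior: case-insensitive, same-named environment variables, no prefix):

```python
import os
from llama_cpp.server.settings import ModelSettings

# Assumed equivalent to passing --hf_model_repo_id on the command line;
# `model` doubles as the filename glob when a repo id is set.
os.environ["HF_MODEL_REPO_ID"] = "Qwen/Qwen1.5-0.5B-Chat-GGUF"
settings = ModelSettings(model="*q8_0.gguf")
assert settings.hf_model_repo_id == "Qwen/Qwen1.5-0.5B-Chat-GGUF"
```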
