Add server support for hf pull #1222

abetlen · Feb 26, 2024

Overview

Adds the ability for the server to pull a model from the Hugging Face model hub.

Usage

CLI

python3 -m llama_cpp.server --hf_model_repo_id Qwen/Qwen1.5-0.5B-Chat-GGUF --model '*q8_0.gguf'
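The `--model` value in the command above is a glob pattern rather than a full filename. As a rough illustration of how such a pattern can select one file from a repo listing, here is a small sketch using Python's standard `fnmatch` module. The file list and the `match_model_files` helper are made up for the example, and the exact matching rules the server applies are an assumption here.

```python
from fnmatch import fnmatch

# Hypothetical listing of files in a GGUF repo (illustrative only).
repo_files = [
    "qwen1_5-0_5b-chat-q2_k.gguf",
    "qwen1_5-0_5b-chat-q4_0.gguf",
    "qwen1_5-0_5b-chat-q8_0.gguf",
]


def match_model_files(pattern, files):
    """Return the files whose names match a shell-style glob pattern."""
    return [name for name in files if fnmatch(name, pattern)]


# Assumption: the server treats --model as a shell-style glob like this one.
matches = match_model_files("*q8_0.gguf", repo_files)
print(matches)
```

A pattern that matches more than one file (for example `*.gguf`) would be ambiguous, so a narrow pattern or the exact filename is the safer choice.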

Config File

{
    "host": "0.0.0.0",
    "models": [
        {
            "model": "qwen1_5-0_5b-chat-q8_0.gguf",
            "hf_model_repo_id": "Qwen/Qwen1.5-0.5B-Chat-GGUF"
        }
    ]
}
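For setups with several models, it may be easier to generate this file than to write it by hand. The sketch below builds the same structure from a list of repo and filename pairs using only the standard library. The `build_config` helper is hypothetical and not part of llama-cpp-python; only the `host`, `models`, `model`, and `hf_model_repo_id` keys come from the example above.

```python
import json


def build_config(models, host="0.0.0.0"):
    """Build a server config dict from (repo_id, filename) pairs."""
    return {
        "host": host,
        "models": [
            {"model": filename, "hf_model_repo_id": repo_id}
            for repo_id, filename in models
        ],
    }


config = build_config([
    ("Qwen/Qwen1.5-0.5B-Chat-GGUF", "qwen1_5-0_5b-chat-q8_0.gguf"),
])

# Serialize with the same 4-space indentation as the example config.
config_text = json.dumps(config, indent=4)
print(config_text)
```

The printed text can be saved as the config file the server is started with.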

Current Limitations

When a config file lists multiple models, each model is pulled only when it is first requested. This may cause long-running requests or timeouts.
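One possible way to avoid a slow first request is to download the listed models before starting the server. The sketch below walks a config dict and fetches each entry with `hf_hub_download` from the `huggingface_hub` package. The `prefetch_models` helper is hypothetical, it expects exact filenames rather than glob patterns, and it only helps under the assumption that the server resolves models through the same local Hugging Face cache.

```python
def prefetch_models(config, download=None):
    """Download every Hub-hosted model listed in a server config.

    `download` is any callable accepting `repo_id` and `filename` keyword
    arguments; it defaults to huggingface_hub.hf_hub_download.
    """
    if download is None:
        # Imported lazily so the helper can be exercised without the package.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download

    paths = []
    for entry in config.get("models", []):
        repo_id = entry.get("hf_model_repo_id")
        if repo_id is None:
            continue  # entry points at a local file, nothing to fetch
        # Assumption: "model" holds an exact filename, not a glob pattern,
        # and the server later reads from the same cache this call fills.
        paths.append(download(repo_id=repo_id, filename=entry["model"]))
    return paths
```

Calling `prefetch_models(config)` with the parsed config file before launching the server would then move the download cost out of the first request.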

abetlen added 4 commits February 24, 2024 04:18

Basic support for hf pull on server

9ec57f7

Merge branch 'main' into add-server-support-for-hf-pull

1ed00e3

Merge branch 'main' into add-server-support-for-hf-pull

f866d5a

Add hf_model_repo_id setting

49eedd2

abetlen marked this pull request as ready for review February 26, 2024 19:32

Update README

aef3c86

abetlen merged commit 4d574bd into main Feb 26, 2024

abetlen deleted the add-server-support-for-hf-pull branch February 26, 2024 19:42

dtrifiro mentioned this pull request Mar 1, 2024

Add example pulling models from hf dtrifiro/llama-cpp-python-serving#12

Open
