2.2.2 Backend: llama.cpp

Handle: llamacpp
URL: http://localhost:33831

License: MIT

LLM inference in C/C++. Allows bypassing the Ollama release cycle when needed, to get access to the latest models or features.

Starting

The llamacpp Docker image is quite large due to its dependencies on CUDA and other libraries. You might want to pull it ahead of time.

# [Optional] Pull the llamacpp
# images ahead of starting the service
harbor pull llamacpp

# Start the llama.cpp service
harbor up llamacpp

# Tail service logs
harbor logs llamacpp

# Open llamacpp Web UI
harbor open llamacpp

  • Harbor will automatically allocate GPU resources to the container if available, see Capabilities Detection.
  • llamacpp will be connected to the aider, anythingllm, boost, chatui, cmdh, opint, optillm, plandex, traefik and webui services when they are running together.
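
Once the service is up, you can check that the server responds on the configured host port. The llama.cpp server exposes a /health endpoint and an OpenAI-compatible API; a quick sketch using the default port from this page:

# Liveness check for the llama.cpp server
curl http://localhost:33831/health

# List the model currently being served via the OpenAI-compatible API
curl http://localhost:33831/v1/models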

Models

You can find GGUF models to run on the HuggingFace Hub. After you find a model you want to run, grab the URL from the browser address bar and pass it to the Harbor config.

# Quick lookup for the models
harbor hf find gguf

# 1. With llama.cpp own cache:
#
# - Set the model to run, will be downloaded when llamacpp starts
#   Accepts a full URL to the GGUF file (from Browser address bar)
harbor llamacpp model https://huggingface.co/user/repo/file.gguf
# TIP: You can monitor the download progress with a one-liner below
# Replace "<file>" with the unique portion from the "file.gguf" URL above
du -h $(harbor find .gguf | grep <file>)
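# [Optional] Keep refreshing the same size check while the download runs
# ("watch" re-runs the command every 5 seconds; <file> is still the
# placeholder from the tip above)
watch -n 5 'du -h $(harbor find .gguf | grep <file>)'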


# 2. Shared HuggingFace Hub cache, single file:
#
# - Locate the GGUF to download, for example:
#   https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/blob/main/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Download a single file: <user/repo> <file.gguf>
harbor hf download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Locate the file in the cache
harbor find Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Set the GGUF to llama.cpp
#   "/app/models/hub" is where the HuggingFace cache is mounted in the container
harbor llamacpp gguf /app/models/hub/models--bartowski--Meta-Llama-3.1-70B-Instruct-GGUF/snapshots/83fb6e83d0a8aada42d499259bc929d922e9a558/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf


# 3. Shared HuggingFace Hub cache, whole repo:
#
# - Locate and download the repo in its entirety
harbor hf download av-codes/Trinity-2-Codestral-22B-Q4_K_M-GGUF
# - Find the files from the repo
harbor find Trinity-2-Codestral-22B-Q4_K_M-GGUF
# - Set the GGUF to llama.cpp
#   "/app/models/hub" is where the HuggingFace cache is mounted in the container
harbor llamacpp gguf /app/models/hub/models--av-codes--Trinity-2-Codestral-22B-Q4_K_M-GGUF/snapshots/c0a1f7283809423d193025e92eec6f287425ed59/trinity-2-codestral-22b-q4_k_m.gguf
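
To double-check what the service is set to load, the same aliases can be called without an argument to print the current value (an assumption based on how other Harbor config aliases behave):

# Print the currently configured model URL and GGUF path
harbor llamacpp model
harbor llamacpp gguf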

Note

Please note that setting the model does not download it by itself. If the model is not found in the cache, it will be downloaded on the next start of the llamacpp service.

Downloaded models are stored in the global llama.cpp cache on your local machine (the same one the native version uses). The server can only run one model at a time and must be restarted to switch models.
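
For example, switching to a different model boils down to pointing llamacpp at another GGUF and restarting the service (the URL below is a placeholder):

# Point llama.cpp at a different GGUF (placeholder URL)
harbor llamacpp model https://huggingface.co/user/other-repo/other-file.gguf

# Restart so the new model is loaded on startup
# Note: `harbor down` stops all running Harbor services
harbor down
harbor up llamacpp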

Configuration

You can provide additional arguments to the llama.cpp CLI via LLAMACPP_EXTRA_ARGS. It can be set either with the Harbor CLI or in the .env file.

# See llama.cpp server args
harbor run llamacpp --server --help

# Set the extra arguments
harbor llamacpp args '--n-predict 1024 -ngl 100'

# Edit the .env file
HARBOR_LLAMACPP_EXTRA_ARGS="--n-predict 1024 -ngl 100"

You can add llamacpp to default services in Harbor:

# Add llamacpp to the default services
# Will always start when running `harbor up`
harbor defaults add llamacpp

# Remove llamacpp from the default services
harbor defaults rm llamacpp
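
To confirm the change, you can print the current list back (an assumption: the bare command lists the default services, in line with other Harbor getters):

# Show the current default services
harbor defaults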

The following options are available via harbor config:

# Location of llama.cpp's own cache, either global
# or relative to $(harbor home)
LLAMACPP_CACHE                 ~/.cache/llama.cpp

# The port on the host machine where the llama.cpp service
# will be available
LLAMACPP_HOST_PORT             33831
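
These keys can be read and changed with the harbor config helper; a sketch, assuming Harbor's usual mapping from HARBOR_* variables to dot notation:

# Read the current values
harbor config get llamacpp.cache
harbor config get llamacpp.host.port

# For example, move the service to a different host port
harbor config set llamacpp.host.port 34831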

Environment Variables

Follow Harbor's environment variables guide to set arbitrary variables for the llamacpp service.
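
For instance, the llama.cpp server also reads LLAMA_ARG_* environment variables that mirror its CLI flags; a sketch, assuming Harbor's `harbor env <service> <var> <value>` helper (the variable and value here are purely illustrative):

# Set an environment variable for the llamacpp container
# LLAMA_ARG_N_PARALLEL mirrors llama-server's --parallel flag
harbor env llamacpp LLAMA_ARG_N_PARALLEL 2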

llama.cpp CLIs and scripts

llama.cpp comes with a number of helper tools and CLIs, all of which can be accessed via the harbor exec llamacpp command (once the service is running).

# Show the list of available llama.cpp CLIs
harbor exec llamacpp ls

# See the help for one of the CLIs
harbor exec llamacpp ./scripts/llama-bench --help
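
For instance, llama-bench can be pointed at a GGUF that is already mounted into the container (a sketch; the model path below is a placeholder for a file located with harbor find earlier):

# Benchmark a mounted GGUF (placeholder path)
harbor exec llamacpp ./scripts/llama-bench -m /app/models/hub/<path-to-snapshot>/<file>.gguf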
