2.2.2 Backend: llama.cpp

Handle: llamacpp
URL: http://localhost:33831

License: MIT

LLM inference in C/C++. Allows bypassing the Ollama release cycle when needed, to get access to the latest models or features.

Starting

The llamacpp Docker image is quite large due to its dependencies on CUDA and other libraries. You might want to pull it ahead of time.

# [Optional] Pull the llamacpp
# images ahead of starting the service
harbor pull llamacpp

# Start the llama.cpp service
harbor up llamacpp

# Tail service logs
harbor logs llamacpp

# Open llamacpp Web UI
harbor open llamacpp

  • Harbor will automatically allocate GPU resources to the container if available, see Capabilities Detection.
  • llamacpp will be connected to the aider, anythingllm, boost, chatui, cmdh, opint, optillm, plandex, traefik and webui services when they are running together.
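
Once the service is up, you can check that the server responds on the configured host port. The llama.cpp server exposes a /health endpoint and an OpenAI-compatible API; a quick sketch using the default port from this page:

# Liveness check for the llama.cpp server
curl http://localhost:33831/health

# List the model currently being served via the OpenAI-compatible API
curl http://localhost:33831/v1/models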

Models

You can find GGUF models to run on the HuggingFace Hub. After you find a model you want to run, grab the URL from the browser address bar and pass it to the Harbor config.

# Quick lookup for the models
harbor hf find gguf

# 1. With llama.cpp own cache:
#
# - Set the model to run, will be downloaded when llamacpp starts
#   Accepts a full URL to the GGUF file (from Browser address bar)
harbor llamacpp model https://huggingface.co/user/repo/file.gguf
# TIP: You can monitor the download progress with a one-liner below
# Replace "<file>" with the unique portion from the "file.gguf" URL above
du -h $(harbor find .gguf | grep <file>)
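# [Optional] Keep refreshing the same size check while the download runs
# ("watch" re-runs the command every 5 seconds; <file> is still the
# placeholder from the tip above)
watch -n 5 'du -h $(harbor find .gguf | grep <file>)'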


# 2. Shared HuggingFace Hub cache, single file:
#
# - Locate the GGUF to download, for example:
#   https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/blob/main/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Download a single file: <user/repo> <file.gguf>
harbor hf download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Locate the file in the cache
harbor find Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
# - Set the GGUF to llama.cpp
#   "/app/models/hub" is where the HuggingFace cache is mounted in the container
harbor llamacpp gguf /app/models/hub/models--bartowski--Meta-Llama-3.1-70B-Instruct-GGUF/snapshots/83fb6e83d0a8aada42d499259bc929d922e9a558/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf


# 3. Shared HuggingFace Hub cache, whole repo:
#
# - Locate and download the repo in its entirety
harbor hf download av-codes/Trinity-2-Codestral-22B-Q4_K_M-GGUF
# - Find the files from the repo
harbor find Trinity-2-Codestral-22B-Q4_K_M-GGUF
# - Set the GGUF to llama.cpp
#   "/app/models/hub" is where the HuggingFace cache is mounted in the container
harbor llamacpp gguf /app/models/hub/models--av-codes--Trinity-2-Codestral-22B-Q4_K_M-GGUF/snapshots/c0a1f7283809423d193025e92eec6f287425ed59/trinity-2-codestral-22b-q4_k_m.gguf
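
To double-check what the service is set to load, the same aliases can be called without an argument to print the current value (an assumption based on how other Harbor config aliases behave):

# Print the currently configured model URL and GGUF path
harbor llamacpp model
harbor llamacpp gguf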

Note

Please note that setting the model does not download it by itself. If the model is not found in the cache, it will be downloaded on the next start of the llamacpp service.

Downloaded models are stored in the global llama.cpp cache on your local machine (the same one the native version uses). The server can only run one model at a time and must be restarted to switch models.
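
For example, switching to a different model boils down to pointing llamacpp at another GGUF and restarting the service (the URL below is a placeholder):

# Point llama.cpp at a different GGUF (placeholder URL)
harbor llamacpp model https://huggingface.co/user/other-repo/other-file.gguf

# Restart so the new model is loaded on startup
# Note: `harbor down` stops all running Harbor services
harbor down
harbor up llamacpp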

Configuration

You can provide additional arguments to the llama.cpp CLI via LLAMACPP_EXTRA_ARGS. It can be set either with the Harbor CLI or in the .env file.

# See llama.cpp server args
harbor run llamacpp --server --help

# Set the extra arguments
harbor llamacpp args '--n-predict 1024 -ngl 100'

# Edit the .env file
HARBOR_LLAMACPP_EXTRA_ARGS="--n-predict 1024 -ngl 100"

You can add llamacpp to default services in Harbor:

# Add llamacpp to the default services
# Will always start when running `harbor up`
harbor defaults add llamacpp

# Remove llamacpp from the default services
harbor defaults rm llamacpp
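
To confirm the change, you can print the current list back (an assumption: the bare command lists the default services, in line with other Harbor getters):

# Show the current default services
harbor defaults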

The following options are available via harbor config:

# Location of llama.cpp's own cache, either global
# or relative to $(harbor home)
LLAMACPP_CACHE                 ~/.cache/llama.cpp

# The port on the host machine where the llama.cpp service
# will be available
LLAMACPP_HOST_PORT             33831
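
These keys can be read and changed with the harbor config helper; a sketch, assuming Harbor's usual mapping from HARBOR_* variables to dot notation:

# Read the current values
harbor config get llamacpp.cache
harbor config get llamacpp.host.port

# For example, move the service to a different host port
harbor config set llamacpp.host.port 34831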

Environment Variables

Follow Harbor's environment variables guide to set arbitrary variables for the llamacpp service.
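
For instance, the llama.cpp server also reads LLAMA_ARG_* environment variables that mirror its CLI flags; a sketch, assuming Harbor's `harbor env <service> <var> <value>` helper (the variable and value here are purely illustrative):

# Set an environment variable for the llamacpp container
# LLAMA_ARG_N_PARALLEL mirrors llama-server's --parallel flag
harbor env llamacpp LLAMA_ARG_N_PARALLEL 2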

llama.cpp CLIs and scripts

llama.cpp comes with a number of helper tools and CLIs, all of which can be accessed via the harbor exec llamacpp command (once the service is running).

# Show the list of available llama.cpp CLIs
harbor exec llamacpp ls

# See the help for one of the CLIs
harbor exec llamacpp ./scripts/llama-bench --help
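
For instance, llama-bench can be pointed at a GGUF that is already mounted into the container (a sketch; the model path below is a placeholder for a file located with harbor find earlier):

# Benchmark a mounted GGUF (placeholder path)
harbor exec llamacpp ./scripts/llama-bench -m /app/models/hub/<path-to-snapshot>/<file>.gguf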
