Commit 1783320

Merge branch 'abetlen:main' into main

2 parents: 4cf0861 + 2a0844b
9 files changed: 63 additions & 45 deletions

CHANGELOG.md

7 additions & 0 deletions

@@ -7,6 +7,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.78]
+
+### Added
+
+- Grammar based sampling via LlamaGrammar which can be passed to completions
+- Make n_gpu_layers == -1 offload all layers
+
 ## [0.1.77]
 
 - (llama.cpp) Update llama.cpp add support for LLaMa 2 70B
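
For context, the two new [0.1.78] entries can be exercised as in the minimal sketch below; the model path and grammar are placeholders, and the `LlamaGrammar.from_string` helper plus the `grammar` keyword on completion calls are assumed to match this release's API.

```python
# Minimal sketch of the 0.1.78 additions: grammar-constrained sampling and
# full GPU offload. The model path and grammar are placeholder values.
from llama_cpp import Llama, LlamaGrammar

# Constrain generation to the literal strings "yes" or "no" (GBNF syntax).
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(
    model_path="./models/llama-2-7b.ggmlv3.q4_0.bin",  # placeholder path
    n_gpu_layers=-1,  # new in 0.1.78: -1 offloads all layers to the GPU
)

output = llm("Is the sky blue? Answer: ", grammar=grammar, max_tokens=4)
print(output["choices"][0]["text"])
```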

README.md

1 addition & 1 deletion

@@ -201,7 +201,7 @@ This package is under active development and I welcome any contributions.
 To get started, clone the repository and install the package in development mode:
 
 ```bash
-git clone --recurse-submodules git@github.com:abetlen/llama-cpp-python.git
+git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
 cd llama-cpp-python
 
 # Install with pip

docker/README.md

25 additions & 27 deletions

@@ -1,46 +1,55 @@
-# Install Docker Server
-
-**Note #1:** This was tested with Docker running on Linux. If you can get it working on Windows or MacOS, please update this `README.md` with a PR!
+### Install Docker Server
+> [!IMPORTANT]
+> This was tested with Docker running on Linux. <br>If you can get it working on Windows or MacOS, please update this `README.md` with a PR!<br>
 
 [Install Docker Engine](https://docs.docker.com/engine/install)
 
-**Note #2:** NVidia GPU CuBLAS support requires a NVidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker NVidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
 
-# Simple Dockerfiles for building the llama-cpp-python server with external model bin files
-## openblas_simple - a simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image
+## Simple Dockerfiles for building the llama-cpp-python server with external model bin files
+### openblas_simple
+A simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image:
 ```
 cd ./openblas_simple
 docker build -t openblas_simple .
-docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
+docker run --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
 ```
 where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
 
-## cuda_simple - a simple Dockerfile for CUDA accelerated CuBLAS, where the model is located outside the Docker image
+### cuda_simple
+> [!WARNING]
+> Nvidia GPU CuBLAS support requires an Nvidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker Nvidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)) <br>
+
+A simple Dockerfile for CUDA-accelerated CuBLAS, where the model is located outside the Docker image:
+
 ```
 cd ./cuda_simple
 docker build -t cuda_simple .
-docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
+docker run --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
 ```
 where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
 
-# "Open-Llama-in-a-box"
-## Download an Apache V2.0 licensed 3B paramter Open Llama model and install into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server
+--------------------------------------------------------------------------
+
+### "Open-Llama-in-a-box"
+Download an Apache V2.0 licensed 3B params Open LLaMA model and install into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server:
 ```
 $ cd ./open_llama
 ./build.sh
 ./start.sh
 ```
 
-# Manually choose your own Llama model from Hugging Face
+### Manually choose your own Llama model from Hugging Face
 `python3 ./hug_model.py -a TheBloke -t llama`
 You should now have a model in the current directory and `model.bin` symlinked to it for the subsequent Docker build and copy step. e.g.
 ```
 docker $ ls -lh *.bin
 -rw-rw-r-- 1 user user 4.8G May 23 18:30 <downloaded-model-file>q5_1.bin
 lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_1.bin
 ```
-**Note #1:** Make sure you have enough disk space to download the model. As the model is then copied into the image you will need at least
-**TWICE** as much disk space as the size of the model:
+
+> [!NOTE]
+> Make sure you have enough disk space to download the model. As the model is then copied into the image you will need at least
+**TWICE** as much disk space as the size of the model:<br>
 
 | Model | Quantized size |
 |------:|----------------:|
@@ -50,17 +59,6 @@ lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_
 | 33B | 25 GB |
 | 65B | 50 GB |
 
-**Note #2:** If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
-
-## Use OpenBLAS
-Use if you don't have a NVidia GPU. Defaults to `python:3-slim-bullseye` Docker base image and OpenBLAS:
-### Build:
-`docker build -t openblas .`
-### Run:
-`docker run --cap-add SYS_RESOURCE -t openblas`
 
-## Use CuBLAS
-### Build:
-`docker build --build-arg IMAGE=nvidia/cuda:12.1.1-devel-ubuntu22.04 -t cublas .`
-### Run:
-`docker run --cap-add SYS_RESOURCE -t cublas`
+> [!NOTE]
+> If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`

docker/cuda_simple/Dockerfile

14 additions & 3 deletions

@@ -4,13 +4,24 @@ FROM nvidia/cuda:${CUDA_IMAGE}
 # We need to set the host to 0.0.0.0 to allow outside access
 ENV HOST 0.0.0.0
 
+RUN apt-get update && apt-get upgrade -y \
+    && apt-get install -y git build-essential \
+    python3 python3-pip gcc wget \
+    ocl-icd-opencl-dev opencl-headers clinfo \
+    libclblast-dev libopenblas-dev \
+    && mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
+
 COPY . .
 
-# Install the package
-RUN apt update && apt install -y python3 python3-pip
+# setting build related env vars
+ENV CUDA_DOCKER_ARCH=all
+ENV LLAMA_CUBLAS=1
+
+# Install depencencies
 RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings
 
-RUN LLAMA_CUBLAS=1 pip install llama-cpp-python
+# Install llama-cpp-python (build with cuda)
+RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
 
 # Run the server
 CMD python3 -m llama_cpp.server
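
As a usage sketch that is not part of the Dockerfile: once the container is running, a client can exercise the OpenAI-compatible completions route served by `llama_cpp.server`, assuming the server's default port 8000 is published (for example `docker run --gpus=all -p 8000:8000 ... cuda_simple`).

```python
# Hypothetical client-side smoke test for the containerized server.
# Assumes the container publishes the server's default port 8000.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "Q: What is the capital of France? A:", "max_tokens": 32},
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```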

llama_cpp/llama.py

0 additions & 1 deletion

@@ -1,5 +1,4 @@
 import os
-from pathlib import Path
 import sys
 import uuid
 import time

llama_cpp/llama_grammar.py

13 additions & 10 deletions

@@ -1031,10 +1031,10 @@ def print_grammar_char(file: TextIO, c: int) -> None:
 # }
 def is_char_element(elem: LlamaGrammarElement) -> bool:
     return elem.type in (
-        llama_gretype.LLAMA_GRETYPE_CHAR.value,
-        llama_gretype.LLAMA_GRETYPE_CHAR_NOT.value,
-        llama_gretype.LLAMA_GRETYPE_CHAR_ALT.value,
-        llama_gretype.LLAMA_GRETYPE_CHAR_RNG_UPPER.value,
+        llama_gretype.LLAMA_GRETYPE_CHAR,
+        llama_gretype.LLAMA_GRETYPE_CHAR_NOT,
+        llama_gretype.LLAMA_GRETYPE_CHAR_ALT,
+        llama_gretype.LLAMA_GRETYPE_CHAR_RNG_UPPER,
     )
 
 
@@ -1054,9 +1054,10 @@ def print_rule(
     # "malformed rule, does not end with LLAMA_GRETYPE_END: " + std::to_string(rule_id));
     # }
     # fprintf(file, "%s ::= ", symbol_id_names.at(rule_id).c_str());
-    if rule.empty() or rule.back().type != llama_gretype.LLAMA_GRETYPE_END.value:
+    if rule.empty() or rule.back().type != llama_gretype.LLAMA_GRETYPE_END:
         raise RuntimeError(
-            "malformed rule, does not end with LLAMA_GRETYPE_END: " + str(rule_id)
+            "malformed rule, does not end with LLAMA_GRETYPE_END: "
+            + str(rule_id)
         )
     print(f"{symbol_id_names.at(rule_id)} ::=", file=file, end=" ")
     # for (size_t i = 0, end = rule.size() - 1; i < end; i++) {
@@ -1100,8 +1101,10 @@ def print_rule(
     # }
     for i, elem in enumerate(rule[:-1]):
         case = elem.type  # type: llama_gretype
-        if case is llama_gretype.LLAMA_GRETYPE_END.value:
-            raise RuntimeError("unexpected end of rule: " + str(rule_id) + "," + str(i))
+        if case is llama_gretype.LLAMA_GRETYPE_END:
+            raise RuntimeError(
+                "unexpected end of rule: " + str(rule_id) + "," + str(i)
+            )
         elif case is llama_gretype.LLAMA_GRETYPE_ALT:
             print("| ", file=file, end="")
         elif case is llama_gretype.LLAMA_GRETYPE_RULE_REF:
@@ -1140,8 +1143,8 @@ def print_rule(
             # fprintf(file, "] ");
             if is_char_element(elem):
                 if rule[i + 1].type in (
-                    llama_gretype.LLAMA_GRETYPE_CHAR_ALT.value,
-                    llama_gretype.LLAMA_GRETYPE_CHAR_RNG_UPPER.value,
+                    llama_gretype.LLAMA_GRETYPE_CHAR_ALT,
+                    llama_gretype.LLAMA_GRETYPE_CHAR_RNG_UPPER,
                 ):
                     pass
                 else:
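
A brief, purely illustrative note on the `.value` change above: when `elem.type` holds a `llama_gretype` enum member, identity and membership checks should compare member to member; comparing a member against its plain-integer `.value` with `is` can never succeed. The toy enum below is hypothetical and merely stands in for `llama_gretype`.

```python
# Illustrative only: a hypothetical stand-in enum for llama_gretype, showing
# why the checks compare enum members rather than their integer .value.
import enum


class Gretype(enum.Enum):
    END = 0
    CHAR = 6


case = Gretype.END                 # suppose elem.type stores an enum member

print(case is Gretype.END)         # True: member compared with member
print(case is Gretype.END.value)   # False: a member is never the same object
                                   # as the plain int 0, so such a check
                                   # could never fire
print(case in (Gretype.CHAR, Gretype.END))  # True: membership against members
```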

pyproject.toml

1 addition & 1 deletion

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llama_cpp_python"
-version = "0.1.77"
+version = "0.1.78"
 description = "Python bindings for the llama.cpp library"
 authors = ["Andrei Betlen <abetlen@gmail.com>"]
 license = "MIT"

setup.py

1 addition & 1 deletion

@@ -10,7 +10,7 @@
     description="A Python wrapper for llama.cpp",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    version="0.1.77",
+    version="0.1.78",
     author="Andrei Betlen",
     author_email="abetlen@gmail.com",
     license="MIT",

vendor/llama.cpp (submodule)

