Commit 3977eea

Merge pull request abetlen#310 from gjmulder/auto-docker
Auto docker v2 - dockerised Open Llama 3B image w/OpenBLAS enabled server
2 parents 71f4582 + 30d32e9 commit 3977eea

File tree

9 files changed: 128 additions, 40 deletions

‎.gitignore

3 additions & 0 deletions

@@ -164,3 +164,6 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
+
+# downloaded model .bin files
+docker/open_llama/*.bin

‎docker/README.md

45 additions & 25 deletions

@@ -1,46 +1,66 @@
-# Dockerfiles for building the llama-cpp-python server
-- `Dockerfile.openblas_simple` - a simple Dockerfile for non-GPU OpenBLAS
-- `Dockerfile.cuda_simple` - a simple Dockerfile for CUDA accelerated CuBLAS
-- `hug_model.py` - a Python utility for interactively choosing and downloading the latest `5_1` quantized models from [huggingface.co/TheBloke](https://huggingface.co/TheBloke)
-- `Dockerfile` - a single OpenBLAS and CuBLAS combined Dockerfile that automatically installs a previously downloaded model `model.bin`
-
-# Get model from Hugging Face
-`python3 ./hug_model.py`
+# Install Docker Server
+
+**Note #1:** This was tested with Docker running on Linux. If you can get it working on Windows or MacOS, please update this `README.md` with a PR!
+
+[Install Docker Engine](https://docs.docker.com/engine/install)
+
+**Note #2:** NVidia GPU CuBLAS support requires a NVidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker NVidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
 
+# Simple Dockerfiles for building the llama-cpp-python server with external model bin files
+## openblas_simple - a simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image
+```
+cd ./openblas_simple
+docker build -t openblas_simple .
+docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
+```
+where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
+
+## cuda_simple - a simple Dockerfile for CUDA accelerated CuBLAS, where the model is located outside the Docker image
+```
+cd ./cuda_simple
+docker build -t cuda_simple .
+docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
+```
+where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
+
+# "Open-Llama-in-a-box"
+## Download an Apache V2.0 licensed 3B parameter Open Llama model and install into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server
+```
+$ cd ./open_llama
+./build.sh
+./start.sh
+```
+
+# Manually choose your own Llama model from Hugging Face
+`python3 ./hug_model.py -a TheBloke -t llama`
 You should now have a model in the current directory and `model.bin` symlinked to it for the subsequent Docker build and copy step. e.g.
 ```
 docker $ ls -lh *.bin
--rw-rw-r-- 1 user user 4.8G May 23 18:30 <downloaded-model-file>.q5_1.bin
-lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>.q5_1.bin
+-rw-rw-r-- 1 user user 4.8G May 23 18:30 <downloaded-model-file>q5_1.bin
+lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_1.bin
 ```
 **Note #1:** Make sure you have enough disk space to download the model. As the model is then copied into the image you will need at least
 **TWICE** as much disk space as the size of the model:
 
 | Model | Quantized size |
 |------:|----------------:|
+| 3B | 3 GB |
 | 7B | 5 GB |
 | 13B | 10 GB |
-| 30B | 25 GB |
+| 33B | 25 GB |
 | 65B | 50 GB |
 
 **Note #2:** If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
 
-# Install Docker Server
-
-**Note #3:** This was tested with Docker running on Linux. If you can get it working on Windows or MacOS, please update this `README.md` with a PR!
-
-[Install Docker Engine](https://docs.docker.com/engine/install)
-
-# Use OpenBLAS
+## Use OpenBLAS
 Use if you don't have a NVidia GPU. Defaults to `python:3-slim-bullseye` Docker base image and OpenBLAS:
-## Build:
-`docker build --build-arg -t openblas .`
-## Run:
+### Build:
+`docker build -t openblas .`
+### Run:
 `docker run --cap-add SYS_RESOURCE -t openblas`
 
-# Use CuBLAS
-Requires a NVidia GPU with sufficient VRAM (approximately as much as the size above) and Docker NVidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
-## Build:
+## Use CuBLAS
+### Build:
 `docker build --build-arg IMAGE=nvidia/cuda:12.1.1-devel-ubuntu22.04 -t cublas .`
-## Run:
+### Run:
 `docker run --cap-add SYS_RESOURCE -t cublas`
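As a concrete illustration of the `openblas_simple` instructions added above, the run below mounts a host model directory into the container; the host path `/home/user/models` and the filename `open_llama_3b.ggmlv3.q5_1.bin` are placeholders for illustration, not files shipped with this PR:

```
cd ./openblas_simple
docker build -t openblas_simple .
docker run -e USE_MLOCK=0 -e MODEL=/var/model/open_llama_3b.ggmlv3.q5_1.bin \
    -v /home/user/models:/var/model -p 8000:8000 -t openblas_simple
```

The `-v` flag maps the host's `<model-root-path>` onto `/var/model` inside the container, `MODEL` then names the file by its in-container path, and `-p 8000:8000` publishes the server port used elsewhere in this PR so it is reachable from the host.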

‎docker/Dockerfile.cuda_simple renamed to ‎docker/cuda_simple/Dockerfile

2 additions & 2 deletions

@@ -1,5 +1,5 @@
 ARG CUDA_IMAGE="12.1.1-devel-ubuntu22.04"
-FROM ${CUDA_IMAGE}
+FROM nvidia/cuda:${CUDA_IMAGE}
 
 # We need to set the host to 0.0.0.0 to allow outside access
 ENV HOST 0.0.0.0
@@ -10,7 +10,7 @@ COPY . .
 RUN apt update && apt install -y python3 python3-pip
 RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette
 
-RUN LLAMA_CUBLAS=1 python3 setup.py develop
+RUN LLAMA_CUBLAS=1 pip install llama-cpp-python
 
 # Run the server
 CMD python3 -m llama_cpp.server
File renamed without changes.
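Since the `cuda_simple` Dockerfile above now prefixes the base image with `nvidia/cuda:`, the `CUDA_IMAGE` build arg only needs a tag. A build sketch that also shows overriding the default tag (the `12.2.0-devel-ubuntu22.04` tag is just an example and must exist on Docker Hub):

```
cd ./cuda_simple
# default base image: nvidia/cuda:12.1.1-devel-ubuntu22.04
docker build -t cuda_simple .
# override the CUDA base image tag via the CUDA_IMAGE build arg
docker build --build-arg CUDA_IMAGE=12.2.0-devel-ubuntu22.04 -t cuda_simple .
```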

‎docker/open_llama/build.sh

14 additions & 0 deletions

@@ -0,0 +1,14 @@
+#!/bin/sh
+
+MODEL="open_llama_3b"
+# Get open_llama_3b_ggml q5_1 quantization
+python3 ./hug_model.py -a SlyEcho -s ${MODEL} -f "q5_1"
+ls -lh *.bin
+
+# Build the default OpenBLAS image
+docker build -t $MODEL .
+docker images | egrep "^(REPOSITORY|$MODEL)"
+
+echo
+echo "To start the docker container run:"
+echo "docker run -t -p 8000:8000 $MODEL"

‎docker/hug_model.py renamed to ‎docker/open_llama/hug_model.py

34 additions & 11 deletions

@@ -2,6 +2,7 @@
 import json
 import os
 import struct
+import argparse
 
 def make_request(url, params=None):
     print(f"Making request to {url}...")
@@ -69,21 +70,30 @@ def get_user_choice(model_list):
 
     return None
 
-import argparse
-
 def main():
     # Create an argument parser
-    parser = argparse.ArgumentParser(description='Process the model version.')
+    parser = argparse.ArgumentParser(description='Process some parameters.')
+
+    # Arguments
     parser.add_argument('-v', '--version', type=int, default=0x0003,
-                        help='an integer for the version to be used')
+                        help='hexadecimal version number of ggml file')
+    parser.add_argument('-a', '--author', type=str, default='TheBloke',
+                        help='HuggingFace author filter')
+    parser.add_argument('-t', '--tag', type=str, default='llama',
+                        help='HuggingFace tag filter')
+    parser.add_argument('-s', '--search', type=str, default='',
+                        help='HuggingFace search filter')
+    parser.add_argument('-f', '--filename', type=str, default='q5_1',
+                        help='HuggingFace model repository filename substring match')
 
     # Parse the arguments
     args = parser.parse_args()
 
     # Define the parameters
     params = {
-        "author": "TheBloke", # Filter by author
-        "tags": "llama"
+        "author": args.author,
+        "tags": args.tag,
+        "search": args.search
    }
 
     models = make_request('https://huggingface.co/api/models', params=params)
@@ -100,17 +110,30 @@ def main():
 
         for sibling in model_info.get('siblings', []):
             rfilename = sibling.get('rfilename')
-            if rfilename and 'q5_1' in rfilename:
+            if rfilename and args.filename in rfilename:
                 model_list.append((model_id, rfilename))
 
-    model_choice = get_user_choice(model_list)
+    # Choose the model
+    model_list.sort(key=lambda x: x[0])
+    if len(model_list) == 0:
+        print("No models found")
+        exit(1)
+    elif len(model_list) == 1:
+        model_choice = model_list[0]
+    else:
+        model_choice = get_user_choice(model_list)
+
     if model_choice is not None:
         model_id, rfilename = model_choice
         url = f"https://huggingface.co/{model_id}/resolve/main/{rfilename}"
-        download_file(url, rfilename)
-        _, version = check_magic_and_version(rfilename)
+        dest = f"{model_id.replace('/', '_')}_{rfilename}"
+        download_file(url, dest)
+        _, version = check_magic_and_version(dest)
         if version != args.version:
-            print(f"Warning: Expected version {args.version}, but found different version in the file.")
+            print(f"Warning: Expected version {args.version}, but found different version in the file.")
+    else:
+        print("Error - model choice was None")
+        exit(2)
 
 if __name__ == '__main__':
     main()
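A usage sketch for the new `hug_model.py` arguments; the first call mirrors the README, the second mirrors `build.sh` and narrows the search enough that a single match is downloaded without prompting:

```
# interactive choice from TheBloke's llama-tagged repos (the defaults)
python3 ./hug_model.py -a TheBloke -t llama -f q5_1
# non-interactive: author + search + filename filter resolve to one model
python3 ./hug_model.py -a SlyEcho -s open_llama_3b -f "q5_1"
```

Per the `dest` logic above, the file is saved as `<author>_<repo>_<rfilename>`, and `-v` changes the expected ggml version number (default `0x0003`).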

‎docker/open_llama/start.sh

28 additions & 0 deletions

@@ -0,0 +1,28 @@
+#!/bin/sh
+
+MODEL="open_llama_3b"
+
+# Start Docker container
+docker run --cap-add SYS_RESOURCE -p 8000:8000 -t $MODEL &
+sleep 10
+echo
+docker ps | egrep "(^CONTAINER|$MODEL)"
+
+# Test the model works
+echo
+curl -X 'POST' 'http://localhost:8000/v1/completions' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
+    "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
+    "stop": [
+        "\n",
+        "###"
+    ]
+}' | grep Paris
+if [ $? -eq 0 ]
+then
+    echo
+    echo "$MODEL is working!!"
+else
+    echo
+    echo "ERROR: $MODEL not replying."
+    exit 1
+fi
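If the smoke test in `start.sh` passes, the same completion endpoint can be exercised by hand, and the background container stopped afterwards; `<container-id>` below is a placeholder for whatever `docker ps` reports:

```
curl -X POST http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n", "stop": ["\n", "###"]}'

# start.sh leaves the container running in the background
docker ps | grep open_llama_3b
docker stop <container-id>
```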

‎docker/start_server.sh renamed to ‎docker/open_llama/start_server.sh

1 addition & 1 deletion

@@ -1,6 +1,6 @@
 #!/bin/sh
 
-# For mmap support
+# For mlock support
 ulimit -l unlimited
 
 if [ "$IMAGE" = "python:3-slim-bullseye" ]; then

‎docker/Dockerfile.openblas_simple renamed to ‎docker/openblas_simple/Dockerfile

1 addition & 1 deletion

@@ -9,7 +9,7 @@ COPY . .
 RUN apt update && apt install -y libopenblas-dev ninja-build build-essential
 RUN python -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette
 
-RUN LLAMA_OPENBLAS=1 python3 setup.py develop
+RUN LLAMA_OPENBLAS=1 pip install llama_cpp_python --verbose
 
 # Run the server
 CMD python3 -m llama_cpp.server

0 commit comments
