Commit 876f39d
Merge pull request abetlen#564 from Isydmr/main
Docker improvements
2 parents 4cf0461 + a5bc57e


docker/README.md (1 file changed: +25 -27 lines)

````diff
@@ -1,46 +1,55 @@
-# Install Docker Server
-
-**Note #1:** This was tested with Docker running on Linux. If you can get it working on Windows or MacOS, please update this `README.md` with a PR!
+### Install Docker Server
+> [!IMPORTANT]
+> This was tested with Docker running on Linux. <br>If you can get it working on Windows or MacOS, please update this `README.md` with a PR!<br>
 
 [Install Docker Engine](https://docs.docker.com/engine/install)
 
-**Note #2:** NVidia GPU CuBLAS support requires a NVidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker NVidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
 
-# Simple Dockerfiles for building the llama-cpp-python server with external model bin files
-## openblas_simple - a simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image
+## Simple Dockerfiles for building the llama-cpp-python server with external model bin files
+### openblas_simple
+A simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image:
 ```
 cd ./openblas_simple
 docker build -t openblas_simple .
-docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
+docker run --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
 ```
 where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
 
````
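For concreteness, here is the improved `openblas_simple` run command with values filled in. The model filename and host directory are hypothetical placeholders; `--cap-add SYS_RESOURCE` (the capability this commit adds) lets the server raise resource limits, such as the memory-lock limit, inside the container.

```
# Hypothetical layout: quantized model stored in /home/user/models on the host,
# visible inside the container as /var/model/open-llama-7b-q5_1.bin
docker run --cap-add SYS_RESOURCE \
    -e USE_MLOCK=0 \
    -e MODEL=/var/model/open-llama-7b-q5_1.bin \
    -v /home/user/models:/var/model \
    -t openblas_simple
```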
````diff
-## cuda_simple - a simple Dockerfile for CUDA accelerated CuBLAS, where the model is located outside the Docker image
+### cuda_simple
+> [!WARNING]
+> Nvidia GPU CuBLAS support requires an Nvidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker Nvidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)) <br>
+
+A simple Dockerfile for CUDA-accelerated CuBLAS, where the model is located outside the Docker image:
+
 ```
 cd ./cuda_simple
 docker build -t cuda_simple .
-docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
+docker run --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
 ```
 where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
 
````
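Before building `cuda_simple`, it can save time to confirm that Docker can see the GPU at all. A minimal check, assuming the NVIDIA Container Toolkit from the linked guide is installed (the CUDA image tag below is one public example; any recent tag works):

```
# Should print the same GPU table that nvidia-smi shows on the host
docker run --rm --gpus=all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```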
````diff
-# "Open-Llama-in-a-box"
-## Download an Apache V2.0 licensed 3B paramter Open Llama model and install into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server
+--------------------------------------------------------------------------
+
+### "Open-Llama-in-a-box"
+Download an Apache V2.0 licensed 3B params Open LLaMA model and install into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server:
 ```
 $ cd ./open_llama
 ./build.sh
 ./start.sh
 ```
 
````
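Once `./start.sh` has the container up, the llama-cpp-python server speaks an OpenAI-compatible REST API. A quick smoke test, assuming the server is reachable from the host on its default port 8000 (publish the port with `-p 8000:8000` if your run command does not already):

```
# Request a short completion from the running server
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"prompt": "The capital of France is", "max_tokens": 8}'
```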
````diff
-# Manually choose your own Llama model from Hugging Face
+### Manually choose your own Llama model from Hugging Face
 `python3 ./hug_model.py -a TheBloke -t llama`
 You should now have a model in the current directory and `model.bin` symlinked to it for the subsequent Docker build and copy step. e.g.
 ```
 docker $ ls -lh *.bin
 -rw-rw-r-- 1 user user 4.8G May 23 18:30 <downloaded-model-file>q5_1.bin
 lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_1.bin
 ```
-**Note #1:** Make sure you have enough disk space to download the model. As the model is then copied into the image you will need at least
-**TWICE** as much disk space as the size of the model:
+
+> [!NOTE]
+> Make sure you have enough disk space to download the model. As the model is then copied into the image you will need at least
+**TWICE** as much disk space as the size of the model:<br>
 
 | Model | Quantized size |
 |------:|----------------:|
@@ -50,17 +59,6 @@ lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_
 | 33B | 25 GB |
 | 65B | 50 GB |
 
-**Note #2:** If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
-
-## Use OpenBLAS
-Use if you don't have a NVidia GPU. Defaults to `python:3-slim-bullseye` Docker base image and OpenBLAS:
-### Build:
-`docker build -t openblas .`
-### Run:
-`docker run --cap-add SYS_RESOURCE -t openblas`
 
-## Use CuBLAS
-### Build:
-`docker build --build-arg IMAGE=nvidia/cuda:12.1.1-devel-ubuntu22.04 -t cublas .`
-### Run:
-`docker run --cap-add SYS_RESOURCE -t cublas`
+> [!NOTE]
+> If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
````
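That last note is where most tuning happens. As a sketch only, assuming `./start_server.sh` ends up invoking the llama-cpp-python server module (check the actual script before editing, and verify flag names with `python3 -m llama_cpp.server --help`), a customised launch line might look like:

```
# Hypothetical start_server.sh tweak: bigger context window, explicit thread count,
# and listening on all interfaces so a published port is reachable from the host
python3 -m llama_cpp.server --model model.bin --n_ctx 2048 --n_threads 8 --host 0.0.0.0
```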
