Commit 32efed7
docs: Update README
1 parent d80c5cf

1 file changed: README.md (+91 −33 lines)
@@ -25,105 +25,162 @@ Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest

## Installation

- `llama-cpp-python` can be installed directly from PyPI as a source distribution by running:
+ Requirements:
+
+ - Python 3.8+
+ - C compiler
+   - Linux: gcc or clang
+   - Windows: Visual Studio or MinGW
+   - MacOS: Xcode
+
+ To install the package, run:

```bash
pip install llama-cpp-python
```

- This will build `llama.cpp` from source using cmake and your system's c compiler (required) and install the library alongside this python package.
+ This will also build `llama.cpp` from source and install it alongside this python package.

- If you run into issues during installation add the `--verbose` flag to the `pip install` command to see the full cmake build log.
+ If this fails, add `--verbose` to the `pip install` command to see the full cmake build log.
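For example (an illustrative command using only the flag named above):

```bash
# build from source with the full cmake log printed to the console
pip install llama-cpp-python --verbose
```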

- ### Installation with Specific Hardware Acceleration (BLAS, CUDA, Metal, etc)
+ ### Installation Configuration

- The default pip install behaviour is to build `llama.cpp` for CPU only on Linux and Windows and use Metal on MacOS.
+ `llama.cpp` supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.

- `llama.cpp` supports a number of hardware acceleration backends depending including OpenBLAS, cuBLAS, CLBlast, HIPBLAS, and Metal.
- See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list of supported backends.
+ All `llama.cpp` cmake build options can be set via the `CMAKE_ARGS` environment variable or via the `--config-settings / -C` cli flag during installation.

- All of these backends are supported by `llama-cpp-python` and can be enabled by setting the `CMAKE_ARGS` environment variable before installing.
-
- On Linux and Mac you set the `CMAKE_ARGS` like this:
+ <details>
+ <summary>Environment Variables</summary>

```bash
- CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Linux and Mac
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
+   pip install llama-cpp-python
```

- On Windows you can set the `CMAKE_ARGS` like this:
-
- ```ps
+ ```powershell
+ # Windows
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
```
+ </details>
+
+ <details>
+ <summary>CLI / requirements.txt</summary>
+
+ They can also be set via the `pip install -C / --config-settings` command and saved to a `requirements.txt` file:
+
+ ```bash
+ pip install --upgrade pip  # ensure pip is up to date
+ pip install llama-cpp-python \
+   -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
+ ```
+
+ ```txt
+ # requirements.txt
+
+ llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
+ ```
+
+ </details>

- #### OpenBLAS

- To install with OpenBLAS, set the `LLAMA_BLAS and LLAMA_BLAS_VENDOR` environment variables before installing:
+ ### Supported Backends
+
+ Below are some common backends, their build commands and any additional environment variables required.
+
+ <details>
+ <summary>OpenBLAS (CPU)</summary>
+
+ To install with OpenBLAS, set the `LLAMA_BLAS` and `LLAMA_BLAS_VENDOR` environment variables before installing:

```bash
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```
+ </details>

- #### cuBLAS
+ <details>
+ <summary>cuBLAS (CUDA)</summary>

To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```

- #### Metal
+ </details>
+
+ <details>
+ <summary>Metal</summary>

To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
```

- #### CLBlast
+ </details>
+
+ <details>
+ <summary>CLBlast (OpenCL)</summary>

To install with CLBlast, set the `LLAMA_CLBLAST=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
```

- #### hipBLAS
+ </details>
+
+ <details>
+ <summary>hipBLAS (ROCm)</summary>

To install with hipBLAS / ROCm support for AMD cards, set the `LLAMA_HIPBLAS=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
```

- #### Vulkan
+ </details>
+
+ <details>
+ <summary>Vulkan</summary>

To install with Vulkan support, set the `LLAMA_VULKAN=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
```

- #### Kompute
+ </details>
+
+ <details>
+ <summary>Kompute</summary>

To install with Kompute support, set the `LLAMA_KOMPUTE=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
```
+ </details>

- #### SYCL
+ <details>
+ <summary>SYCL</summary>

To install with SYCL support, set the `LLAMA_SYCL=on` environment variable before installing:

```bash
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
```
+ </details>
+

### Windows Notes

+ <details>
+ <summary>Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'</summary>
+
If you run into issues where it complains it can't find `'nmake'` `'?'` or CMAKE_C_COMPILER, you can extract w64devkit as [mentioned in llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to CMAKE_ARGS before running `pip` install:

```ps
@@ -132,12 +189,14 @@ $env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.e
```

See the above instructions and set `CMAKE_ARGS` to the BLAS backend you want to use.
+ </details>

### MacOS Notes

Detailed MacOS Metal GPU install documentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/)

- #### M1 Mac Performance Issue
+ <details>
+ <summary>M1 Mac Performance Issue</summary>

Note: If you are using Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:

@@ -147,24 +206,21 @@ bash Miniforge3-MacOSX-arm64.sh
```

Otherwise, while installing it will build the llama.cpp x86 version which will be 10x slower on Apple Silicon (M1) Mac.
+ </details>
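A quick way to confirm which architecture your interpreter was built for (an editor's sketch using only the Python standard library, not part of the upstream docs):

```bash
# prints "arm64" for a native Apple Silicon Python; "x86_64" means an Intel/Rosetta build
python3 -c "import platform; print(platform.machine())"
```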

- #### M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
+ <details>
+ <summary>M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`</summary>

Try installing with

```bash
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
```
+ </details>

### Upgrading and Reinstalling

- To upgrade or rebuild `llama-cpp-python` add the following flags to ensure that the package is rebuilt correctly:
-
- ```bash
- pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
- ```
-
- This will ensure that all source files are re-built with the most recently set `CMAKE_ARGS` flags.
+ To upgrade and rebuild `llama-cpp-python`, add the `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source.
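Concretely, using only the flags named above (this mirrors the command from the removed lines):

```bash
# rebuild llama.cpp from source, ignoring any cached wheel, with the currently set CMAKE_ARGS
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```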

## High-level API

@@ -218,13 +274,15 @@ You can pull `Llama` models from Hugging Face using the `from_pretrained` method
You'll need to install the `huggingface-hub` package to use this feature (`pip install huggingface-hub`).

```python
- llama = Llama.from_pretrained(
+ llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q8_0.gguf",
    verbose=False
)
```

+ By default the `from_pretrained` method will download the model to the Hugging Face cache directory, so you can manage installed model files with the `huggingface-cli` tool.
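For example, files already in that cache can be listed with the `huggingface-cli` tool mentioned above (`scan-cache` is a standard `huggingface_hub` subcommand; shown here as an illustration):

```bash
# list models downloaded to the Hugging Face cache
huggingface-cli scan-cache
```
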
### Chat Completion

The high-level API also provides a simple interface for chat completion.
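A minimal sketch of that interface, assuming the `llm` object created above and the `Llama.create_chat_completion` method (OpenAI-style chat messages):

```python
# ask the model a question using role/content message dicts
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the planets in the solar system."},
    ]
)
print(response["choices"][0]["message"]["content"])
```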
