Commit 0f09f10

add support for llama2 70b

1 parent 4aaaec5
File tree

2 files changed: +11 −0

README.md (+8 lines)

````diff
@@ -135,6 +135,14 @@ For instance, if you want to work with larger contexts, you can expand the context window:
 llm = Llama(model_path="./models/7B/ggml-model.bin", n_ctx=2048)
 ```
 
+### Loading llama-2 70b
+
+Llama 2 70B requires setting the `n_gqa` parameter (grouped-query attention factor) to 8 when loading:
+
+```python
+llm = Llama(model_path="./models/70B/ggml-model.bin", n_gqa=8)
+```
+
 ## Web Server
 
 `llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.
````
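The value 8 is not arbitrary. As a quick illustration (plain arithmetic, not library code, using head counts reported in the Llama 2 paper): grouped-query attention shares each key/value head among several query heads, and the grouping factor is the ratio of the two head counts.

```python
# Why n_gqa is 8 for Llama 2 70B: the model uses 64 query heads but
# only 8 key/value heads, so each KV head serves a group of 8 query heads.
n_head = 64      # query heads in Llama 2 70B
n_head_kv = 8    # key/value heads in Llama 2 70B
n_gqa = n_head // n_head_kv
print(n_gqa)  # 8
```

The smaller Llama 2 models (7B, 13B) use standard multi-head attention, where the two counts are equal, which is why they do not need this parameter.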

llama_cpp/llama.py (+3 lines)

```diff
@@ -216,6 +216,7 @@ def __init__(
         embedding: bool = False,
         n_threads: Optional[int] = None,
         n_batch: int = 512,
+        n_gqa: Optional[int] = None,  # must be 8 for llama2 70b
         last_n_tokens_size: int = 64,
         lora_base: Optional[str] = None,
         lora_path: Optional[str] = None,
@@ -260,6 +261,8 @@ def __init__(
 
         self.params = llama_cpp.llama_context_default_params()
         self.params.n_ctx = n_ctx
+        if n_gqa is not None:
+            self.params.n_gqa = n_gqa
         self.params.n_gpu_layers = n_gpu_layers
         self.params.seed = seed
         self.params.f16_kv = f16_kv
```
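The `None` guard above follows a common bindings pattern: only write the field into the low-level params struct when the caller explicitly provides a value, so the underlying C library's default is otherwise preserved. A minimal self-contained sketch of that pattern (hypothetical stand-in classes, not the real `llama_cpp` bindings):

```python
from typing import Optional

class ContextParams:
    """Stand-in for the C-level default params struct."""
    def __init__(self) -> None:
        self.n_ctx = 512
        self.n_gqa = 1  # assumed stand-in for the library default

class Model:
    """Stand-in for the high-level wrapper's __init__ plumbing."""
    def __init__(self, n_ctx: int = 512, n_gqa: Optional[int] = None) -> None:
        self.params = ContextParams()
        self.params.n_ctx = n_ctx
        if n_gqa is not None:  # leave the library default untouched when unset
            self.params.n_gqa = n_gqa

print(Model(n_gqa=8).params.n_gqa)  # 8: the value llama2 70b requires
print(Model().params.n_gqa)         # 1: default preserved when not passed
```

Making the argument `Optional[int] = None` rather than defaulting it to 8 keeps every existing caller's behavior unchanged, which is why the commit can add the parameter without touching other call sites.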

0 commit comments