Commit 97aa3a1

docs: Add information re: auto chat formats. Closes abetlen#1236
1 parent f062a7f commit 97aa3a1

2 files changed: +13 -2 lines changed

README.md

+10 -1 (10 additions & 1 deletion)
@@ -286,7 +286,16 @@ By default [`from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest
 
 The high-level API also provides a simple interface for chat completion.
 
-Note that `chat_format` option must be set for the particular model you are using.
+Chat completion requires that the model know how to format the messages into a single prompt.
+The `Llama` class does this using pre-registered chat formats (i.e. `chatml`, `llama-2`, `gemma`, etc.) or by providing a custom chat handler object.
+
+The model will format the messages into a single prompt using the following order of precedence:
+- Use the `chat_handler` if provided
+- Use the `chat_format` if provided
+- Use the `tokenizer.chat_template` from the `gguf` model's metadata (should work for most new models, older models may not have this)
+- else, fall back to the `llama-2` chat format
+
+Set `verbose=True` to see the selected chat format.
 
 ```python
 >>> from llama_cpp import Llama
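
As a quick illustration of the behaviour documented above, here is a minimal usage sketch. The model path and messages are placeholders (not part of this commit); `chat_format`, `verbose`, and `create_chat_completion` are existing parts of the `Llama` API.

```python
from llama_cpp import Llama

# Placeholder GGUF path for illustration only.
llm = Llama(
    model_path="./models/example-model.Q4_K_M.gguf",
    chat_format="chatml",  # pre-registered format; omit it to fall back to the
                           # gguf chat template or, failing that, "llama-2"
    verbose=True,          # prints which chat format/template was selected
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the planets in the solar system."},
    ]
)
print(response["choices"][0]["message"]["content"])
```

With `verbose=True` and no `chat_format` argument, the constructor reports the selected template or fallback on stderr, following the precedence list above.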

llama_cpp/llama.py

+3 -1 (3 additions & 1 deletion)
@@ -410,7 +410,7 @@ def __init__(
             bos_token = self._model.token_get_text(bos_token_id)
 
             if self.verbose:
-                print(f"Using chat template: {template}", file=sys.stderr)
+                print(f"Using gguf chat template: {template}", file=sys.stderr)
                 print(f"Using chat eos_token: {eos_token}", file=sys.stderr)
                 print(f"Using chat bos_token: {bos_token}", file=sys.stderr)
 
@@ -420,6 +420,8 @@ def __init__(
 
         if self.chat_format is None and self.chat_handler is None:
             self.chat_format = "llama-2"
+            if self.verbose:
+                print(f"Using fallback chat format: {self.chat_format}", file=sys.stderr)
 
     @property
     def ctx(self) -> llama_cpp.llama_context_p:
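
To make the precedence in this change easier to follow, below is a standalone sketch of the selection order (chat handler, then chat format, then the gguf chat template, then the `llama-2` fallback). The function name, arguments, and return values are illustrative only, not the library's actual internals.

```python
import sys

def resolve_chat_format(chat_handler, chat_format, metadata, verbose=False):
    """Illustrative sketch of the precedence described in the README change."""
    if chat_handler is not None:
        # 1. An explicit chat handler object takes priority.
        return "custom-handler"
    if chat_format is not None:
        # 2. Next, an explicitly requested pre-registered format.
        return chat_format
    template = metadata.get("tokenizer.chat_template")
    if template is not None:
        # 3. Then a chat template embedded in the gguf metadata.
        if verbose:
            print(f"Using gguf chat template: {template}", file=sys.stderr)
        return "gguf-template"
    # 4. Last resort: the llama-2 chat format.
    if verbose:
        print("Using fallback chat format: llama-2", file=sys.stderr)
    return "llama-2"

# With no handler, no format, and no metadata template, the fallback wins,
# matching the branch added in this commit:
assert resolve_chat_format(None, None, {}) == "llama-2"
```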

0 commit comments
