2 files changed: +13 −2 lines
@@ -286,7 +286,16 @@ By default [`from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest
 The high-level API also provides a simple interface for chat completion.
 
-Note that the `chat_format` option must be set for the particular model you are using.
+Chat completion requires that the model knows how to format the messages into a single prompt.
+The `Llama` class does this using pre-registered chat formats (i.e. `chatml`, `llama-2`, `gemma`, etc.) or by providing a custom chat handler object.
+
+The model will format the messages into a single prompt using the following order of precedence:
+- Use the `chat_handler` if provided
+- Use the `chat_format` if provided
+- Use the `tokenizer.chat_template` from the `gguf` model's metadata (should work for most new models; older models may not have this)
+- else, fall back to the `llama-2` chat format
+
+Set `verbose=True` to see the selected chat format.
 
 ```python
 >>> from llama_cpp import Llama
410
410
bos_token = self ._model .token_get_text (bos_token_id )
411
411
412
412
if self .verbose :
413
- print (f"Using chat template: { template } " , file = sys .stderr )
413
+ print (f"Using gguf chat template: { template } " , file = sys .stderr )
414
414
print (f"Using chat eos_token: { eos_token } " , file = sys .stderr )
415
415
print (f"Using chat bos_token: { bos_token } " , file = sys .stderr )
416
416
@@ -420,6 +420,8 @@ def __init__(
420
420
421
421
if self .chat_format is None and self .chat_handler is None :
422
422
self .chat_format = "llama-2"
423
+ if self .verbose :
424
+ print (f"Using fallback chat format: { chat_format } " , file = sys .stderr )
423
425
424
426
@property
425
427
def ctx (self ) -> llama_cpp .llama_context_p :
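
As a reading aid for the hunks above, a hypothetical paraphrase of the selection order being logged; this is not the library's verbatim `__init__` code, and the helper name and return values are illustrative only:

```python
import sys


def resolve_chat_format(chat_handler, chat_format, gguf_chat_template, verbose=False):
    """Illustrative only: mirrors the documented precedence, not Llama.__init__ itself."""
    if chat_handler is not None:
        return "custom chat handler"      # a handler formats the messages itself
    if chat_format is not None:
        return chat_format                # explicitly requested pre-registered format
    if gguf_chat_template is not None:
        if verbose:
            print(f"Using gguf chat template: {gguf_chat_template}", file=sys.stderr)
        return "template from gguf metadata"
    if verbose:
        print("Using fallback chat format: llama-2", file=sys.stderr)
    return "llama-2"                      # final fallback
```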