2 files changed: +13 −2 lines changed
@@ -286,7 +286,16 @@ By default [`from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest

 The high-level API also provides a simple interface for chat completion.

-Note that `chat_format` option must be set for the particular model you are using.
+Chat completion requires that the model know how to format the messages into a single prompt.
+The `Llama` class does this using pre-registered chat formats (e.g. `chatml`, `llama-2`, `gemma`, etc.) or by providing a custom chat handler object.
+
+The model will format the messages into a single prompt using the following order of precedence:
+- Use the `chat_handler` if provided
+- Use the `chat_format` if provided
+- Use the `tokenizer.chat_template` from the `gguf` model's metadata (should work for most new models, older models may not have this)
+- else, fall back to the `llama-2` chat format
+
+Set `verbose=True` to see the selected chat format.

 ```python
 >>> from llama_cpp import Llama
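As a rough sketch of how the precedence described in this README change plays out from user code (the model path below is a placeholder; `model_path`, `chat_format`, `verbose`, and `create_chat_completion` are the standard `Llama` parameters and method the section refers to):

```python
from llama_cpp import Llama

# Explicit chat_format: the gguf metadata template and the llama-2 fallback are skipped.
llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    chat_format="chatml",              # force a pre-registered format
    verbose=True,                      # prints which chat format/template was selected
)

# No chat_format or chat_handler: the tokenizer.chat_template from the gguf
# metadata is used when present, otherwise the llama-2 fallback.
llm_auto = Llama(model_path="./models/model.gguf", verbose=True)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the planets in the solar system."},
    ]
)
print(response["choices"][0]["message"]["content"])
```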
@@ -410,7 +410,7 @@ def __init__(
             bos_token = self._model.token_get_text(bos_token_id)

             if self.verbose:
-                print(f"Using chat template: {template}", file=sys.stderr)
+                print(f"Using gguf chat template: {template}", file=sys.stderr)
                 print(f"Using chat eos_token: {eos_token}", file=sys.stderr)
                 print(f"Using chat bos_token: {bos_token}", file=sys.stderr)

@@ -420,6 +420,8 @@ def __init__(

         if self.chat_format is None and self.chat_handler is None:
             self.chat_format = "llama-2"
+            if self.verbose:
+                print(f"Using fallback chat format: {self.chat_format}", file=sys.stderr)

     @property
     def ctx(self) -> llama_cpp.llama_context_p:
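For readers skimming the diff, the selection order implemented in `__init__` can be condensed into a standalone sketch; the function name, arguments, and return values below are illustrative only, but the order of the checks is what the changes above implement:

```python
from typing import Callable, Dict, Optional


def resolve_chat_format(
    chat_handler: Optional[Callable],
    chat_format: Optional[str],
    metadata: Dict[str, str],
) -> str:
    """Illustrative sketch of the precedence used by Llama.__init__."""
    if chat_handler is not None:
        return "custom chat handler"            # 1. explicit handler wins
    if chat_format is not None:
        return chat_format                      # 2. explicit pre-registered format
    if "tokenizer.chat_template" in metadata:
        return "gguf tokenizer.chat_template"   # 3. template embedded in the model file
    return "llama-2"                            # 4. fallback
```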