Integrate functionary v1.4 and v2 models + add HF AutoTokenizer as optional parameter in llama.create_completion #1078
Conversation
This is great. I'm waiting for this PR. Can we merge this soon, @abetlen?
llama_cpp/llama.py
Outdated
@@ -1369,11 +1370,24 @@ def _create_completion(
    logits_processor: Optional[LogitsProcessorList] = None,
    grammar: Optional[LlamaGrammar] = None,
    logit_bias: Optional[Dict[str, float]] = None,
    hf_tokenizer: Optional[Any] = None,
Can we introduce a typing.Protocol for a tokenizer and move this up to the Llama.__init__ as an optional override to the default tokenizer?
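A minimal sketch of what such a Protocol could look like (the class name is made up here, and the method signatures are only assumed to mirror the existing Llama.tokenize / Llama.detokenize methods):

from typing import List, Protocol

class TokenizerProtocol(Protocol):
    # Structural type: any object providing these two methods could be passed to
    # Llama.__init__ to override the built-in llama.cpp tokenizer.
    def tokenize(self, text: bytes, add_bos: bool = True, special: bool = False) -> List[int]:
        ...

    def detokenize(self, tokens: List[int]) -> bytes:
        ...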
    **kwargs,  # type: ignore
) -> Union[llama_types.ChatCompletion, Iterator[llama_types.ChatCompletionChunk]]:
    SYSTEM_MESSAGE = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary"""
I know the original functionary model is deprecated but I don't want to break any code that's using the old model. Can we introduce these chat formats as functionary-v1 and functionary-v2 for the new models?
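For illustration only, a sketch of how the two names might be registered side by side; it assumes the register_chat_completion_handler decorator from llama_cpp.llama_chat_format (the one used for the existing functionary handler), and the handler names and bodies below are placeholders:

from llama_cpp.llama_chat_format import register_chat_completion_handler

@register_chat_completion_handler("functionary-v1")
def functionary_v1_chat_handler(llama, messages, functions=None, tools=None, **kwargs):
    # Placeholder: build the original v1 prompt so code using the old model keeps working.
    ...

@register_chat_completion_handler("functionary-v2")
def functionary_v2_chat_handler(llama, messages, functions=None, tools=None, **kwargs):
    # Placeholder: build the v2 prompt, which supports parallel function calling.
    ...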
I think the code already checks the model version to use the proper chat template.
I see that you have a branch that keeps the old v1 code. I will just copy those relevant parts from there into my branch and deduplicate as much code as possible between the new and old chat handlers to preserve backward compatibility.
I'm going to start using this branch directly. Thanks for putting it together!
README.md
Outdated
The gguf-converted files for this model can be found here: [functionary-7b-v1](https://huggingface.co/abetlen/functionary-7b-v1-GGUF)
The only set of models that supports full function calling at this time is [functionary](https://github.com/MeetKai/functionary). The various gguf-converted files for this set of models can be found [here](https://huggingface.co/meetkai). Functionary is able to intelligently call functions and also analyze any provided function outputs to generate coherent responses. All v2 models of functionary support **parallel function calling**.
Note that due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is required to provide the path to the HF tokenizer for functionary. They are already included in the respective HF repositories hosting the gguf files.
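As a hedged usage sketch (the repo id below is one of the meetkai repos referenced above, and the exact way the tokenizer is passed through may still change in this PR), the HF tokenizer can be loaded straight from the same repository that hosts the gguf file, since the tokenizer files ship alongside it:

from transformers import AutoTokenizer

# Load the tokenizer from the same HF repo that hosts the gguf weights
# (or from a local clone / download of that repo).
hf_tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF")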
What do you think about using hf_hub_download to get the model and to acquire the local file path?
from huggingface_hub import hf_hub_download
# Model repository on the Hugging Face model hub
model_repo = "meetkai/functionary-small-v2.2-GGUF"
# File to download
file_name = "functionary-small-v2.2.f16.gguf"
# Download the file
local_file_path = hf_hub_download(repo_id=model_repo, filename=file_name)
local_file_path
Thanks for the good suggestion! However, I think it will be difficult to download the model with huggingface_hub as huggingface_hub is not a dependency in this project. I'm not sure if @abetlen is ok with setting this as a dependency? It's kinda pointless if we set up hf_hub_download just for the HF AutoTokenizer since we would have already git cloned the repository to get the model.
Good to know. hf_hub_download does cache the path, so for any user that already has it (via transformers, etc.), it won't redownload it.
Update: all set now.
18daeb0 to 4cf8736 (Compare)
@abetlen I've made the changes. How does this look now? Mainly, the tokenizer was changed to adopt an ABC-ish setup with
Hi, may I ask if there are updates on this PR? I think the author has implemented the requested changes. Can we merge? 😊
@jeffrey-fong looks good, I'm not able to modify the PR but I do have a couple of changes I still need to make, I'll just make them after the merge:
Additionally, I'm going to create an issue so that we can fix the
@abetlen Sure, sounds good to me! I'll keep track of the issue too as it feels really inelegant to have to pass in an HF tokenizer. I've pulled from main so feel free to approve and merge it. 👍
Hey @jeffrey-fong, still working on the changes over in my branch. Just wondering if you've seen that before.
@abetlen seems like it's due to this line in llama.py: https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L937. The Llama HF Tokenizer does not prepend the prefix space to the first token in the token list when decoding, as mentioned here. I have modified the streaming detokenization in the
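For context, a generic sketch of one way around the prefix-space discrepancy when streaming with an HF tokenizer (this is not the PR's actual implementation, and the repo id is reused from the earlier example): decode the cumulative token list on every step and emit only the new suffix, so no per-token decode has to reason about the leading space.

from typing import List, Tuple
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF")

def stream_decode(all_tokens: List[int], prev_text: str) -> Tuple[str, str]:
    # Decode everything generated so far and return (new_chunk, full_text).
    # Comparing two full decodes sidesteps per-token discrepancies such as the
    # missing prefix space on the first decoded token.
    full_text = tokenizer.decode(all_tokens)
    return full_text[len(prev_text):], full_text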
@jeffrey-fong is it possible to account for this in the HFTokenizer instead of
@abetlen sure, I've moved this to both HFTokenizer and LlamaTokenizer.
@jeffrey-fong thanks for your patience with this PR, I'll review again tonight. Also, in my #957 I was able to write up some helpers to add support for streaming function calls so I should be able to integrate that for functionary as well once it's merged.
Hello, may I inquire if there's an update on this issue? It appears we're awaiting a PR merge. Am I correct in understanding that?
Sorry, I've just been getting over the flu the last few days and didn't have a chance to work on this. Looks good, I'll merge, and I may still update the tokenizer API before the next release, but it should be in and working! Thanks so much for the work on this @jeffrey-fong
Hope you rested up well @abetlen. Thank you!
Please rest well, @abetlen. Thank you for working on this. @jeffrey-fong @abetlen
Closes #1075
Closes #1061