Integrate functionary v1.4 and v2 models + add HF AutoTokenizer as optional parameter in llama.create_completion #1078
Conversation
This is great. I'm waiting for this PR. Can we merge this soon, @abetlen?
llama_cpp/llama.py
Outdated
@@ -1369,11 +1370,24 @@ def _create_completion(
    logits_processor: Optional[LogitsProcessorList] = None,
    grammar: Optional[LlamaGrammar] = None,
    logit_bias: Optional[Dict[str, float]] = None,
    hf_tokenizer: Optional[Any] = None,
Can we introduce a typing.Protocol for a tokenizer and move this up to the Llama.__init__ as an optional override to the default tokenizer?
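A minimal sketch of what such a Protocol could look like (the class name is made up here, and the method signatures are only assumed to mirror the existing Llama.tokenize / Llama.detokenize methods):

from typing import List, Protocol

class TokenizerProtocol(Protocol):
    # Structural type: any object providing these two methods could be passed to
    # Llama.__init__ to override the built-in llama.cpp tokenizer.
    def tokenize(self, text: bytes, add_bos: bool = True, special: bool = False) -> List[int]:
        ...

    def detokenize(self, tokens: List[int]) -> bytes:
        ...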
    **kwargs,  # type: ignore
) -> Union[llama_types.ChatCompletion, Iterator[llama_types.ChatCompletionChunk]]:
    SYSTEM_MESSAGE = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary"""
I know the original functionary model is deprecated but I don't want to break any code that's using the old model. Can we introduce these chat formats as functionary-v1 and functionary-v2 for the new models?
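For illustration only, a sketch of how the two names might be registered side by side; it assumes the register_chat_completion_handler decorator from llama_cpp.llama_chat_format (the one used for the existing functionary handler), and the handler names and bodies below are placeholders:

from llama_cpp.llama_chat_format import register_chat_completion_handler

@register_chat_completion_handler("functionary-v1")
def functionary_v1_chat_handler(llama, messages, functions=None, tools=None, **kwargs):
    # Placeholder: build the original v1 prompt so code using the old model keeps working.
    ...

@register_chat_completion_handler("functionary-v2")
def functionary_v2_chat_handler(llama, messages, functions=None, tools=None, **kwargs):
    # Placeholder: build the v2 prompt, which supports parallel function calling.
    ...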
I think the code already checks the model version to use the proper chat template.
I see that you have a branch that keeps the old v1 code. I will just copy those relevant parts from there into my branch and deduplicate as much code as possible between the new and old chat handlers to preserve backward compatibility.
I'm going to start using this branch directly. Thanks for putting it together!
README.md
Outdated
The gguf-converted files for this model can be found here: [functionary-7b-v1](https://huggingface.co/abetlen/functionary-7b-v1-GGUF)
The only set of models that supports full function calling at this time is [functionary](https://github.com/MeetKai/functionary). The various gguf-converted files for this set of models can be found [here](https://huggingface.co/meetkai). Functionary is able to intelligently call functions and also analyze any provided function outputs to generate coherent responses. All v2 models of functionary support **parallel function calling**.
Note that due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is required to provide the path to the HF tokenizer for functionary. They are already included in the respective HF repositories hosting the gguf files.
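As a hedged usage sketch (the repo id below is one of the meetkai repos referenced above, and the exact way the tokenizer is passed through may still change in this PR), the HF tokenizer can be loaded straight from the same repository that hosts the gguf file, since the tokenizer files ship alongside it:

from transformers import AutoTokenizer

# Load the tokenizer from the same HF repo that hosts the gguf weights
# (or from a local clone / download of that repo).
hf_tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF")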
What do you think about using hf_hub_download to get the model and to acquire the local file path?
from huggingface_hub import hf_hub_download
# Model repository on the Hugging Face model hub
model_repo = "meetkai/functionary-small-v2.2-GGUF"
# File to download
file_name = "functionary-small-v2.2.f16.gguf"
# Download the file
local_file_path = hf_hub_download(repo_id=model_repo, filename=file_name)
local_file_path
Thanks for the good suggestion! However, I think it will be difficult to download the model with huggingface_hub as huggingface_hub is not a dependency in this project. I'm not sure if @abetlen is ok with setting this as a dependency? It's kinda pointless if we set up hf_hub_download just for the HF AutoTokenizer since we would have already git cloned the repository to get the model.
Good to know. hf_hub_download does cache the path, so for any user that already has it (via transformers, etc.), it won't redownload it.
Update: all set now.
18daeb0 to 4cf8736 (Compare)
@abetlen I've made the changes. How does this look now? Mainly, the tokenizer was changed to adopt an ABC-ish setup with
Hi, may I ask if there are updates on this PR? I think the author has implemented the requested changes. Can we merge? 😊
@jeffrey-fong looks good, I'm not able to modify the PR but I do have a couple of changes I still need to make, I'll just make them after the merge:
Additionally, I'm going to create an issue so that we can fix the
@abetlen Sure, sounds good to me! I'll keep track of the issue too as it feels really inelegant to have to pass in an HF tokenizer. I've pulled from main so feel free to approve and merge it. 👍
Hey @jeffrey-fong, still working on the changes over in my branch. Just wondering if you've seen that before.
@abetlen seems like it's due to this line in llama.py: https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L937. The Llama HF Tokenizer does not prepend the prefix space to the first token in the token list when decoding, as mentioned here. I have modified the streaming detokenization in the
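For context, a generic sketch of one way around the prefix-space discrepancy when streaming with an HF tokenizer (this is not the PR's actual implementation, and the repo id is reused from the earlier example): decode the cumulative token list on every step and emit only the new suffix, so no per-token decode has to reason about the leading space.

from typing import List, Tuple
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF")

def stream_decode(all_tokens: List[int], prev_text: str) -> Tuple[str, str]:
    # Decode everything generated so far and return (new_chunk, full_text).
    # Comparing two full decodes sidesteps per-token discrepancies such as the
    # missing prefix space on the first decoded token.
    full_text = tokenizer.decode(all_tokens)
    return full_text[len(prev_text):], full_text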
@jeffrey-fong is it possible to account for this in the HFTokenizer instead of
@abetlen sure, I've moved this to both HFTokenizer and LlamaTokenizer.
@jeffrey-fong thanks for your patience with this PR, I'll review again tonight. Also, in my #957 I was able to write up some helpers to add support for streaming function calls so I should be able to integrate that for functionary as well once it's merged.
Hello, may I inquire if there's an update on this issue? It appears we're awaiting a PR merge. Am I correct in understanding that?
Sorry, I've just been getting over the flu the last few days and didn't have a chance to work on this. Looks good, I'll merge, and I may still update the tokenizer API before the next release, but it should be in and working! Thanks so much for the work on this @jeffrey-fong
Hope you rested up well @abetlen. Thank you!
Please rest well, @abetlen. Thank you for working on this. @jeffrey-fong @abetlen
Closes #1075
Closes #1061