Integrate functionary v1.4 and v2 models + add HF AutoTokenizer as optional parameter in llama.create_completion #1078


Merged
33 commits merged into abetlen:main from integrate-functionary on Feb 8, 2024

Conversation

@jeffrey-fong (Contributor) commented Jan 11, 2024

Closes #1075
Closes #1061

  • Added an optional parameter to allow tokenizing with an HF AutoTokenizer in the functionary chat_completion_handler and llama.create_completion (a usage sketch follows this list)
  • Replaced the logic for the obsolete functionary-7b-v1 model in functionary_chat_handler with support for both functionary v1.4 and v2.* models
  • Implemented parallel function calling, which the v2.* models support
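
For context, here is a hedged sketch of the kind of call these changes enable, assuming the functionary-v2 chat format name and the meetkai GGUF files that come up later in this thread; the names are illustrative and may differ from the merged code.

# Hedged sketch: OpenAI-style tool calling with a functionary v2 model.
# The chat_format value and file paths are assumptions based on this PR's
# discussion and may not match the merged API exactly.
from llama_cpp import Llama

llm = Llama(
    model_path="./functionary-small-v2.2.f16.gguf",
    chat_format="functionary-v2",
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Paris and in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# v2 models may return several entries in tool_calls for a single turn
# (parallel function calling).
message = response["choices"][0]["message"]
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])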

@sandangel

This is great. I'm waiting for this PR. Can we merge this soon, @abetlen?

@@ -1369,11 +1370,24 @@ def _create_completion(
logits_processor: Optional[LogitsProcessorList] = None,
grammar: Optional[LlamaGrammar] = None,
logit_bias: Optional[Dict[str, float]] = None,
hf_tokenizer: Optional[Any] = None,
abetlen (Owner):

Can we introduce a typing.Protocol for a tokenizer and move this up to Llama.__init__ as an optional override for the default tokenizer?
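
A minimal sketch of what such a protocol could look like, assuming method shapes that mirror the existing Llama.tokenize / Llama.detokenize (byte strings in, token id lists out); this is illustrative, not the merged definition.

# Hedged sketch of a tokenizer protocol that Llama.__init__ could accept as an
# optional override; signatures are assumptions modeled on Llama.tokenize and
# Llama.detokenize and may differ from the merged code.
from typing import List, Protocol


class BaseLlamaTokenizer(Protocol):
    def tokenize(self, text: bytes, add_bos: bool = True, special: bool = True) -> List[int]:
        ...

    def detokenize(self, tokens: List[int]) -> bytes:
        ...

Llama.__init__ would then take something like tokenizer: Optional[BaseLlamaTokenizer] = None and fall back to the built-in llama.cpp tokenizer when it is None.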

**kwargs, # type: ignore
) -> Union[llama_types.ChatCompletion, Iterator[llama_types.ChatCompletionChunk]]:
SYSTEM_MESSAGE = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary"""

abetlen (Owner):

I know the original functionary model is deprecated, but I don't want to break any code that's using the old model. Can we introduce these chat formats as functionary-v1 and functionary-v2 for the new models?


I think the code already checks the model version to select the proper chat template.

@jeffrey-fong (Contributor, Author) commented Jan 23, 2024:

I see that you have a branch that keeps the old v1 code. I will just copy those relevant parts from there into my branch and deduplicate as much code as possible between the new and old chat handlers to preserve backward compatibility.

@rgbkrk commented Jan 22, 2024

I'm going to start using this branch directly. Thanks for putting it together!

README.md Outdated
The gguf-converted files for this model can be found here: [functionary-7b-v1](https://huggingface.co/abetlen/functionary-7b-v1-GGUF)
The only set of models that supports full function calling at this time is [functionary](https://github.com/MeetKai/functionary). The various gguf-converted files for this set of models can be found [here](https://huggingface.co/meetkai). Functionary is able to intelligently call functions and also analyze any provided function outputs to generate coherent responses. All v2 models of functionary support **parallel function calling**.

Note that due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is required to provide the path to the HF tokenizer for functionary. The tokenizer files are already included in the respective HF repositories hosting the gguf files.

What do you think about using hf_hub_download to get the model and to acquire the local file path?

from huggingface_hub import hf_hub_download

# Model repository on the Hugging Face model hub
model_repo = "meetkai/functionary-small-v2.2-GGUF"

# File to download
file_name = "functionary-small-v2.2.f16.gguf"

# Download the file
local_file_path = hf_hub_download(repo_id=model_repo, filename=file_name)
local_file_path

@jeffrey-fong (Contributor, Author):

Thanks for the good suggestion! However, I think it will be difficult to download the model with huggingface_hub, since huggingface_hub is not a dependency of this project. I'm not sure if @abetlen is okay with adding it as a dependency?

It's somewhat pointless to set up hf_hub_download just for the HF AutoTokenizer, since we would have already git-cloned the repository to get the model.


Good to know. hf_hub_download does cache the file, so for any user who already has it (via transformers, etc.), it won't be redownloaded.

@rgbkrk commented Jan 22, 2024

On this branch I get a segfault on launch. On main I don't. This is what I run:

python -m llama_cpp.server --model $(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download(repo_id='meetkai/functionary-small-v2.2-GGUF', filename='functionary-small-v2.2.f16.gguf'))")  --chat_format functionary

That's no longer an issue if the branch is rebased against main though.

However, after that's done, we (possibly) need a default for hf_tokenizer_path when it's used in server mode.

For now I've put working changes into https://github.com/rgbkrk/llama-cpp-python/tree/integrate-functionary-rebased

Update: all set now

@jeffrey-fong jeffrey-fong force-pushed the integrate-functionary branch from 18daeb0 to 4cf8736 Compare January 23, 2024 02:23
@jeffrey-fong (Contributor, Author):

@abetlen I've made the changes. How does this look now? Mainly, the tokenizer was changed to adopt an ABC-like setup with typing.Protocol, as you suggested. The documentation page for the OpenAI server may need to be updated, since I've added hf_tokenizer_path to ModelSettings too.

@sandangel:

Hi, may I ask if there are any updates on this PR? I think the author has implemented the requested changes. Can we merge? 😊

@abetlen (Owner) commented Jan 30, 2024

@jeffrey-fong looks good. I'm not able to modify the PR, but I do have a couple of changes I still need to make; I'll just make them after the merge:

  • Instead of hf_tokenizer_path in the __init__, I'm going to change this to an optional tokenizer param so you can pass an HFTokenizer to the Llama class (see the sketch after this comment). I'll keep the hf_tokenizer_path option for the server, though.
  • I'm going to revert the tokenizer signature and adjust the surrounding code; otherwise this might break some existing code expecting the old signature.

Additionally I'm going to create an issue so that we can fix the llama.cpp tokenizer problems that are preventing us from using the normal tokenizer as-is.
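
A rough sketch of the call pattern that change implies, assuming an HF-backed wrapper named LlamaHFTokenizer as discussed in this thread; the exact class name, module path, and keyword may differ in the merged code.

# Hedged sketch: override the default llama.cpp tokenizer with an HF-backed one
# via an optional tokenizer parameter on Llama. Names are assumptions based on
# the plan described above.
from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer

llm = Llama(
    model_path="./functionary-small-v2.2.f16.gguf",
    chat_format="functionary-v2",
    tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF"),
)

The server would keep exposing this through the hf_tokenizer_path setting mentioned earlier rather than requiring a tokenizer object.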

@jeffrey-fong (Contributor, Author):

@abetlen Sure, sounds good to me! I'll keep track of the issue too, as it feels really inelegant to have to pass in an HF tokenizer.

I've pulled from main so feel free to approve and merge it. 👍

@abetlen (Owner) commented Jan 31, 2024

Hey @jeffrey-fong, I'm still working on the changes over in my origin/integrate-functionary branch. Not sure if I screwed something up, but when streaming normally the tokenizer seems to ignore any spaces.

[screenshot: streamed output with missing spaces]

Just wondering if you've seen that before.

@jeffrey-fong (Contributor, Author):

@abetlen seems like it's due to this line in llama.py https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L937

The Llama HF tokenizer does not prepend the prefix space to the first token in the token list when decoding, as mentioned here: https://huggingface.co/docs/transformers/main/model_doc/llama

I have modified the streaming detokenization in the _create_completion method with offset-mapping logic when the tokenizer is LlamaHFTokenizer. I hope it does not break any existing code.

@abetlen (Owner) commented Feb 1, 2024

@jeffrey-fong is it possible to account for this in the HFTokenizer instead of _create_completion?

@jeffrey-fong (Contributor, Author):

@abetlen Sure, I've moved this into both HFTokenizer and LlamaTokenizer. Llama.detokenize will have a new optional parameter called prev_tokens. If prev_tokens is not None, we perform offset mapping at the decoded-string level for HFTokenizer and at the token level for LlamaTokenizer (a sketch of the idea follows). I hope this new optional parameter is okay with you.
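
A minimal sketch of the HFTokenizer side of that offset mapping, assuming a detokenize(tokens, prev_tokens) shape as described above; this is illustrative rather than the exact merged implementation.

# Hedged sketch: decode with and without the new tokens and return only the
# suffix, so the leading space that the HF tokenizer strips from a "first"
# token is preserved during streaming. Names follow the discussion above and
# may differ from the merged code.
from typing import List, Optional

from transformers import AutoTokenizer


class LlamaHFTokenizer:
    def __init__(self, hf_tokenizer):
        self.hf_tokenizer = hf_tokenizer

    @classmethod
    def from_pretrained(cls, repo_id: str) -> "LlamaHFTokenizer":
        return cls(AutoTokenizer.from_pretrained(repo_id))

    def detokenize(self, tokens: List[int], prev_tokens: Optional[List[int]] = None) -> bytes:
        if prev_tokens is not None:
            # Offset mapping at the decoded-string level: slice off the text
            # that the previous tokens already produced.
            full = self.hf_tokenizer.decode(prev_tokens + tokens).encode("utf-8", errors="ignore")
            prev = self.hf_tokenizer.decode(prev_tokens).encode("utf-8", errors="ignore")
            return full[len(prev):]
        return self.hf_tokenizer.decode(tokens).encode("utf-8", errors="ignore")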

@abetlen (Owner) commented Feb 2, 2024

@jeffrey-fong thanks for your patience with this PR, I'll review again tonight.

Also, in my #957 I was able to write up some helpers to add support for streaming function calls so I should be able to integrate that for functionary as well once it's merged.

@sandangel:

Hello, may I inquire if there's an update on this issue? It appears we're awaiting a PR merge. Am I correct in understanding that?

@abetlen (Owner) commented Feb 8, 2024

Sorry, I've just been getting over the flu the last few days and didn't have a chance to work on this. Looks good; I'll merge it, and I may still update the tokenizer API before the next release, but it should be in and working!

Thanks so much for the work on this @jeffrey-fong

@abetlen abetlen merged commit 9018270 into abetlen:main Feb 8, 2024
@rgbkrk commented Feb 8, 2024

Hope you rested up well @abetlen. Thank you!

@sandangel:

Please rest well, @abetlen. Thank you for working on this. @jeffrey-fong @abetlen

Successfully merging this pull request may close these issues:
  • Refactor Functionary chat_handler to use HF AutoTokenizer
  • functionary-7b-v1 model upgrade
4 participants