simple-chat : only add bos on first prompt #10129
Conversation
It seems that the …
However, maybe it's better to refactor llama-vocab because … UPD: the issue with EOS happens in …
Should check if …
Is there any case where …
I was about to say that …
I think it kind of makes sense for embedding models.
No, if the model needs EOS in its responses, it will generate it. So this option is not useful for chat use cases.
Was there at any point (back in llama2 days or even earlier) a problem with models not generating EOS tokens? There is a guard for an antiprompt generated by the model in main.
If this only makes sense for embedding models, wouldn't that make the way most examples use it wrong? I took a quick look, and almost every case still treats this as an …
I wouldn't be surprised if there are incorrect usages of the …
So with this logic in mind, the current implementation in this branch seems OK. Is it failing with a certain model that you tried? If yes, what are the 2 KV values set to?
No, the responses generated look fine. I missed that it is only added if …
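(The "2 KV values" discussed here are presumably the tokenizer metadata keys tokenizer.ggml.add_bos_token and tokenizer.ggml.add_eos_token stored in the GGUF file. A minimal sketch of how they could be inspected at runtime, assuming the llama_model_meta_val_str and llama_add_bos_token/llama_add_eos_token APIs available around this point in the codebase:)

```cpp
// Sketch only: print the two tokenizer KV values that control whether BOS/EOS
// are appended on tokenization, plus the values the library resolves from them.
#include "llama.h"
#include <cstdio>

static void print_add_special_kv(const struct llama_model * model) {
    char buf[64];

    // raw GGUF metadata, if present in the model file
    if (llama_model_meta_val_str(model, "tokenizer.ggml.add_bos_token", buf, sizeof(buf)) >= 0) {
        printf("tokenizer.ggml.add_bos_token = %s\n", buf);
    }
    if (llama_model_meta_val_str(model, "tokenizer.ggml.add_eos_token", buf, sizeof(buf)) >= 0) {
        printf("tokenizer.ggml.add_eos_token = %s\n", buf);
    }

    // the values as the library resolves them (including defaults)
    printf("llama_add_bos_token(model) = %d\n", llama_add_bos_token(model));
    printf("llama_add_eos_token(model) = %d\n", llama_add_eos_token(model));
}
```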
Yes, I also forgot about this initially and figured it out while looking at the usages. The example could probably print a warning if the provided model has llama_add_eos_token(model) == true, as we don't expect to use such models with this example.
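(A minimal sketch of what such a warning could look like in simple-chat; the placement and wording are assumptions, not part of the patch:)

```cpp
// Sketch: warn when the model's tokenizer is configured to append EOS, since
// this chat example is not written with such (typically embedding) models in mind.
if (llama_add_eos_token(model)) {
    fprintf(stderr, "%s: warning: the model appends EOS on tokenization; "
                    "this example is not intended for such models\n", __func__);
}
```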
Fix BOS being added on all calls to llama_tokenize.
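(A rough sketch of the idea, not necessarily the exact diff: pass add_special = true to llama_tokenize only while the KV cache is still empty, i.e. for the very first prompt of the conversation. Using llama_get_kv_cache_used_cells as the "first prompt" check and treating prompt as a std::string are assumptions here, based on the llama.h API of roughly this era.)

```cpp
// Add BOS (add_special) only if nothing has been decoded into the context yet.
const bool is_first = llama_get_kv_cache_used_cells(ctx) == 0;

// first pass with a NULL buffer returns the negated number of tokens needed
const int n_prompt_tokens = -llama_tokenize(model, prompt.c_str(), prompt.size(),
                                            NULL, 0, is_first, true);
std::vector<llama_token> prompt_tokens(n_prompt_tokens);
if (llama_tokenize(model, prompt.c_str(), prompt.size(),
                   prompt_tokens.data(), prompt_tokens.size(),
                   is_first, true) < 0) {
    GGML_ABORT("failed to tokenize the prompt\n");
}
```

This way the tokenizer's add_bos_token setting is honored exactly once per conversation instead of on every turn that is appended to the same context.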