What happened?
When trying to convert
https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma/
I get ValueError: Duplicated key name 'tokenizer.chat_template' (the error in the title), but chat_template is only defined a single time in tokenizer_config.json:
https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma/blob/main/tokenizer_config.json#L59
I verified this locally with cat *.json | grep chat_template and only got the one result.
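For anyone who wants to double-check without grep, here is a quick equivalent in Python (a sketch I wrote for this report, not part of the converter; it assumes you run it from inside the downloaded model directory):

```python
# Count which of the repo's *.json files define a top-level
# chat_template key; for this model it should print only
# tokenizer_config.json.
import glob
import json

hits = []
for path in glob.glob("*.json"):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict) and "chat_template" in data:
        hits.append(path)

print(hits)  # expected: ['tokenizer_config.json']
```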
Is it somehow trying to add it twice?
It looks like when the Gemma model sets up its vocab, set_vocab() runs _set_vocab_sentencepiece(), which already runs special_vocab.add_to_gguf() (pulling in the chat_template), and then set_vocab() runs special_vocab.add_to_gguf() a second time itself.
But that would mean it's been broken since April 16...
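To make the suspected double write concrete, here is a condensed sketch of the call sequence as I read it; the method names match the traceback below, but the bodies are paraphrased, not copied from convert-hf-to-gguf.py:

```python
# Paraphrased sketch of the suspected path, not the actual source.
class GemmaModel(Model):
    def set_vocab(self):
        # (1) This helper builds a SpecialVocab internally and calls
        #     add_to_gguf(), writing tokenizer.chat_template a first time.
        self._set_vocab_sentencepiece()

        # (2) A second SpecialVocab is then built to register Gemma's
        #     prefix/suffix/middle/eot tokens (ids 67/69/68/107 in the log).
        special_vocab = gguf.SpecialVocab(self.dir_model)

        # (3) add_to_gguf() runs again; since SpecialVocab re-reads
        #     tokenizer_config.json, chat_template is written a second time
        #     and GGUFWriter raises the duplicate-key error.
        special_vocab.add_to_gguf(self.gguf_writer)
```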
Name and Version
b3145, Ubuntu 22.04
What operating system are you seeing the problem on?
Linux
Relevant log output
INFO:hf-to-gguf:Loading model: DiscoPOP-zephyr-7b-gemma
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type unk to 3
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'user' or messages[0]['role'] == 'system' %}{{ bos_token }}{% endif %}{% for message in messages %}{{ '<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% elif messages[-1]['role'] == 'assistant' %}{{ eos_token }}{% endif %}
INFO:gguf.vocab:Setting special token type prefix to 67
INFO:gguf.vocab:Setting special token type suffix to 69
INFO:gguf.vocab:Setting special token type middle to 68
WARNING:gguf.vocab:No handler for special token type fsep with id 70 - skipping
INFO:gguf.vocab:Setting special token type eot to 107
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'user' or messages[0]['role'] == 'system' %}{{ bos_token }}{% endif %}{% for message in messages %}{{ '<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% elif messages[-1]['role'] == 'assistant' %}{{ eos_token }}{% endif %}
Traceback (most recent call last):
File "/llama.cpp/convert-hf-to-gguf.py", line 2882, in <module>
main()
File "/llama.cpp/convert-hf-to-gguf.py", line 2867, in main
model_instance.set_vocab()
File "/llama.cpp/convert-hf-to-gguf.py", line 2251, in set_vocab
special_vocab.add_to_gguf(self.gguf_writer)
File "/llama.cpp/gguf-py/gguf/vocab.py", line 73, in add_to_gguf
gw.add_chat_template(self.chat_template)
File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 565, in add_chat_template
self.add_string(Keys.Tokenizer.CHAT_TEMPLATE, value)
File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 206, in add_string
self.add_key_value(key, val, GGUFValueType.STRING)
File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 166, in add_key_value
raise ValueError(f'Duplicated key name {key!r}')
ValueError: Duplicated key name 'tokenizer.chat_template'
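For context, the guard that fires is the duplicate-key check in GGUFWriter; roughly (condensed from gguf-py/gguf/gguf_writer.py around the line in the traceback, slightly simplified):

```python
# Simplified form of GGUFWriter.add_key_value(): any key may only be
# written once, so the second add_chat_template() call raises here.
def add_key_value(self, key: str, val: Any, vtype: GGUFValueType) -> None:
    if key in self.kv_data:
        raise ValueError(f'Duplicated key name {key!r}')
    self.kv_data[key] = GGUFValue(value=val, type=vtype)
```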