What happened?
When trying to convert
https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma/
I get ValueError: Duplicated key name 'tokenizer.chat_template' (the error in the title), but chat_template is only defined a single time in tokenizer_config.json:
https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma/blob/main/tokenizer_config.json#L59
I verified this locally with cat *.json | grep chat_template and only got the one result.
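For anyone who wants to double-check without grep, here is a quick equivalent in Python (a sketch I wrote for this report, not part of the converter; it assumes you run it from inside the downloaded model directory):

```python
# Count which of the repo's *.json files define a top-level
# chat_template key; for this model it should print only
# tokenizer_config.json.
import glob
import json

hits = []
for path in glob.glob("*.json"):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict) and "chat_template" in data:
        hits.append(path)

print(hits)  # expected: ['tokenizer_config.json']
```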
Is it somehow trying to add it twice?
It looks like when the Gemma model sets up its vocab, set_vocab() runs _set_vocab_sentencepiece(), which already runs special_vocab.add_to_gguf() (pulling in the chat_template), and then set_vocab() runs special_vocab.add_to_gguf() a second time itself.
But that would mean it's been broken since April 16...
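To make the suspected double write concrete, here is a condensed sketch of the call sequence as I read it; the method names match the traceback below, but the bodies are paraphrased, not copied from convert-hf-to-gguf.py:

```python
# Paraphrased sketch of the suspected path, not the actual source.
class GemmaModel(Model):
    def set_vocab(self):
        # (1) This helper builds a SpecialVocab internally and calls
        #     add_to_gguf(), writing tokenizer.chat_template a first time.
        self._set_vocab_sentencepiece()

        # (2) A second SpecialVocab is then built to register Gemma's
        #     prefix/suffix/middle/eot tokens (ids 67/69/68/107 in the log).
        special_vocab = gguf.SpecialVocab(self.dir_model)

        # (3) add_to_gguf() runs again; since SpecialVocab re-reads
        #     tokenizer_config.json, chat_template is written a second time
        #     and GGUFWriter raises the duplicate-key error.
        special_vocab.add_to_gguf(self.gguf_writer)
```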
Name and Version
b3145, Ubuntu 22.04
What operating system are you seeing the problem on?
Linux
Relevant log output
INFO:hf-to-gguf:Loading model: DiscoPOP-zephyr-7b-gemma
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type unk to 3
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'user' or messages[0]['role'] == 'system' %}{{ bos_token }}{% endif %}{% for message in messages %}{{ '<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% elif messages[-1]['role'] == 'assistant' %}{{ eos_token }}{% endif %}
INFO:gguf.vocab:Setting special token type prefix to 67
INFO:gguf.vocab:Setting special token type suffix to 69
INFO:gguf.vocab:Setting special token type middle to 68
WARNING:gguf.vocab:No handler for special token type fsep with id 70 - skipping
INFO:gguf.vocab:Setting special token type eot to 107
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'user' or messages[0]['role'] == 'system' %}{{ bos_token }}{% endif %}{% for message in messages %}{{ '<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% elif messages[-1]['role'] == 'assistant' %}{{ eos_token }}{% endif %}
Traceback (most recent call last):
File "/llama.cpp/convert-hf-to-gguf.py", line 2882, in <module>
main()
File "/llama.cpp/convert-hf-to-gguf.py", line 2867, in main
model_instance.set_vocab()
File "/llama.cpp/convert-hf-to-gguf.py", line 2251, in set_vocab
special_vocab.add_to_gguf(self.gguf_writer)
File "/llama.cpp/gguf-py/gguf/vocab.py", line 73, in add_to_gguf
gw.add_chat_template(self.chat_template)
File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 565, in add_chat_template
self.add_string(Keys.Tokenizer.CHAT_TEMPLATE, value)
File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 206, in add_string
self.add_key_value(key, val, GGUFValueType.STRING)
File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 166, in add_key_value
raise ValueError(f'Duplicated key name {key!r}')
ValueError: Duplicated key name 'tokenizer.chat_template'
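For context, the guard that fires is the duplicate-key check in GGUFWriter; roughly (condensed from gguf-py/gguf/gguf_writer.py around the line in the traceback, slightly simplified):

```python
# Simplified form of GGUFWriter.add_key_value(): any key may only be
# written once, so the second add_chat_template() call raises here.
def add_key_value(self, key: str, val: Any, vtype: GGUFValueType) -> None:
    if key in self.kv_data:
        raise ValueError(f'Duplicated key name {key!r}')
    self.kv_data[key] = GGUFValue(value=val, type=vtype)
```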