fix wrong template in GLM4-0414 #13140
Conversation
```diff
@@ -5154,7 +5154,7 @@ def set_vocab(self):
         special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
         special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])
         special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
-        special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["[gMASK]"])
+        special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
```
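The effect of this one-line change can be pictured with a small stdlib-only sketch. The token-id values below are illustrative assumptions for demonstration, not taken from an actual GLM4 vocabulary dump:

```python
# Illustrative added-vocab ids (assumed values, for demonstration only).
added_vocab = {"<|endoftext|>": 151329, "[gMASK]": 151331, "<|user|>": 151336}

# Special-token mapping before and after the fix: only "bos" changes.
before = {"eos": "<|endoftext|>", "eot": "<|user|>", "unk": "<|endoftext|>", "bos": "[gMASK]"}
after = dict(before, bos="<|endoftext|>")

# The bos id written into the GGUF metadata changes accordingly.
print(added_vocab[before["bos"]], "->", added_vocab[after["bos"]])
```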
I'm not sure I can do that.
- `add_bos_token` is already set to `False` by the HF model and the conversion script.
- The jinja templating system always removes the BOS token, ignoring `add_bos_token`, so the prefix is removed and never added again.
- If I remove the BOS token altogether, the `/props` endpoint crashes because it expects one to be defined.
This is the relevant jinja template code, reached when running `./llama-server --jinja` (common/chat.cpp, line 651 at e291450):

```cpp
if (string_starts_with(result, tmpl.bos_token())) {
```
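To make the behavior concrete, here is a rough Python sketch of what that C++ check does, under the assumption described above (this is an illustration, not the actual llama.cpp implementation):

```python
def strip_leading_bos(rendered: str, bos_token: str) -> str:
    # Sketch of the chat.cpp check: if the jinja-rendered prompt already
    # starts with the BOS token, drop it so it is not duplicated when the
    # tokenizer prepends BOS again.
    if bos_token and rendered.startswith(bos_token):
        return rendered[len(bos_token):]
    return rendered

print(strip_leading_bos("[gMASK]<sop><|user|>", "[gMASK]"))  # -> "<sop><|user|>"
```

Note that this stripping happens regardless of `add_bos_token`, which is exactly the problem described in the comment above.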
If the BOS is not added, then what is the reason behind changing it from `[gMASK]` to `<|endoftext|>`? It shouldn't crash `/props`, right?
Nvm I see what you mean
Do GGUFs have to be regenerated for these fixes to apply, or can I continue using the GGUF I already have?

The GGUF should be regenerated or edited manually.
Commits:
* fix wrong template in GLM4-0414
* fix spaces
* no bos token since it is already in the template
* moved the chatgml4 check to higher priority
* restored template for old GLM models
* moved the GLM4 template check in the correct place with correct check
Hello! Sorry, noob here. What changes would I have to make to my existing GGUF? I've looked at the template and I'm just confused.
You only have to replace the bos token kv parameter from `[gMASK]` to `<|endoftext|>`.
Kv parameter? Do you mean editing the bos token ID in the metadata, or somewhere else? And I assume this means I was testing the model with the bos token listed twice, even in text completion? I wish you could simply choose whether to add the bos token in text completions from the front end; I'm under the impression that it's added anyway.
There is a command to enable/disable the bos token. The problem is that the jinja templating system ignores `add_bos_token`.
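To illustrate the interaction described here: when the chat template itself renders a BOS prefix and the tokenizer also prepends BOS, the token appears twice unless one side strips it. A toy stdlib-only sketch (the function names and template string are illustrative, not llama.cpp APIs):

```python
BOS = "[gMASK]"

def render_template(user_msg: str) -> str:
    # GLM4's chat template itself emits the "[gMASK]<sop>" prefix.
    return f"{BOS}<sop><|user|>\n{user_msg}<|assistant|>"

def tokenize(text: str, add_bos: bool) -> str:
    # Toy stand-in for the tokenizer: prepends BOS when add_bos is set.
    return (BOS if add_bos else "") + text

prompt = render_template("Hello")
doubled = tokenize(prompt, add_bos=True)   # BOS appears twice
correct = tokenize(prompt, add_bos=False)  # template-supplied BOS only
print(doubled.count(BOS), correct.count(BOS))  # -> 2 1
```

This is why the conversion script sets `add_bos_token` to false for GLM4: the template already supplies the prefix.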
Reopening #13099 because I broke my repo with a wrong git command and it automatically closed the PR.
@ngxson
I am not sure if I can do that. `add_bos_token` is automatically set to false by the HF model. I am forced to set the BOS to the PAD token, because otherwise it would break the jinja templating system (reached via `./llama-server --jinja`). The jinja code ignores the `add_bos_token` value and removes any BOS token it encounters.

This is the relevant code block: llama.cpp/common/chat.cpp, line 651 at e291450.

If I don't set the BOS token, it would break the `/props` endpoint.