
fix wrong template in GLM4-0414 #13140

Merged 6 commits into ggml-org:master on Apr 27, 2025

Conversation

matteoserva (Contributor)

Reopening #13099 because I broke my repo with a wrong git command and it automatically closed the PR.

@ngxson

  • I moved the check for GLM4 above so it gets higher priority.

Instead of changing BOS, I think what should be done is to set add_add_bos(False) to prevent tokenize function from adding BOS to the sentence

I am not sure if I can do that. add_bos_token is automatically set to false by the HF model. I am forced to set the BOS to the PAD token, because otherwise it would break the jinja templating system (`./llama-server --jinja`).
The jinja code ignores the add_bos_token value and removes any BOS token it encounters.

This is the relevant code block.

if (string_starts_with(result, tmpl.bos_token())) {

If I don't set the BOS token, it would break the /props endpoint.

ORIGINAL:

GLM4-0414 models were using the wrong legacy template, leading to a missing `[gMASK]<sop>` preamble.
The old code was returning `LLM_CHAT_TEMPLATE_GLMEDGE`.

As a workaround you needed to launch `llama-server` with `--chat-template chatglm4`.

After the patch, the GGUF should be regenerated or edited manually.
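For context, the expected preamble can be reproduced directly from the Hugging Face tokenizer. This is only an illustrative sketch: the model id is one plausible GLM-4-0414 checkpoint (an assumption, not something from this PR), and the exact rendered string depends on the upstream chat template.

```python
# Illustrative sketch: render the HF chat template so the expected
# "[gMASK]<sop>" preamble is easy to spot. The model id is an assumption;
# any GLM-4-0414 checkpoint with the official chat template should behave the same.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("THUDM/GLM-4-9B-0414")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # expected to start with "[gMASK]<sop>"
```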

@github-actions bot added the python (python script changes) label on Apr 27, 2025
@@ -5154,7 +5154,7 @@ def set_vocab(self):
     special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
     special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])
     special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
-    special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["[gMASK]"])
+    special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
matteoserva (Contributor, Author), replying to a review comment on this diff:

I'm not sure I can do that.

  • add_bos_token is already set to False by the HF model and the conversion script.
  • The jinja templating system always removes the BOS token, ignoring add_bos_token, so the prefix is removed and never added again.
  • If I remove the BOS token altogether, the /props endpoint crashes because it expects it.

This is the relevant jinja template code, reached when running `./llama-server --jinja`:

if (string_starts_with(result, tmpl.bos_token())) {
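To make the failure mode concrete, here is a small Python illustration (not llama.cpp code, and the function name is hypothetical) of the behaviour described above: the jinja path strips a leading BOS from the rendered template, so a [gMASK] BOS silently eats the required preamble, while <|endoftext|> leaves it intact.

```python
# Toy sketch of the stripping behaviour, not the actual llama.cpp implementation.
def strip_leading_bos(rendered: str, bos_token: str) -> str:
    # Mirrors the check above: drop a leading BOS regardless of add_bos_token.
    if bos_token and rendered.startswith(bos_token):
        return rendered[len(bos_token):]
    return rendered

rendered = "[gMASK]<sop><|user|>\nHello<|assistant|>"
print(strip_leading_bos(rendered, "[gMASK]"))        # preamble lost: "<sop><|user|>..."
print(strip_leading_bos(rendered, "<|endoftext|>"))  # unchanged: "[gMASK]<sop><|user|>..."
```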

@ngxson (Collaborator) commented on Apr 27, 2025:

If the BOS is not added, then what is the reason behind changing it from [gMASK] to <|endoftext|>? It shouldn't crash /props, right?

Nvm I see what you mean

@ngxson merged commit ced44be into ggml-org:master on Apr 27, 2025
50 checks passed
@Mushoz commented on Apr 27, 2025:

Do GGUFs have to be regenerated for these fixes to apply, or can I continue using the GGUF I already have?

@matteoserva (Contributor, Author):

Do GGUFs have to be regenerated for these fixes to apply, or can I continue using the GGUF I already have?

The GGUF should be regenerated or edited manually.
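If you go the regeneration route, a hedged sketch of re-running the updated conversion script is below. The paths and output name are placeholders, and the flags should be verified against `python convert_hf_to_gguf.py --help` on your checkout.

```python
# Hedged sketch: regenerate the GGUF with the fixed convert_hf_to_gguf.py from
# llama.cpp so the new BOS setting ends up in the file. Paths/names are placeholders.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "models/GLM-4-0414-hf",             # local Hugging Face snapshot (placeholder)
        "--outfile", "glm4-0414-f16.gguf",  # placeholder output name
        "--outtype", "f16",
    ],
    check=True,
)
```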

pockers21 pushed a commit to pockers21/llama.cpp that referenced this pull request Apr 28, 2025
* fix wrong template in GLM4-0414

* fix spaces

* no bos token since it is already in the template

* moved the chatgml4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check in the correct place with correct check
@Dampfinchen

Hello! Sorry, noob here. What changes would I have to make to my existing GGUF? I've looked at the template and I'm just confused.

@matteoserva (Contributor, Author):

Hello! Sorry, noob here. What changes would I have to make to my existing GGUF? I've looked at the template and I'm just confused.

You only have to replace the BOS token KV parameter from [gMASK] to <|endoftext|> in the GGUF, then use an updated llama.cpp.

@Ph0rk0z commented on Apr 30, 2025:

KV parameter? Do you mean edit the BOS token ID in the metadata, or somewhere else? And I assume this means I was testing the model with the BOS token listed twice, even in text completion?

I wish you could simply choose to add the BOS or not in text completions from the front end. I'm under the impression that it's added anyway.

@matteoserva (Contributor, Author):

KV parameter? Do you mean edit the BOS token ID in the metadata, or somewhere else? And I assume this means I was testing the model with the BOS token listed twice, even in text completion?

I wish you could simply choose to add the BOS or not in text completions from the front end. I'm under the impression that it's added anyway.

The command to enable/disable the BOS token is `--override-kv tokenizer.ggml.add_bos_token=bool:false`.
In GLM the value add_bos_token is not set, and it defaults to false.

The problem is that the jinja templating system ignores add_bos_token and breaks the prompt. The fix for an old GGUF is simply to change the BOS token (`tokenizer.ggml.bos_token_id`) from [gMASK] to <|endoftext|> and run it with an updated llama.cpp.
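For the manual edit, a minimal sketch using the gguf-py package follows. This is not from the PR: it assumes GGUFReader exposes metadata fields as writable memory-mapped arrays when opened in `r+` mode (the mechanism gguf-py's own metadata scripts rely on), and the file name is a placeholder. Back up the GGUF and verify the result with a dump tool before using it.

```python
# Hedged sketch: point tokenizer.ggml.bos_token_id at <|endoftext|> in an existing
# GGUF using gguf-py. File name is a placeholder; double-check against gguf-py's
# metadata scripts before trusting this on a file you care about.
from gguf import GGUFReader

reader = GGUFReader("glm4-0414.gguf", "r+")  # "r+" so changes write back to the file

# Find the vocabulary id of <|endoftext|>.
tokens_field = reader.fields["tokenizer.ggml.tokens"]
tokens = [bytes(tokens_field.parts[i]).decode("utf-8") for i in tokens_field.data]
new_bos_id = tokens.index("<|endoftext|>")

# Overwrite the stored BOS id in place.
bos_field = reader.fields["tokenizer.ggml.bos_token_id"]
old_bos_id = int(bos_field.parts[bos_field.data[0]][0])
bos_field.parts[bos_field.data[0]][0] = new_bos_id
print(f"tokenizer.ggml.bos_token_id: {old_bos_id} -> {new_bos_id}")
```

Regenerating the GGUF with the updated conversion script, as mentioned above, remains the cleaner option.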

Labels: python (python script changes)
5 participants