Load Sliding Window Attention (SWA) pattern from GGUF metadata (qwen3) #17597
@@ -555,6 +555,14 @@ void llama_model::load_hparams(llama_model_loader & ml) {
     ml.get_key_or_arr(LLM_KV_ATTENTION_HEAD_COUNT_KV, hparams.n_head_kv_arr, hparams.n_layer, false);

+    ml.get_key(LLM_KV_ATTENTION_SLIDING_WINDOW, hparams.n_swa, false);
+    if (hparams.n_swa > 0) {
+        hparams.swa_type = LLAMA_SWA_TYPE_STANDARD;
+        ml.get_key_or_arr(LLM_KV_ATTENTION_SLIDING_WINDOW_PATTERN, hparams.swa_layers, hparams.n_layer, false);
+    } else {
+        hparams.swa_type = LLAMA_SWA_TYPE_NONE;
+    }
+
     bool rope_finetuned = false;
     ml.get_key(LLM_KV_ROPE_SCALING_FINETUNED, rope_finetuned, false);
     hparams.rope_finetuned = rope_finetuned;
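For context on what the pattern key encodes: several interleaved-SWA models describe it as "every n-th layer uses full attention, the rest use the sliding window", which gets expanded into per-layer flags. Whether this PR's `get_key_or_arr` read expects a scalar pattern or an explicit per-layer array is not shown in the hunk above, so the following is only a minimal sketch of the scalar-pattern expansion; the helper name, array size, and convention are illustrative assumptions, not the PR's code.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Illustrative only: expand a scalar sliding-window "pattern" into per-layer
// flags. With n_pattern = 4, layers 0..2 of each group use the sliding window
// and every 4th layer keeps full attention.
static void set_swa_pattern(std::array<bool, 512> & swa_layers, uint32_t n_layer, uint32_t n_pattern) {
    for (uint32_t il = 0; il < n_layer; ++il) {
        swa_layers[il] = n_pattern > 0 && (il % n_pattern) < (n_pattern - 1);
    }
}

int main() {
    std::array<bool, 512> swa_layers{};
    set_swa_pattern(swa_layers, /*n_layer =*/ 8, /*n_pattern =*/ 4);

    for (uint32_t il = 0; il < 8; ++il) {
        std::printf("layer %u: %s\n", il, swa_layers[il] ? "sliding window" : "full attention");
    }
    return 0;
}
```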
I don't think we can assume anything about `swa_type` here. Please remove this line.
This line is required; otherwise SWA won't activate. A similar structure is used below.
I don't get it. How does SWA not get activated? `swa_type` should be set per-model. Please remove this line.
SWA requires specific masking to work; without it, the model will generate gibberish. The default semantic is "standard". I think specific models can set it to chunked or symmetric.
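To make the distinction concrete, here is a rough sketch of how those window semantics could differ as masking predicates. The chunked/symmetric variants and the exact boundary conditions are assumptions for illustration, not code quoted from llama.cpp.

```cpp
#include <cstdint>

// Hypothetical mirror of the SWA variants discussed above, for illustration only.
enum swa_type_t { SWA_NONE, SWA_STANDARD, SWA_CHUNKED, SWA_SYMMETRIC };

// Returns true if key position p_k must be masked out for query position p_q
// (n_swa > 0 assumed). Ordinary causal masking (p_k <= p_q) is assumed to be
// applied separately.
bool is_masked_swa(swa_type_t type, int64_t n_swa, int64_t p_q, int64_t p_k) {
    const int64_t dist = p_q > p_k ? p_q - p_k : p_k - p_q;
    switch (type) {
        case SWA_NONE:      return false;                       // full attention, nothing extra masked
        case SWA_STANDARD:  return p_q - p_k >= n_swa;          // keys older than the window
        case SWA_CHUNKED:   return p_q / n_swa != p_k / n_swa;  // keys outside the query's chunk
        case SWA_SYMMETRIC: return dist >= n_swa;               // window extends in both directions
    }
    return false;
}
```

Whatever the exact shapes, the mask builder has to know which branch to take, which is why `swa_type` cannot stay at `LLAMA_SWA_TYPE_NONE` when `n_swa > 0` is loaded: the window would never be applied.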
Context: https://github.com/search?q=repo%3Ahuggingface%2Ftransformers+sliding_attention&type=code&p=4 (the search surfaces transformers models that declare per-layer layer types such as "sliding_attention" / "full_attention", which is presumably what a per-layer pattern in GGUF would mirror).