
"Missing importance matrix" despite imatrix being provided #6552

Closed as not planned

Description

@schmorp

I have generated an imatrix.dat for the Q8_0 quant of the model here:

https://huggingface.co/mradermacher/BiscuitRP-8x7B-GGUF

The imatrix generation ran without errors and produced this file:

https://huggingface.co/mradermacher/BiscuitRP-8x7B-i1-GGUF/blob/main/imatrix.dat
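
For context, the imatrix was produced with llama.cpp's imatrix tool, roughly along these lines (the model and calibration filenames below are placeholders, not the exact invocation):

# sketch only: calibration.txt stands in for whatever calibration data was actually used
./imatrix -m BiscuitRP-8x7B.Q8_0.gguf -f calibration.txt -o BiscuitRP-8x7B-i1-GGUF/imatrix.dat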

But using it to quantize the model to IQ2_M fails with:

Missing importance matrix for tensor blk.0.ffn_gate.0.weight in a very low-bit quantization
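
The quantize invocation that hits this error looks roughly like the following, reconstructed from the log below (the .backup1~ suffix on the output name is dropped here for readability):

# quantize with the imatrix generated above; build 2627, same as in the log
./quantize --imatrix BiscuitRP-8x7B-i1-GGUF/imatrix.dat ./BiscuitRP-8x7B.gguf ./BiscuitRP-8x7B-i1-GGUF/BiscuitRP-8x7B.i1-IQ2_M.gguf IQ2_M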

Full log:

load_imatrix: loaded 256 importance matrix entries from BiscuitRP-8x7B-i1-GGUF/imatrix.dat
prepare_imatrix: have 256 importance matrix entries
main: build = 2627 (855f544)
main: built with gcc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: quantizing './BiscuitRP-8x7B.gguf' to './BiscuitRP-8x7B-i1-GGUF/BiscuitRP-8x7B.i1-IQ2_M.gguf.backup1~' as IQ2_M
llama_model_loader: loaded meta data with 23 key-value pairs and 995 tensors from ./BiscuitRP-8x7B.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.expert_count u32 = 8
llama_model_loader: - kv 10: llama.expert_used_count u32 = 2
llama_model_loader: - kv 11: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: general.file_type u32 = 1
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 930 tensors
================================ Have weights data with 256 entries
llama_model_quantize_internal: meta size = 779648 bytes
[ 1/ 995] token_embd.weight - [ 4096, 32000, 1, 1], type = f16,
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to iq3_s .. size = 250.00 MiB -> 53.71 MiB
[ 2/ 995] blk.0.ffn_gate.0.weight - [ 4096, 14336, 1, 1], type = f16,
====== llama_model_quantize_internal: did not find weights for blk.0.ffn_gate.0.weight

============================================================
Missing importance matrix for tensor blk.0.ffn_gate.0.weight in a very low-bit quantization
The result will be garbage, so bailing out

llama_model_quantize: failed to quantize: Missing importance matrix for tensor blk.0.ffn_gate.0.weight in a very low-bit quantization
main: failed to quantize model from './BiscuitRP-8x7B.gguf'
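
For anyone who wants to compare names: the tensor names in the source GGUF can be listed with the gguf-py dump script shipped in the llama.cpp tree (assuming gguf-py/scripts/gguf-dump.py is present in this checkout), which makes it easy to see the per-expert blk.*.ffn_gate.N.weight naming that the imatrix entries apparently do not cover:

# list the gate tensors of the source model to compare against the 256 imatrix entry names
python3 gguf-py/scripts/gguf-dump.py ./BiscuitRP-8x7B.gguf | grep ffn_gate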
