
"Missing importance matrix" despite imatrix being provided #6552

Closed as not planned

Description

@schmorp

I have generated an imatrix.dat for the Q8_0 quant of the model here:

https://huggingface.co/mradermacher/BiscuitRP-8x7B-GGUF

The imatrix generation ran without errors and produced this file:

https://huggingface.co/mradermacher/BiscuitRP-8x7B-i1-GGUF/blob/main/imatrix.dat
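
For context, the imatrix was produced with llama.cpp's imatrix tool, roughly along these lines (the model and calibration filenames below are placeholders, not the exact invocation):

# sketch only: calibration.txt stands in for whatever calibration data was actually used
./imatrix -m BiscuitRP-8x7B.Q8_0.gguf -f calibration.txt -o BiscuitRP-8x7B-i1-GGUF/imatrix.dat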

But using it to quantize the model to IQ2_M fails with:

Missing importance matrix for tensor blk.0.ffn_gate.0.weight in a very low-bit quantization
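
The quantize invocation that hits this error looks roughly like the following, reconstructed from the log below (the .backup1~ suffix on the output name is dropped here for readability):

# quantize with the imatrix generated above; build 2627, same as in the log
./quantize --imatrix BiscuitRP-8x7B-i1-GGUF/imatrix.dat ./BiscuitRP-8x7B.gguf ./BiscuitRP-8x7B-i1-GGUF/BiscuitRP-8x7B.i1-IQ2_M.gguf IQ2_M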

Full log:

load_imatrix: loaded 256 importance matrix entries from BiscuitRP-8x7B-i1-GGUF/imatrix.dat
prepare_imatrix: have 256 importance matrix entries
main: build = 2627 (855f544)
main: built with gcc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: quantizing './BiscuitRP-8x7B.gguf' to './BiscuitRP-8x7B-i1-GGUF/BiscuitRP-8x7B.i1-IQ2_M.gguf.backup1~' as IQ2_M
llama_model_loader: loaded meta data with 23 key-value pairs and 995 tensors from ./BiscuitRP-8x7B.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.expert_count u32 = 8
llama_model_loader: - kv 10: llama.expert_used_count u32 = 2
llama_model_loader: - kv 11: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: general.file_type u32 = 1
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 930 tensors
================================ Have weights data with 256 entries
llama_model_quantize_internal: meta size = 779648 bytes
[ 1/ 995] token_embd.weight - [ 4096, 32000, 1, 1], type = f16,
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to iq3_s .. size = 250.00 MiB -> 53.71 MiB
[ 2/ 995] blk.0.ffn_gate.0.weight - [ 4096, 14336, 1, 1], type = f16,
====== llama_model_quantize_internal: did not find weights for blk.0.ffn_gate.0.weight

============================================================
Missing importance matrix for tensor blk.0.ffn_gate.0.weight in a very low-bit quantization
The result will be garbage, so bailing out

llama_model_quantize: failed to quantize: Missing importance matrix for tensor blk.0.ffn_gate.0.weight in a very low-bit quantization
main: failed to quantize model from './BiscuitRP-8x7B.gguf'
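
For anyone who wants to compare names: the tensor names in the source GGUF can be listed with the gguf-py dump script shipped in the llama.cpp tree (assuming gguf-py/scripts/gguf-dump.py is present in this checkout), which makes it easy to see the per-expert blk.*.ffn_gate.N.weight naming that the imatrix entries apparently do not cover:

# list the gate tensors of the source model to compare against the 256 imatrix entry names
python3 gguf-py/scripts/gguf-dump.py ./BiscuitRP-8x7B.gguf | grep ffn_gate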
