Add conversion from FP32 quants to FP16 quants model #1562

Jason0214 · 2023-05-22T17:39:00Z

Move the file_version check ahead of tensor data loading, so that broken tensor data is never parsed and the version check can do its job.
Add FP32 to FP16 conversion for Q4_0 and Q4_1 tensors.
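
For illustration only, here is a minimal Python sketch of what such a conversion could look like. It is not the code from this PR. The block layouts are assumptions: 32 weights per block packed into 16 bytes, with a scale `d` (plus a minimum `m` for Q4_1) stored as FP32 in the old format and as FP16 in the new one. The names `Q4_0_F32`, `Q4_0_F16`, `Q4_1_F32`, `Q4_1_F16` and `convert_blocks` are hypothetical.

```python
import struct

# Assumed block layouts (little-endian, no padding); not taken from the PR.
Q4_0_F32 = struct.Struct("<f16s")    # old: fp32 scale d + 16 bytes of 4-bit quants
Q4_0_F16 = struct.Struct("<e16s")    # new: fp16 scale d + the same quants
Q4_1_F32 = struct.Struct("<ff16s")   # old: fp32 scale d, fp32 minimum m + quants
Q4_1_F16 = struct.Struct("<ee16s")   # new: fp16 d, fp16 m + the same quants


def convert_blocks(data: bytes, old: struct.Struct, new: struct.Struct) -> bytes:
    """Re-pack every block in `data` from the `old` layout to the `new` one.

    Only the floating-point fields change width; the packed quants are
    copied through untouched.
    """
    if len(data) % old.size != 0:
        raise ValueError("tensor data is not a whole number of blocks")
    out = bytearray()
    for fields in old.iter_unpack(data):
        # struct's "e" format rounds to fp16 and raises OverflowError
        # if the value does not fit in half precision.
        out += new.pack(*fields)
    return bytes(out)
```

Under these assumed layouts a Q4_0 block shrinks from 20 to 18 bytes and a Q4_1 block from 24 to 20 bytes, for example `convert_blocks(tensor_bytes, Q4_0_F32, Q4_0_F16)`.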

At the very least, the file version check should be fixed. I also gave the conversion a try while I was at it.
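
To make the ordering point concrete, a rough Python sketch follows. It uses a hypothetical header (4-byte magic followed by a little-endian uint32 version), not the actual model file layout, and `SUPPORTED_VERSIONS`, `UnsupportedVersion` and `load_model` are made-up names. The idea is only that the version is validated before any tensor bytes are handed to the parser.

```python
import struct

SUPPORTED_VERSIONS = {3}  # hypothetical set of versions this loader understands


class UnsupportedVersion(Exception):
    """Raised when the file declares a version the loader cannot read."""


def load_model(stream, parse_tensors):
    """Read the header, check the version, and only then parse tensor data."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("truncated header")
    magic, version = struct.unpack("<4sI", header)  # hypothetical header layout
    if version not in SUPPORTED_VERSIONS:
        # Fail here, before touching tensor data that may be laid out
        # differently in this file version.
        raise UnsupportedVersion(f"unsupported file version {version}")
    return parse_tensors(stream)
```

With the check placed first, a file from an older format is rejected with a clear error instead of being misread as tensor data.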


Add conversion from FP32 quants to FP16 quants model

ee9aaaa

- Move file_version checking to the front of tensor data loading, so that any broken tensor data won't be parsed, and file_version checking can do its job.
- Add fp32 to fp16 conversion for Q4_0 and Q4_1 tensors.

Jason0214 closed this May 22, 2023

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
