FP16 and 4-bit quantized model both produce garbage output on M1 8GB #137

Closed
Alden5 opened this issue Mar 14, 2023 · 4 comments
Labels
hardware Hardware related

Comments


Alden5 commented Mar 14, 2023

Both ggml-model-q4_0 and ggml-model-f16 produce garbage output on my M1 Air 8GB with the 7B LLaMA model. I've seen reports of problems with the quantized model, but I doubt quantization is the issue here, since the non-quantized model produces the same output.

➜  llama.cpp git:(master) ./main -m ./models/7B/ggml-model-f16.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
main: seed = 1678812348
llama_model_load: loading model from './models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 1
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 13365.09 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-f16.bin'
llama_model_load: ........... done
llama_model_load: model size =  4274.30 MB / num tensors = 90

system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
     1 -> ''
  8893 -> 'Build'
   292 -> 'ing'
   263 -> ' a'
  4700 -> ' website'
   508 -> ' can'
   367 -> ' be'
  2309 -> ' done'
   297 -> ' in'
 29871 -> ' '
 29896 -> '1'
 29900 -> '0'
  2560 -> ' simple'
  6576 -> ' steps'
 29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


Building a website can be done in 10 simple steps:Administrationistrunkoveryabasepair tou cross deprecatedinition holes prvindor^C

v3ss0n commented Mar 14, 2023

That's nothing to do with this project.


Alden5 commented Mar 14, 2023

@v3ss0n could you please elaborate?


jarcen commented Mar 14, 2023

I started messing with this project two hours ago and had exactly the same issue: completely mangled output.
Turns out the problem for me was that I compiled it with Cygwin. After a clean re-compile with MinGW64 via w64devkit, the problem disappeared.
The tip-off was that the token list (shown as the input prompt deconstructed into individual tokens) sometimes didn't match the prompt (occasionally it was truncated). In your case it looks fine, but try altering the prompt by adding and removing words, as in the check below. If the tokenized prompt doesn't match your prompt, then perhaps you have the same compiler problem... or something. Honestly, I don't know what could go so wrong that it compiles without errors into a broken binary. The wild west of C and pluses.
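
A quick way to run that check, reusing the exact command from the report above (only the prompt and -n changed; a sketch for eyeballing tokenization, not a definitive diagnosis):

./main -m ./models/7B/ggml-model-f16.bin -p "Building a website" -n 1
# main prints the prompt deconstructed into tokens before generating; if the
# printed pieces don't concatenate back to the prompt text, suspect a bad build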


Alden5 commented Mar 14, 2023

I found the solution to my issue! Make sure that when you run the convert-pth-to-ggml.py script it completes and prints "Done. Output file:". I was getting the error OSError: 45088768 requested and 31184896 written but didn't pay attention because everything else looked to be working. The f16 model converts to a ~13GB file, so the fact that I only had 10GB of free storage was causing the problem.
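
For anyone else hitting this, a minimal pre-flight sketch (the trailing 1 = f16 argument and paths follow the llama.cpp README of that time; adjust to your setup):

df -h .                                       # need > 13 GB free for the f16 7B output
python3 convert-pth-to-ggml.py models/7B/ 1   # wait for "Done. Output file: ..."
ls -lh models/7B/ggml-model-f16.bin           # should be ~13 GB; a smaller file is truncated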

Alden5 closed this as completed Mar 14, 2023
gjmulder added the hardware Hardware related label Mar 15, 2023
rooprob pushed a commit to rooprob/llama.cpp that referenced this issue on Aug 2, 2023: "Improve readme: clarify dependencies and other things to install"