Description
I am trying to convert deepseek-ai/deepseek-coder-1.3b-base to GGUF using llama.cpp/convert.py.
Command
python llama.cpp/convert.py codes-hf \
    --outfile codes-1b.gguf \
    --outtype q8_0
Output:
Loading model file codes-hf/pytorch_model.bin
params = Params(n_vocab=32256, n_embd=2048, n_layer=24, n_ctx=16384, n_ff=5504, n_head=16, n_head_kv=16, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyQ8_0: 7>, path_model=PosixPath('codes-hf'))
Traceback (most recent call last):
  File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1548, in <module>
    main()
  File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1515, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1417, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1407, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']
The model's tokenizer_config.json specifies "tokenizer_class": "LlamaTokenizerFast". Is there a way to support it?
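For anyone debugging the same error: the exception means the loader found none of the tokenizer files it recognizes in the model directory. A quick diagnostic is to list which candidate files are actually present. This is a hedged sketch, not part of convert.py; the file names per vocab type below are an assumption (SentencePiece-style 'spm' typically means tokenizer.model, HF-fast 'hfft' typically means tokenizer.json / tokenizer_config.json):

```python
from pathlib import Path

# Assumed mapping from vocab type to the tokenizer files it looks for.
# These names are a guess based on common HF model layouts, not taken
# from convert.py itself.
CANDIDATES = {
    "spm": ["tokenizer.model"],
    "hfft": ["tokenizer.json", "tokenizer_config.json"],
}

def check_tokenizer_files(model_dir: str) -> dict:
    """Report which candidate tokenizer files exist under model_dir."""
    root = Path(model_dir)
    return {
        vocab_type: {name: (root / name).is_file() for name in names}
        for vocab_type, names in CANDIDATES.items()
    }

if __name__ == "__main__":
    import json
    import sys
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "codes-hf"
    print(json.dumps(check_tokenizer_files(model_dir), indent=2))
```

If tokenizer.json exists but the conversion still fails, the problem is more likely how the script classifies the tokenizer than a missing file.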