Can llama.cpp/convert.py support tokenizers other than 'spm', 'bpe', 'hfft'? #6690

Closed
@woodx9

Description

I am trying to convert deepseek-ai/deepseek-coder-1.3b-base using llama.cpp/convert.py.

Command

python llama.cpp/convert.py codes-hf \
  --outfile codes-1b.gguf \
  --outtype q8_0

Output:

Loading model file codes-hf/pytorch_model.bin
params = Params(n_vocab=32256, n_embd=2048, n_layer=24, n_ctx=16384, n_ff=5504, n_head=16, n_head_kv=16, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyQ8_0: 7>, path_model=PosixPath('codes-hf'))
Traceback (most recent call last):
File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1548, in <module>
main()
File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1515, in main
vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1417, in load_vocab
vocab = self._create_vocab_by_path(vocab_types)
File "/home/woodx/Workspace/llamacpp/llama.cpp/convert.py", line 1407, in _create_vocab_by_path
raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']

The model's tokenizer config has "tokenizer_class": "LlamaTokenizerFast". Is there a way to support it?
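For context, the lookup that fails here can be sketched roughly like this. This is a simplified reconstruction, not convert.py's actual code; the marker-file mapping below is an assumption for illustration only (each vocab type detected by one file in the model directory):

```python
import json
import tempfile
from pathlib import Path

# Assumed marker files for each vocab type (illustrative, not authoritative).
VOCAB_FILES = {
    "spm": "tokenizer.model",  # SentencePiece model file
    "bpe": "vocab.json",       # GPT-2 style BPE vocabulary
    "hfft": "tokenizer.json",  # HuggingFace fast-tokenizer file
}

def create_vocab_by_path(model_dir: Path, vocab_types: list[str]) -> str:
    """Return the first vocab type whose marker file exists in model_dir."""
    for vtype in vocab_types:
        if (model_dir / VOCAB_FILES[vtype]).exists():
            return vtype
    # Mirrors the error message seen in the traceback above.
    raise FileNotFoundError(
        f"Could not find a tokenizer matching any of {vocab_types}")

# A LlamaTokenizerFast checkpoint normally ships a tokenizer.json, so once
# that file is present the default ['spm', 'hfft'] search matches 'hfft'.
with tempfile.TemporaryDirectory() as d:
    model_dir = Path(d)
    (model_dir / "tokenizer.json").write_text(json.dumps({"version": "1.0"}))
    print(create_vocab_by_path(model_dir, ["spm", "hfft"]))  # prints "hfft"
```

Under this reading, the error suggests none of the expected tokenizer files were found next to pytorch_model.bin, so checking that the model directory actually contains the tokenizer files from the Hugging Face repo may be a first step.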
