Closed
Description
root@7fbdb34e97e0:/opt/sentencepiece# spm_train --input=data/his.txt --model_prefix=mx --vocab_size=4000
/opt/sentencepiece/src/trainer_interface.cc(225) LOG(INFO) Loaded 87599 sentences
/opt/sentencepiece/src/trainer_interface.cc(226) LOG(INFO) Loaded 0 test sentences
/opt/sentencepiece/src/trainer_interface.cc(250) LOG(INFO) all chars count=1455858
/opt/sentencepiece/src/trainer_interface.cc(258) LOG(INFO) Done: 99.9501% characters are covered.
/opt/sentencepiece/src/trainer_interface.cc(267) LOG(INFO) Alphabet size=482
/opt/sentencepiece/src/trainer_interface.cc(268) LOG(INFO) Final character coverage=0.999501
/opt/sentencepiece/src/trainer_interface.cc(300) LOG(INFO) Done! 87599 sentences are loaded
/opt/sentencepiece/src/unigram_model_trainer.cc(127) LOG(INFO) Using 87599 sentences for making seed sentencepieces
/opt/sentencepiece/src/unigram_model_trainer.cc(155) LOG(INFO) Making suffix array...
/opt/sentencepiece/src/unigram_model_trainer.cc(159) LOG(INFO) Extracting frequent sub strings...
/opt/sentencepiece/src/unigram_model_trainer.cc(210) LOG(INFO) Initialized 5129 seed sentencepieces
/opt/sentencepiece/src/trainer_interface.cc(306) LOG(INFO) Tokenizing input sentences with whitespace: 87599
/opt/sentencepiece/src/trainer_interface.cc(315) LOG(INFO) Done! 12256
/opt/sentencepiece/src/unigram_model_trainer.cc(502) LOG(INFO) Using 12256 sentences for EM training
/opt/sentencepiece/src/unigram_model_trainer.cc(518) LOG(INFO) EM sub_iter=0 size=2619 obj=27.6329 num_tokens=69501 num_tokens/piece=26.5372
/opt/sentencepiece/src/unigram_model_trainer.cc(518) LOG(INFO) EM sub_iter=1 size=2035 obj=19.1728 num_tokens=70687 num_tokens/piece=34.7356
/opt/sentencepiece/src/trainer_interface.cc(371) LOG(INFO) Saving model: mx.model
/opt/sentencepiece/src/spm_train_main.cc(159) [_status.ok()] Internal: /opt/sentencepiece/src/trainer_interface.cc(362) [(trainer_spec_.vocab_size()) == (model_proto->pieces_size())]
Program terminated with an unrecoverable error.
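
For anyone reading the log above, a small sketch of how the numbers can be compared. It parses the requested `--vocab_size` and the `size=` values from the EM lines using only the Python standard library. The helper name `summarize` and the log excerpt variable are made up for illustration and are not part of sentencepiece. The piece count after the last EM sub-iteration (2035) is already below the requested 4000, which looks consistent with the failed `vocab_size == pieces_size` check, though the final model may also contain extra pieces (control symbols, required characters), so the shortfall is only approximate.

```python
import re

# Excerpt of the training log above (only the lines this check needs).
LOG = """\
spm_train --input=data/his.txt --model_prefix=mx --vocab_size=4000
unigram_model_trainer.cc(210) LOG(INFO) Initialized 5129 seed sentencepieces
unigram_model_trainer.cc(518) LOG(INFO) EM sub_iter=0 size=2619 obj=27.6329 num_tokens=69501 num_tokens/piece=26.5372
unigram_model_trainer.cc(518) LOG(INFO) EM sub_iter=1 size=2035 obj=19.1728 num_tokens=70687 num_tokens/piece=34.7356
"""


def summarize(log: str) -> dict:
    """Hypothetical helper: compare requested vocab size with EM piece counts."""
    # Requested size as passed on the command line.
    requested = int(re.search(r"--vocab_size=(\d+)", log).group(1))
    # Piece count reported after each EM sub-iteration, in log order.
    sizes = [int(s) for s in re.findall(r"EM sub_iter=\d+ size=(\d+)", log)]
    return {
        "requested": requested,
        "em_sizes": sizes,
        # Positive value: the trainer ended with fewer pieces than requested.
        "shortfall": requested - sizes[-1],
    }


summary = summarize(LOG)
print(summary)
```

If the shortfall is positive, as it is here, one thing to try is rerunning `spm_train` with a smaller `--vocab_size`.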