Training TextBlob With Custom Datasets: En-Spelling
Let's try to make one for our Darwin example. We'll train on all the words
in "On the Origin of Species". You can use any text, as long as it contains
enough words that are relevant to the text you wish to correct. In our case,
the rest of the book supplies the vocabulary and context that TextBlob needs
to make more accurate corrections.
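import re  # regular expressions, used to pick the words out of the raw text
textToLower = ""  # placeholder for the lowercased training text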
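The snippet above is cut short, so here is a minimal, stdlib-only sketch of the counting step that training on a text boils down to. It does not call TextBlob. The helper names `count_words` and `format_counts` and the `sample` sentence are hypothetical, and it is an assumption that the "word count" line format matches what TextBlob expects.

```python
import re
from collections import Counter


def count_words(text):
    """Lowercase the text and count every run of the letters a-z as a word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def format_counts(counts):
    """Render the counts as 'word count' lines, sorted alphabetically."""
    return "\n".join(f"{word} {n}" for word, n in sorted(counts.items()))


# Hypothetical sample text; in practice this would be the whole book.
sample = "Able minds ably describe a species; a species is able to vary."
model = format_counts(count_words(sample))
print(model)
```

Writing `model` to a file would give a word-frequency file of the same shape as the listing that follows.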
The resulting train.txt file lists each word alongside its count:
a 3389
abdomen 3
aberrant 9
aberration 5
abhorrent 1
abilities 1
ability 4
abjectly 1
able 54
ably 5
abnormal 17
abnormally 2
abodes 2
...
This indicates that the word "a" appears 3389 times in the training text,
while "ably" appears only 5 times. To test this trained model, we'll use
suggest(text) instead of correct(text). It returns a list of word-confidence
tuples. The first element in the list is the word it is most confident
about, so we can access that word via suggest(text)[0][0].
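To make the shape of that return value concrete, here is a toy stand-in that ranks candidate words by their share of the counts. It is not TextBlob's implementation: `rank_candidates` is a hypothetical helper, and the confidence formula (count divided by the total count of the known candidates) is an assumption chosen for illustration. The counts themselves come from the listing above.

```python
def rank_candidates(candidates, counts):
    """Return (word, confidence) tuples, most confident first."""
    known = [w for w in candidates if w in counts]
    total = sum(counts[w] for w in known)
    ranked = [(w, counts[w] / total) for w in known]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Counts copied from the word-frequency listing.
counts = {"ability": 4, "able": 54, "ably": 5}

suggestions = rank_candidates(["ably", "able"], counts)
best_word = suggestions[0][0]  # first tuple, then its word
print(suggestions)
print(best_word)
```

The first index selects the highest-ranked tuple and the second selects the word inside it, which is the same indexing pattern as suggest(text)[0][0].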
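from textblob.en import Spelling
pathToFile = "train.txt"
spelling = Spelling(path=pathToFile)  # load the word counts produced by training
with open("test.txt") as f:  # assumed file name; the original does not show which file is read
    text = f.read()
words = text.split()
corrected = ""
for i in words:
    corrected += " " + spelling.suggest(i)[0][0]  # keep the top suggestion for each word
print(corrected)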
As far as I am all to judge after long attending to the subject the conditions
certain parts alone and indirectly by acting the reproduce system It respect to
the direct action we most be in mid the in every case as Professor Weismann as
Domesticcation," there are two facts namely the nature of the organism and the
nature of the conditions The former seems to be much th are important for
dissimilar conditions and on the other hand dissimilar variations arise under
conditions which appear to be nearly uniform The effects on the offspring are
nearly all the offspring off individuals exposed to certain conditions during