Extra Feature NLP
[4]: # CountVectorizer creates a vocabulary of unique words from the corpus and
     # represents each document as a vector of word frequencies.
# Example text data
documents = ["This is the first document.",
             "This document is the second document.",
             "And this is the third one.",
             "Is this the first document?"]

from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(2, 3) matches the bigram/trigram columns in the output below
ngram_vectorizer = CountVectorizer(ngram_range=(2, 3))
ngram_vectors = ngram_vectorizer.fit_transform(documents)
[7]:    and this  and this is  document is  document is the  first document  \
     0         0            0            0                0               1
     1         0            0            1                1               0
     2         1            1            0                0               0
     3         0            0            0                0               1

        the second document  the third  the third one  third one  this document  \
     0                    0          0              0          0              0
     1                    1          0              0          0              1
     2                    0          1              1          1              0
     3                    0          0              0          0              0

        this document is  this is  this is the  this the  this the first
     0                 0        1            1         0               0
     1                 1        0            0         0               0
     2                 0        1            1         0               0
     3                 0        0            0         1               1

     [4 rows x 25 columns]
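The counts above can be reproduced with a small pure-Python n-gram counter. This is a sketch only: `ngrams` is an illustrative helper, not part of the notebook, and its tokenizer only approximates CountVectorizer's default behavior.

```python
from collections import Counter
import re

def ngrams(text, n_min=2, n_max=3):
    # Lowercase and split on word characters, roughly like CountVectorizer
    tokens = re.findall(r"\b\w+\b", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

counts = ngrams("This is the first document.")
# "first document" and "this is the" each occur once; "and this" never occurs
```

Running it on the first example sentence yields exactly the bigrams and trigrams counted in row 0 of the table.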
[10]: # IDF (Inverse Document Frequency) measures how rare a term is across the
      # corpus; multiplied by term frequency (TF-IDF), it weights the importance
      # of a term in a document.
# Example text data
documents = ["This is the first document.",
             "This document is the second document.",
             "And this is the third one.",
             "Is this the first document?"]

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
          third      this
     0  0.000000  0.384085
     1  0.000000  0.281089
     2  0.511849  0.267104
     3  0.000000  0.384085
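By default, scikit-learn's TfidfVectorizer computes a smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and then L2-normalizes each document row. A minimal sketch of that formula (`smooth_idf` is an illustrative name, not a notebook function):

```python
import math

def smooth_idf(n_docs, df):
    # scikit-learn's default (smooth_idf=True): idf = ln((1 + n) / (1 + df)) + 1
    return math.log((1 + n_docs) / (1 + df)) + 1

# "this" appears in all 4 documents; "third" appears in only 1
idf_this = smooth_idf(4, 4)   # ln(1) + 1 = 1.0
idf_third = smooth_idf(4, 1)  # ln(5/2) + 1 ≈ 1.916
```

The rarer term gets the larger weight; since every term in "And this is the third one." has term frequency 1, the ratio of the weights in row 2 above (0.511849 / 0.267104 ≈ 1.92) is exactly idf_third / idf_this.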
[ ]: # Word Embedding
[12]: # FastText
[15]: # It learns word embeddings using the Skip-gram or Continuous Bag-of-Words
      # (CBOW) architecture, enriched with subword (character n-gram) information
      # so it can also build vectors for out-of-vocabulary words.
# Training data
sentences = [["I", "like", "apples"],
             ["I", "enjoy", "eating", "fruits"]]

from gensim.models import FastText

# 100-dimensional vectors, matching the output below; min_count=1 keeps every word
model = FastText(sentences, vector_size=100, window=5, min_count=1)
[16]: 0 1 2 3 4 5 6 \
I -0.003053 0.001144 -0.001130 0.004910 -0.003084 -0.007648 0.007188
fruits -0.001457 0.001947 0.001137 -0.001536 -0.001588 -0.001997 -0.002027
eating 0.000412 0.001230 -0.002208 0.000289 0.001082 0.000401 0.001171
enjoy -0.001593 0.000200 0.000983 -0.001493 -0.000503 0.001380 0.001440
apples -0.000257 -0.000776 -0.000108 -0.001688 0.002155 -0.001124 0.002533
like 0.001024 -0.003016 0.001939 -0.001192 -0.003485 -0.001892 0.001637
7 8 9 … 90 91 92 \
I 0.007860 -0.001688 -0.002615 … 0.005416 0.001654 0.002986
fruits 0.002295 0.002176 -0.001157 … 0.000342 0.000272 -0.001761
eating -0.000369 -0.000706 0.002063 … -0.002273 0.001385 0.001710
enjoy -0.002292 -0.000112 -0.001617 … -0.003175 -0.001866 0.000952
apples 0.000522 0.000874 -0.000778 … 0.001021 0.000565 -0.001394
like -0.000633 -0.001284 0.001069 … -0.000179 0.002047 -0.000875
93 94 95 96 97 98 99
I 0.002967 0.007579 -0.002151 -0.003800 0.001423 0.001112 -0.000259
fruits -0.001308 -0.000937 -0.000236 -0.000219 -0.000568 -0.003610 -0.001075
eating -0.000360 -0.000841 0.002985 0.000116 -0.000775 -0.000186 0.001993
enjoy -0.002678 0.002496 -0.000418 -0.002535 -0.002113 -0.001011 0.000997
apples -0.000912 0.001105 -0.000151 0.001271 0.001879 0.001152 -0.000260
like -0.000740 0.002278 0.000509 0.001111 -0.001301 0.000404 0.001636
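FastText builds each word vector as the sum of the vectors of its character n-grams, with the word wrapped in boundary markers `<` and `>` before slicing; that is what lets it embed words it never saw during training. A sketch of the n-gram extraction using FastText's default range of 3 to 6 characters (`char_ngrams` is an illustrative helper; the real model additionally keeps the whole wrapped word as one extra unit):

```python
def char_ngrams(word, n_min=3, n_max=6):
    # Wrap the word in boundary markers, then take all n-grams of each length
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
    return grams

grams = char_ngrams("apples")
# contains boundary-aware pieces such as "<ap", "app", "ples", "les>"
```

An unseen word like "appleseed" shares many of these n-grams with "apples", so its composed vector lands near it in the embedding space.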