FastText Working and Implementation
Last Updated: 24 May, 2024
What is FastText?
FastText is a free, open-source library from Facebook AI Research (FAIR) for learning word embeddings and for text classification. It lets you train unsupervised or supervised models to obtain vector representations for words, and it also provides tools for evaluating these models. FastText supports both the CBOW and Skip-gram architectures.
Uses of FastText:
- It is used for finding semantic similarities between words.
- It can also be used for text classification (e.g. spam filtering).
- It can train on large datasets in minutes.
Working of FastText:
FastText is very fast at training word vector models: it can process about 1 billion words in less than 10 minutes. Models built with deep neural networks can be slow to train and test, so for classification fastText instead uses a linear classifier.
Linear classifier: In this approach, texts and labels are represented as vectors. We look for vector representations such that a text and its associated label have similar vectors. In simple words, the vector corresponding to the text lies close to the vector of its correct label.
To find the probability score of the correct label given its associated text, we use the softmax function:
P(travel | car) = exp(score(travel, car)) / Σ over all labels L of exp(score(L, car))
- Here "travel" is the label and "car" is the text associated with it.
To maximize this probability of the correct label, we can use the gradient descent algorithm.
This is quite computationally expensive because, for every piece of text, we not only have to compute the score of its correct label but also the score of every other label in the training set. This limits the use of these models on very large datasets.
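As a rough illustration (a minimal sketch with made-up toy vectors, not fastText's actual code), the following Python snippet shows the idea of the linear classifier with a full softmax: the text is averaged into a single vector, scored against every label's weight vector, and the softmax turns those scores into probabilities. Note how the cost grows with the number of labels, which is exactly the problem described above.

import numpy as np

# Toy word vectors and label weights (made-up values, purely for illustration).
word_vectors = {
    "car":  np.array([0.9, 0.1, 0.0, 0.2]),
    "fast": np.array([0.7, 0.2, 0.1, 0.0]),
}
label_weights = {                      # one weight vector per label
    "travel":  np.array([0.8, 0.1, 0.1, 0.1]),
    "cooking": np.array([0.0, 0.9, 0.2, 0.1]),
    "sports":  np.array([0.1, 0.2, 0.8, 0.3]),
}

def softmax(scores):
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()

def label_probabilities(tokens):
    # Represent the text as the average of its word vectors, then score it
    # against every label. This is the expensive part: the cost grows linearly
    # with the number of labels in the training set.
    text_vec = np.mean([word_vectors[t] for t in tokens], axis=0)
    labels = list(label_weights)
    scores = np.array([label_weights[l] @ text_vec for l in labels])
    return dict(zip(labels, softmax(scores)))

print(label_probabilities(["car", "fast"]))  # "travel" gets the highest probability here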
FastText solves this problem by using a hierarchical classifier to train the model.
Hierarchical Classifier used by FastText:
In this method, the labels are arranged in a binary tree. Every internal node of the tree represents a probability, and a label is represented by the product of the probabilities along the path from the root to that label. The leaf nodes of the binary tree are the labels.
FastText builds these trees with the Huffman algorithm to make full use of the fact that classes can be imbalanced: the depth of frequently occurring labels is smaller than that of infrequent ones.
Using a binary tree speeds things up because, instead of computing a score for every single possible label, we only compute the probability at each node on the path to the one correct label. This reduces the per-example cost from being linear in the number of labels to roughly logarithmic, and hence vastly reduces the time complexity of training the model.
Increasing the speed does not sacrifice the accuracy of the model.
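A minimal sketch of this idea, assuming a hypothetical tree over four labels (the node weights and paths below are invented for illustration and are not fastText's actual data structures): the probability of a label is the product of the binary decisions taken at the internal nodes on the path from the root to its leaf.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical binary tree over 4 labels. Each label is reached from the root
# by a sequence of (node id, direction) decisions, where direction +1 means
# "go left" and -1 means "go right". All values are toy numbers.
paths = {
    "travel":  [(0, +1), (1, +1)],
    "cooking": [(0, +1), (1, -1)],
    "sports":  [(0, -1), (2, +1)],
    "finance": [(0, -1), (2, -1)],
}
node_weights = {0: np.array([0.5, -0.2]),
                1: np.array([0.1,  0.7]),
                2: np.array([-0.4, 0.3])}

def label_probability(label, text_vec):
    # Only the nodes on the path to this label are touched, so the cost is
    # logarithmic in the number of labels instead of linear.
    p = 1.0
    for node, direction in paths[label]:
        p *= sigmoid(direction * (node_weights[node] @ text_vec))
    return p

text_vec = np.array([0.3, 0.9])                            # toy text representation
print(label_probability("travel", text_vec))
print(sum(label_probability(l, text_vec) for l in paths))  # the four label probabilities sum to 1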
- When we have an unlabeled dataset, fastText uses the n-gram technique to train the model. Let us look at how this technique works in more detail.
Let us consider a word from our dataset, for example "kingdom". fastText wraps the word in boundary markers, giving <kingdom>, and breaks it into its character n-gram components, such as:
kingdom = ['<ki', 'kin', 'ing', 'ngd', 'gdo', 'dom', 'om>', '<kin', 'king', ...]
These are some of the n-gram components of the given word; there are many more, but only a few are listed here to give an idea. The size of the n-gram components is configurable: their length ranges between a minimum and a maximum number of characters, which you can set with the -minn and -maxn flags respectively.
Note: When your text does not consist of words from a natural language, using n-grams won't make sense. For example, when the corpus contains IDs, it stores numbers and special characters rather than words. In this case, you can turn off the n-gram embeddings by setting the -minn and -maxn parameters to 0.
When the model updates, fastText learns the weights for every n-gram along with the weight for the entire word token.
In this manner, each token/word is expressed as the sum (or average) of the vectors of its n-gram components.
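A minimal sketch of this extraction in Python, assuming the boundary markers < and > and the commonly used defaults minn=3 and maxn=6 (this mirrors the idea rather than fastText's exact implementation):

def char_ngrams(word, minn=3, maxn=6):
    # Wrap the word in boundary markers and collect every substring whose
    # length lies between minn and maxn characters.
    token = f"<{word}>"
    ngrams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            ngrams.append(token[i:i + n])
    return ngrams

print(char_ngrams("kingdom")[:8])
# ['<ki', 'kin', 'ing', 'ngd', 'gdo', 'dom', 'om>', '<kin']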
- Word vectors generated through fastText hold extra information about their sub-words. In the above example, one of the components of the word "kingdom" is the word "king"; this information helps the model capture the semantic similarity between the two words.
- It also allows for capturing the meaning of suffixes/prefixes for the given words in the corpus.
- It allows for generating better word embeddings for different or rare words as well.
- It can also generate word embeddings for out-of-vocabulary (OOV) words (a small sketch of this follows after this list).
- While using fastText, even if you don't remove the stopwords, the accuracy is not compromised. You can still perform simple pre-processing steps on your corpus if you feel like it.
- As fastText has the feature of providing sub-word information, it can also be used on morphologically rich languages like Spanish, French, German, etc.
We do get better word embeddings through fastText, but it uses more memory than word2vec or GloVe because it stores a lot of sub-words for each word.
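As promised above, here is a small self-contained sketch of the out-of-vocabulary idea (the n-gram "embeddings" below are just deterministic pseudo-random vectors, not a trained model): an OOV word still has known character n-grams, so a vector can be assembled for it by averaging its n-gram vectors.

import numpy as np

dim = 8

def char_ngrams(word, minn=3, maxn=6):
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]

def ngram_vector(ngram):
    # Stand-in for a learned n-gram embedding: a deterministic pseudo-random
    # vector derived from the n-gram string (toy values only).
    seed = sum(ord(c) for c in ngram)
    return np.random.default_rng(seed).normal(size=dim)

def word_vector(word):
    # The word's vector is the average of its n-gram vectors, so it can be
    # computed even for a word that never appeared in the training corpus.
    grams = char_ngrams(word)
    return np.mean([ngram_vector(g) for g in grams], axis=0)

print(word_vector("kingdoms")[:4])  # works even if "kingdoms" is out of vocabulary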
Implementation of FastText
First, we have to build fastText. To do so, follow the steps given below.
In your terminal, run the commands below:
$ wget https://github.com/facebookresearch/fastText/archive/v0.9.2.zip
$ unzip v0.9.2.zip
$ cd fastText-0.9.2
$ make
Note: If your make command gives an error like "'make' is not recognized as an internal or external command, operable program or batch file", you can install MinGW, which provides an equivalent of make on Windows.
After that, add the path of its bin folder to your system's environment variables; you can then use the following instead of the make command:
$ mingw32-make
We have successfully built fastText.
The commands supported by fastText are –
supervised train a supervised classifier
quantize quantize a model to reduce the memory usage
test evaluate a supervised classifier
test-label print labels with precision and recall scores
predict predict most likely labels
predict-prob predict most likely labels with probabilities
skipgram train a skipgram model
cbow train a cbow model
print-word-vectors print word vectors given a trained model
print-sentence-vectors print sentence vectors given a trained model
print-ngrams print ngrams given a trained model and word
nn query for nearest neighbors
analogies query for analogies
dump dump arguments,dictionary,input/output vectors
Now I have taken the Amazon reviews dataset and saved it as amazon_reviews.txt. You can also perform some pre-processing on your data to get better results.
We will be training a skipgram model. After you are in the fastText-0.9.2 directory, run the below-mentioned command-
$ fasttext skipgram -input amazon_reviews.txt -output model_trained
Here the input file is amazon_reviews.txt. Make sure to give the full path to your file if it is not in the same directory. model_trained is the name given for the output files.
You can also pass other parameters explicitly as per your requirements, such as -epoch, -lr, or -dim; here we have used the defaults.
It first starts reading the words present in the input document. The document consisted of 32M words, and training had an ETA of around 15 minutes.
During training, fastText prints detailed statistics: the current learning rate, how many words are being processed per second on each thread, and the loss value, which keeps decreasing as the model is trained.
After the model is trained, two files are generated: model_trained.bin and model_trained.vec. The .bin file contains the parameters of the model along with the dictionary; this is the file that fastText itself loads (for example in the nn and analogies commands below). The .vec file is a text file that contains the word vectors, one word per line; this is the file you will typically use in your own applications.
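The .vec file is plain text: the first line contains the vocabulary size and the vector dimension, and every following line contains a word followed by its vector components. As a small sketch (assuming the model_trained.vec file produced above), you could load it in Python like this:

import numpy as np

def load_vec(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())  # header: vocab size and dimension
        for line in f:
            parts = line.split()
            vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors

vectors = load_vec("model_trained.vec")
print(len(vectors))         # number of words in the vocabulary
print(vectors["king"][:5])  # first few components of one vector (assuming "king" occurs in the corpus)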
We are now going to use our word vectors and perform some operations on them.
1) Finding Nearest Neighbors for a given word
To initialize the nearest neighbor interface execute the following command:
$ fasttext nn model_trained.bin
The interface asks for a query word for which you want to find the nearest neighbors. For the query word "brutality", the output is a ranked list of the closest words together with their similarity scores.
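If you prefer working from Python rather than the command line, the official fasttext package (assuming it is installed, for example with pip install fasttext) can load the .bin model and answer the same query:

import fasttext

model = fasttext.load_model("model_trained.bin")
# Nearest neighbours of the query word, as (similarity score, word) pairs.
print(model.get_nearest_neighbors("brutality", k=5))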
2) Performing Word Analogies
To perform word analogies of the form ( A – B + C ) on words you can execute the below-mentioned command:
$ fasttext analogies model_trained.bin
For A = king, B = man, C = woman, the first result returned for the query is "queen", which is the most correct answer possible for this query. Hence, our trained model is quite accurate.
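The same analogy query can be run from Python with the fasttext package (again assuming the model trained above); get_analogies returns the words closest to the vector A - B + C.

import fasttext

model = fasttext.load_model("model_trained.bin")
# Words closest to king - man + woman, with their similarity scores.
print(model.get_analogies("king", "man", "woman", k=5))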
You can also perform other operations, like testing your model on a file of test data, predicting the correct labels, or getting the n-grams for given words, by using the other fastText commands listed above.