Apriori Algorithm
Apriori Algorithm
The technique of frequent pattern mining is built upon a number of fundamental ideas. The
analysis is based on transaction databases, which include records or transactions that represent
collections of objects. Items inside these transactions are grouped together as itemsets.
The importance of patterns is greatly influenced by support and confidence measurements.
1. Apriori Algorithm
Apriori Algorithm is a foundational method in data mining used for discovering frequent
itemsets and generating association rules. Its significance lies in its ability to identify
relationships between items in large datasets which is particularly valuable in market basket
analysis.
For example, if a grocery store finds that customers who buy bread often also buy butter, it
can use this information to optimise product placement or marketing strategies.
Lets understand the concept of apriori Algorithm with the help of an example. Consider the
following dataset and we will find frequent itemsets and generate association rules for them:
Frequent 1-Itemsets
All items have support% ≥ 50%, so they qualify as frequent 1-itemsets. if any item has
support% < 50%, It will be omitted out from the frequent 1- itemsets.
Step 3: Generate Candidate 2-Itemsets
Combine the frequent 1-itemsets into pairs and calculate their support.
For this use case, we will get 3 item pairs ( bread,butter) , (bread,ilk) and (butter,milk) and
will calculate the support similar to step 2
Candidate 2-Itemsets
Frequent 2-itemsets:
{Bread, Milk} meet the 50% threshold but {Butter, Milk} and {Bread ,Butter} doesn’t meet
the threshold, so will be committed out.
Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their support.
for the triplet, we have only got one case i.e {bread,butter,milk} and we will calculate the
support.
Candidate 3-Itemsets
Since this does not meet the 50% threshold, there are no frequent 3-itemsets.
Step 5: Generate Association Rules
Now we generate rules from the frequent itemsets and calculate confidence.
Rule 1: If Bread → Butter (if customer buys bread, the customer will buy butter also)
Support of {Bread, Butter} = 2.
Support of {Bread} = 4.
Confidence = 2/4 = 50% (Failed threshold).
Rule 2: If Butter → Bread (if customer buys butter, the customer will buy bread also)
Support of {Bread, Butter} = 3.
Support of {Butter} = 3.
Confidence = 3/3 = 100% (Passes threshold).
Rule 3: If Bread → Milk (if customer buys bread, the customer will buy milk also)
Support of {Bread, Milk} = 3.
Support of {Bread} = 4.
Confidence = 3/4 = 75% (Passes threshold).
The Apriori Algorithm, as demonstrated in the bread-butter example, is widely used in
modern startups like Zomato, Swiggy, and other food delivery platforms. These companies
use it to perform market basket analysis, which helps them identify customer behaviour
patterns and optimise recommendations.
Applications of Apriori Algorithm
Below are some applications of Apriori algorithm used in today’s companies and startups
1. E-commerce: Used to recommend products that are often bought together, like laptop +
laptop bag, increasing sales.
2. Food Delivery Services: Identifies popular combos, such as burger + fries, to
offer combo deals to customers.
3. Streaming Services: Recommends related movies or shows based on what users often
watch together, like action + superhero movies.
4. Financial Services: Analyzes spending habits to suggest personalised offers, such
as credit card deals based on frequent purchases.
5. Travel & Hospitality: Creates travel packages (e.g., flight + hotel) by finding commonly
purchased services together.
6. Health & Fitness: Suggests workout plans or supplements based on users’ past
activities, like protein shakes + workouts.
Bioinformatics
The identification of relevant DNA patterns in the field of bioinformatics is made possible by
often occurring pattern mining. Researchers can get insights into genetic variants, illness
connections, and drug development by examining big genomic databases for recurrent
patterns. In order to diagnose diseases, practice personalized medicine, and create innovative
therapeutic strategies, frequent pattern mining algorithms help uncover important DNA
sequences and patterns.
Conclusion
In conclusion, frequent pattern mining is a fundamental method for data mining that focuses on
identifying recurrent patterns in sizable datasets. This method finds hidden dependencies and
relationships by recognizing groups of elements that regularly co?occur. The value of frequent
pattern mining is found in its capacity to offer insightful data for data?driven decision?making.