0% found this document useful (0 votes)

48 views34 pages

Data Mining: Frequent Pattern Analysis

Uploaded by

akramshaik2004

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

48 views34 pages

Data Mining: Frequent Pattern Analysis

Uploaded by

akramshaik2004

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

Mining frequent patterns are the process of discovering sets of items that frequently occur together in a given
dataset. It involves finding all itemsets that meet a minimum support threshold and generating association rules
from the frequent itemsets. Apriori, FP-growth, and Eclat are popular algorithms for mining frequent patterns.
This technique has various applications, including market basket analysis, recommending products, and
network traffic analysis. It is a powerful method to identify relationships and insights in large datasets.

Introduction

Mining frequent patterns is a data mining technique that involves discovering sets of items frequently occurring
together in a given dataset. A frequent pattern is a set of items occurring frequently in a given dataset. In other
words, it is a subset of items that appears in a minimum number of transactions or records in the dataset. The
frequency of a pattern is typically measured by its support, which is the percentage of transactions in the
dataset that contain the pattern. A pattern is considered frequent if its support exceeds a predefined threshold.

Mining frequent patterns are important because it can help identify relationships and correlations among various
items in a large dataset, which can be useful in various applications such as market basket analysis,
recommendation systems, and network traffic analysis. For example, in market basket analysis, frequent
itemsets can be used to identify which products are commonly purchased together. This can be used to
optimize store layout or recommend related products to customers.

To illustrate, consider a dataset of customer purchases in a retail store. A frequent pattern may be that
customers who purchase bread and milk are also likely to purchase eggs. By identifying such patterns, a store
can optimize its product placement and promotions to increase sales and customer satisfaction.

Algorithms Used For Frequent Pattern Mining

A few of the commonly used algorithms for mining frequent patterns include the following -

• Apriori - Apriori is a classic algorithm for mining frequent patterns in large datasets. It works by iteratively
generating candidate itemsets of increasing size and pruning those that do not meet the minimum
support threshold. This approach significantly reduces the search space and makes it possible to handle
datasets with a large number of items. However, Apriori can be computationally expensive for datasets
with many infrequent itemsets.
• FP-growth -FP-growth is an algorithm for mining frequent patterns that uses a divide-and-conquer
approach. It constructs a tree-like data structure called the frequent pattern (FP) tree, where each node
represents an item in a frequent pattern, and its children represent its immediate sub-patterns. By
scanning the dataset only twice, FP-growth can efficiently mine all frequent itemsets without generating
candidate itemsets explicitly. It is particularly suitable for datasets with long patterns and relatively low
support thresholds.
• Eclat - Eclat is a depth-first search algorithm for mining frequent itemsets similar to Apriori. However,
instead of generating candidate itemsets of increasing size, Eclat uses a vertical representation of the
dataset to identify frequent itemsets recursively. It exploits the overlap among the itemsets in different
transactions to reduce the search space and is efficient for datasets with many short and frequent
itemsets. However, Eclat may perform poorly for datasets with long itemsets or low support thresholds.

In the subsequent sections, let’s understand various terminologies used in mining frequent patterns.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Support

In data mining, support is a measure used to identify frequent patterns in a dataset. It is the proportion of
transactions or records in the dataset that contain a given set of items or attributes. The support value is
typically expressed as a percentage or decimal value between 0 and 1.

For example, consider a dataset of customer transactions at a grocery store that contains the following items -
milk, bread, cheese, eggs, butter, and yogurt. Suppose we want to find frequent itemsets of products commonly
purchased together. If we set a minimum support threshold of 30%, we would only consider itemsets that
appear in at least 30% of the transactions in the dataset. To calculate the support of an itemset, we count the
number of transactions in which it appears and divide it by the total number of transactions in the dataset. For
instance, if the itemset {bread, eggs} appears in 5 out of 10 transactions in the dataset, then its support
is 5/10=0.55/10=0.5, or 50%. As support for the {bread, eggs} is higher than the defined threshold of 30%, it will
be considered a frequent itemset.

Support(A)=Number of Transactions in which A occurs/ Number of all TransactionsNumber

Confidence

In data mining, confidence is a measure used to determine the strength of association between two items in a
frequent pattern. It is the conditional probability that item Y appears in a transaction, given that item X also
appears in the same transaction.

For example, suppose we have a dataset of customer transactions at a grocery store. We can calculate the
confidence of an association rule, such as {bread, milk} -> {eggs}, which means that customers who buy bread
and milk are likely to also buy eggs.

The confidence of an association rule is calculated as the support of the combined itemset divided by the
support of the antecedent (left-hand side) itemset. In other words, it measures the proportion of transactions
that contain both the antecedent and consequent itemsets out of the transactions that contain the antecedent
itemset. The formula for calculating support is shown below -

confidence(A⇒B)=P(B/A)=sup(A)sup(A∪B)

Applications of Frequent Pattern Mining

Here are some applications of frequent pattern mining in bullet points -

• Market basket analysis - Identifying frequently co-occurring products in a customer's basket or

transaction history.
• Recommendation systems - Generating recommendations based on patterns of behavior or purchases.
• Cross-selling and up-selling - Identifying related products to recommend or suggest to customers.
• Fraud detection - Identifying patterns of fraudulent behavior or transactions.
• Web usage mining - Analyzing user behavior and navigation patterns on a website.
• Social network analysis - Identifying common patterns of connections and relationships between
individuals or groups.
• Healthcare - Analyzing patient data and identifying common patterns or risk factors.
• Quality control - Analyzing production data and identifying patterns of defects or errors.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Association and Correlation in Data Mining

Association and Correlation in Data Mining are two of the most widely used techniques. They are used to
identify patterns and relationships between variables in a dataset. Association refers to the discovery of co-
occurrences or relationships between items in a dataset. On the other hand, correlation measures the strength
of the relationship between two variables. It provides an insight into how the variables are related and how they
affect each other.

Introduction

Data mining is the process of extracting useful information and knowledge from large datasets. With the ever-
increasing amount of data generated in various domains, data mining has become crucial for organizations to
make informed decisions. Association and Correlation in data mining are two of the most commonly used
techniques that help identify patterns, trends, and relationships between variables. Association analysis is used
to discover co-occurrences or relationships between items in a dataset, while correlation analysis measures the
strength of the relationship between two variables. In the subsequent sections, let’s explore both techniques,
their types, and various algorithms/methods to implement them.

What is Association?

Association is a technique used in data mining to identify the relationships or co-occurrences between items in
a dataset. It involves analyzing large datasets to discover patterns or associations between items, such as
products purchased together in a supermarket or web pages frequently visited together on a website.
Association analysis is based on the idea of finding the most frequent patterns or itemsets in a dataset, where an
itemset is a collection of one or more items.

Association analysis can provide valuable insights into consumer behaviour and preferences. It can help
retailers identify the items that are frequently purchased together, which can be used to optimize product
placement and promotions. Similarly, it can help e-commerce websites recommend related products to
customers based on their purchase history.

Types of Associations

Here are the most common types of associations used in data mining:

• Itemset Associations: Itemset association is the most common type of association analysis, which is
used to discover relationships between items in a dataset. In this type of association, a collection of one
or more items that frequently co-occur together is called an itemset. For example, in a supermarket
dataset, itemset association can be used to identify items that are frequently purchased together, such
as bread and butter.
• Sequential Associations: Sequential association is used to identify patterns that occur in a specific
sequence or order. This type of association analysis is commonly used in applications such as analyzing
customer behaviour on e-commerce websites or studying weblogs. For example, in the weblogs dataset,
a sequential association can be used to identify the sequence of pages that users visit before making a
purchase.
• Graph-based Associations Graph-based association is a type of association analysis that involves
representing the relationships between items in a dataset as a graph. In this type of association, each
item is represented as a node in the graph, and the edges between nodes represent the co-occurrence or
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
relationship between items. The graph-based association is used in various applications, such as social
network analysis, recommendation systems, and fraud detection. For example, in a social network
dataset, identifying groups of users with similar interests or behaviours.

Association Rule Mining

Here are the most commonly used algorithms to implement association rule mining in data mining:

• Apriori Algorithm - Apriori is one of the most widely used algorithms for association rule mining. It
generates frequent item sets from a given dataset by pruning infrequent item sets iteratively. The Apriori
algorithm is based on the concept that if an item set is frequent, then all of its subsets must also be
frequent. The algorithm first identifies the frequent items in the dataset, then generates candidate
itemsets of length two from the frequent items, and so on until no more frequent itemsets can be
generated. The Apriori algorithm is computationally expensive, especially for large datasets with many
items.
• FP-Growth Algorithm - FP-Growth is another popular algorithm for association rule mining that is based
on the concept of frequent pattern growth. It is faster than the Apriori algorithm, especially for large
datasets. The FP-Growth algorithm builds a compact representation of the dataset called a frequent
pattern tree (FP-tree), which is used to mine frequent item sets. The algorithm scans the dataset only
twice, first to build the FP-tree and then to mine the frequent itemsets. The FP-Growth algorithm can
handle datasets with both discrete and continuous attributes.
• Eclat Algorithm - Eclat (Equivalence Class Clustering and Bottom-up Lattice Traversal) is a frequent
itemset mining algorithm based on the vertical data format. The algorithm first converts the dataset into a
vertical data format, where each item and the transaction ID in which it appears are stored. Eclat then
performs a depth-first search on a tree-like structure, representing the dataset's frequent itemsets. The
algorithm is efficient regarding both memory usage and runtime, especially for sparse datasets.

Correlation Analysis in Data Mining

Correlation Analysis is a data mining technique used to identify the degree to which two or more variables are
related or associated with each other. Correlation refers to the statistical relationship between two or more
variables, where the variation in one variable is associated with the variation in another variable. In other words,
it measures how changes in one variable are related to changes in another variable. Correlation can
be positive, negative, or zero, depending on the direction and strength of the relationship between the variables.

, For example,, we are studying the relationship between the hours of study and the grades obtained by
students. If we find that as the number of hours of study increases, the grades obtained also increase, then there
is a positive correlation between the two variables. On the other hand, if we find that as the number of hours of
study increases, the grades obtained decrease, then there is a negative correlation between the two variables. If
there is no relationship between the two variables, we would say that there is zero correlation.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

Why is Correlation Analysis Important?

Correlation analysis is important because it allows us to measure the strength and direction of the relationship
between two or more variables. This information can help identify patterns and trends in the data, make
predictions, and select relevant variables for analysis. By understanding the relationships between different
variables, we can gain valuable insights into complex systems and make informed decisions based on data-
driven analysis.

Types of Correlation Analysis in Data Mining

There are three main types of correlation analysis used in data mining, as mentioned below:

• Pearson Correlation Coefficient - Pearson correlation measures the linear relationship between two
continuous variables. It ranges from -1 to +1, where -1 indicates a perfect negative correlation, 0 indicates
no correlation, and +1 indicates a perfect positive correlation. The Pearson correlation coefficient
between two variables, X and Y, is calculated as follows .

• Kendall Rank Correlation - Kendall correlation is a non-parametric measure of the association between
two ordinal variables. It measures the degree of correspondence between the ranking of observations on
two variables. It calculates the difference between the number of concordant pairs (pairs of observations
that have the same rank order in both variables) and discordant pairs (pairs of observations that have an
opposite rank order in the two variables) and normalizes the result by dividing by the total number of
pairs. The formula for the Kendall correlation is -

• Spearman Rank Correlation - Spearman correlation is another non-parametric measure of the

relationship between two variables. It measures the degree of association between the ranks of two
variables. Spearman correlation is similar to the Kendall correlation in that it measures the strength of
the relationship between two variables measured on a ranked scale. However, Spearman correlation
uses the actual numerical ranks of the data instead of counting the number of concordant and discordant
pairs. The formula for Spearman correlation is -

Interpreting Results Of Correlation Analysis

UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
After performing a correlation analysis, it is important to interpret the results to draw meaningful conclusions
about the relationship between the analyzed variables. One common way to interpret correlation coefficients is
by using the following general guidelines -

• Any score from +0.5 to +1 indicates a very strong positive correlation, meaning that the variables are
strongly related in a positive direction, increasing together or simultaneously.
• Any score from -0.5 to -1 indicates a strong negative correlation, meaning that the variables are strongly
related in a negative direction. It also means that as one variable decreases, the other variable increases
and vice-versa.
• A score of 0 indicates no correlation, meaning there is no relationship between the analyzed variables.

Benefits of Correlation Analysis

Correlation analysis is a powerful tool in data mining and statistical analysis that offers several benefits.

Some of the main benefits of correlation analysis are:

• Identifying Relationships - Correlation analysis helps identify the relationships between different
variables in a dataset. By quantifying the degree and direction of the relationship, we can gain insights into
how changes in one variable are likely to affect the other.
• Prediction - Correlation analysis can help predict one variable's values based on another variable's
values. Building models based on correlations can predict future outcomes and make informed
decisions.
• Feature Selection - Correlation analysis can also help select the most relevant features for a particular
analysis or model. By identifying the features that are highly correlated with the outcome features, we can
focus on those features and exclude the irrelevant ones, improving the accuracy and efficiency of the
analysis or model.
• Quality Control - Correlation analysis is useful in quality control applications, where it can be used to
identify correlations between different process variables and identify potential sources of quality
problems.

Benefits of Correlation Analysis

Correlation analysis is a powerful tool in data mining and statistical analysis that offers several benefits.

Some of the main benefits of correlation analysis are:

UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
• Identifying Relationships - Correlation analysis helps identify the relationships between different
variables in a dataset. By quantifying the degree and direction of the relationship, we can gain insights into
how changes in one variable are likely to affect the other.
• Prediction - Correlation analysis can help predict one variable's values based on another variable's
values. Building models based on correlations can predict future outcomes and make informed
decisions.
• Feature Selection - Correlation analysis can also help select the most relevant features for a particular
analysis or model. By identifying the features that are highly correlated with the outcome features, we can
focus on those features and exclude the irrelevant ones, improving the accuracy and efficiency of the
analysis or model.
• Quality Control - Correlation analysis is useful in quality control applications, where it can be used to
identify correlations between different process variables and identify potential sources of quality
problems.

Introduction

Data mining techniques are used to extract useful knowledge and insights from large datasets. A good data
mining technique should have the following characteristics -

• Scalability -
The technique should be able to handle large amounts of data efficiently.
• Robustness -
The technique should be able to handle noisy or incomplete data without compromising the quality of the
results.
• Accuracy -
The technique should produce accurate results with a low error rate.
• Interpretability -
The technique should produce results that domain experts can easily understand and interpret.

Data mining techniques are important because they enable organizations to discover hidden patterns,
relationships, and insights in their data. These insights can be used to make informed decisions, improve
business processes, and identify new opportunities. Data mining techniques are widely used in fields such
as marketing, finance, healthcare, and scientific research.

Data Mining Techniques

Classification

Classification is a supervised learning technique in data mining that assigns predefined classes to objects or
instances based on their attributes or features. It involves building a model from a set of training data that
consists of labeled examples, where the class label of each example is known. The model is then used to
classify new, unseen data based on their attributes.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
For example, consider a bank that wants to identify customers who are likely to default on their loans. The bank
can use classification to build a model that predicts the default risk of a customer based on their credit score,
income, and other relevant factors. The model can then be used to classify new loan applicants as low or high-
risk.

Classification algorithms used in data mining include decision trees, naive Bayes, support vector machines
(SVM), and logistic regression, among others. These algorithms differ in their assumptions, strengths, and
weaknesses and are chosen based on the characteristics of the data and the problem being solved.

Clustering

Clustering is an unsupervised learning technique in data mining that involves grouping similar objects or
instances together based on their attributes or features. Unlike classification, clustering does not involve
predefined classes but rather groups objects based on their similarity. The objective of clustering is to discover
inherent patterns and structures in the data that may not be immediately apparent.

For example, consider a retailer that wants to segment its customers based on their shopping behavior. The
retailer can use clustering to group customers with similar purchasing patterns, such as those who buy high-end
products or shop frequently. This information can be used to tailor marketing strategies and promotions to each
segment.

Clustering algorithms used in data mining include k-means, hierarchical clustering, and density-based
clustering, among others. These algorithms differ in their assumptions and how they define similarity or distance
between objects.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

Regression

Regression is a supervised learning technique in data mining that involves building a model to predict a
continuous or numerical output variable based on one or more input variables or predictors. Regression aims to
establish a functional relationship between the input and output variables.

For example, consider a real estate agency that wants to predict the price of a house based on its features,
such as size, location, and the number of bedrooms. The agency can use regression to build a model that
predicts the price of a house based on these features. The model can then be used to estimate the price of new
houses or to identify undervalued properties.

Regression algorithms used in data mining include linear regression, polynomial regression, and decision tree
regression, among others. These algorithms differ in their assumptions and how they model the relationship
between the input and output variables.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Association Rules Mining

Association rule mining is an unsupervised learning technique in data mining that involves discovering
relationships or associations between variables in a dataset. It aims to find patterns of co-occurrence or
correlation among variables frequently occurring together in the data.

For example, consider a retailer that wants to increase its sales by offering promotions or discounts to
customers who buy certain products. The retailer can use association rule mining to identify which products are
often bought together, such as bread and butter or shampoo and conditioner. This information can be used to
create targeted promotions and cross-selling strategies.

Association rule mining algorithms used in data mining include Apriori, FP-Growth, and Eclat, among others.
These algorithms differ in their approach to identifying frequent itemsets or sets of variables that occur together.

Outlier Detection

Outlier detection is a data mining technique that involves identifying and analyzing data points or observations
significantly different from most of the data. Outliers are data points that deviate from the expected or normal
behavior of the data and may indicate errors, anomalies, or rare events.

For example, consider a credit card company that wants to detect fraudulent transactions. The company can
use outlier detection to identify transactions significantly different from a customer's normal spending behavior,
such as unusually large purchases or transactions made in different countries. These transactions can be
flagged for further investigation or declined to prevent fraud.

Outlier detection algorithms used in data mining include statistical methods, such as z-score and boxplot, and
machine learning methods, such as isolation forest and LOF.

Sequential Patterns

Sequential pattern mining is a data mining technique that involves discovering patterns or sequences of events
that frequently occur together in a dataset. It aims to identify temporal or time-dependent relationships between
variables or events.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
For example, consider an e-commerce company that wants to improve its user experience by recommending
products based on the purchase behavior of its users. The company can employ sequential pattern mining to
identify which products are often purchased together in a sequence, such as a user buying accessories post-
purchase of a computer or smartphone. This information can be used to personalize recommendations and
improve user engagement.

Sequential pattern mining algorithms used in data mining include GSP (Generalized Sequential Pattern), SPADE
(Sequential PAttern Discovery using Equivalence classes), and PrefixSpan, among others.

Prediction

Prediction is a data mining technique that involves building a model to predict the value or class of a target
variable based on a set of input or predictor variables. The objective of prediction is to make accurate predictions
for new or unseen data based on the patterns and relationships discovered in the training data.

Prediction algorithms used in data mining include linear regression, decision trees, neural networks, support
vector machines, and random forests, among others. These algorithms differ in their approach to building
prediction models and are based on the data type of the variable (categorical or continuous) to be predicted.
One can choose the appropriate algorithm for the prediction model based on the characteristics of the data and
the problem being solved.

Advantages and Disadvantages

Data mining techniques have several advantages, which are as follows -

• Identification of patterns and trends -

Data mining techniques help identify patterns and trends in large datasets, providing valuable insights for
decision-making.
• Automated processing -
Data mining techniques automate the process of analyzing data, reducing the time and effort required for
manual analysis.
• Prediction and forecasting -
Data mining techniques can be used to build predictive models that help forecast future trends and
events.
• Improved decision-making -
Data mining techniques provide valuable information and insights that help make informed decisions.
• Increased efficiency -
Data mining techniques help identify areas of inefficiency or waste, allowing organizations to streamline
their operations and improve efficiency.
• Personalization -
Data mining techniques can help personalize customer recommendations and experiences based on
their preferences and behaviors.

Data mining techniques also have several disadvantages, as mentioned below -

• Privacy concerns -
Data mining techniques can be used to extract sensitive information about individuals, which can raise
privacy concerns.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
• Reliance on data quality -
Data mining techniques rely on the quality and accuracy of the data, and inaccurate or incomplete data
can lead to incorrect conclusions.

Pattern Evaluation methods

In data mining, pattern evaluation is the process of assessing the quality of discovered patterns. This process
is important in order to determine whether the patterns are useful and whether they can be trusted. There are
a number of different measures that can be used to evaluate patterns, and the choice of measure will depend
on the application.
There are several ways to evaluate pattern mining algorithms:
1. Accuracy
The accuracy of a data mining model is a measure of how correctly the model predicts the target values. The
accuracy is measured on a test dataset, which is separate from the training dataset that was used to train the
model. There are a number of ways to measure accuracy, but the most common is to calculate the
percentage of correct predictions. This is known as the accuracy rate.
Other measures of accuracy include the root mean squared error (RMSE) and the mean absolute error (MAE).
The RMSE is the square root of the mean squared error, and the MAE is the mean of the absolute errors. The
accuracy of a data mining model is important, but it is not the only thing that should be considered. The model
should also be robust and generalizable.
A model that is 100% accurate on the training data but only 50% accurate on the test data is not a good
model. The model is overfitting the training data and is not generalizable to new data. A model that is 80%
accurate on the training data and 80% accurate on the test data is a good model. The model is generalizable
and can be used to make predictions on new data.
2. Classification Accuracy
This measures how accurately the patterns discovered by the algorithm can be used to classify new data. This
is typically done by taking a set of data that has been labeled with known class labels and then using the
discovered patterns to predict the class labels of the data. The accuracy can then be computed by comparing
the predicted labels to the actual labels.
Classification accuracy is one of the most popular evaluation metrics for classification models, and it is
simply the percentage of correct predictions made by the model. Although it is a straightforward and easy -to-
understand metric, classification accuracy can be misleading in certain situations. For example, if we have a
dataset with a very imbalanced class distribution, such as 100 instances of class 0 and 1,000 instances of
class 1, then a model that always predicts class 1 will achieve a high classification accuracy of 90%.
However, this model is clearly not very useful, since it is not making any correct predictions for class 0.
There are a few different ways to evaluate classification models, such as precision and recall, which are more
informative in imbalanced datasets. Precision is the percentage of correct predictions made by the model for
a particular class, and recall is the percentage of instances of a particular class that was correctly predicted
by the model. In the above example, if we looked at precision and recall for class 0, we would see that the
model has a precision of 0% and a recall of 0%.
Another way to evaluate classification models is to use a confusion matrix. A confusion matrix is a table that
shows the number of correct and incorrect predictions made by the model for each class. This can be a
helpful way to visualize the performance of a model and to identify where it is making mistakes. For example,
in the above example, the confusion matrix would show that the model is making all predictions for class 1
and no predictions for class 0.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Overall, classification accuracy is a good metric to use when evaluating classification models. However, it is
important to be aware of its limitations and to use other evaluation metrics in situations where classification
accuracy could be misleading.
3. Clustering Accuracy
This measures how accurately the patterns discovered by the algorithm can be used to cluster new data. This
is typically done by taking a set of data that has been labeled with known cluster labels and then using the
discovered patterns to predict the cluster labels of the data. The accuracy can then be computed by
comparing the predicted labels to the actual labels.
There are a few ways to evaluate the accuracy of a clustering algorithm:
• External indices: these indices compare the clusters produced by the algorithm to some known
ground truth. For example, the Rand Index or the Jaccard coefficient can be used if the ground truth
is known.
• Internal indices: these indices assess the goodness of clustering without reference to any external
information. The most popular internal index is the Dunn index.
• Stability: this measures how robust the clustering is to small changes in the data. A clustering
algorithm is said to be stable if, when applied to different samples of the same data, it produces the
same results.
• Efficiency: this measures how quickly the algorithm converges to the correct clustering.
4. Coverage
This measures how many of the possible patterns in the data are discovered by the algorithm. This can be
computed by taking the total number of possible patterns and dividing it by the number of patterns discovered
by the algorithm. A Coverage Pattern is a type of sequential pattern that is found by looking for items that tend
to appear together in sequential order. For example, a coverage pattern might be “customers who purchase
item A also tend to purchase item B within the next month.”
To evaluate a coverage pattern, analysts typically look at two things: support and confidence. Support is the
percentage of transactions that contain the pattern. Confidence is the percentage of transactions that contain
the pattern divided by the number of transactions that contain the first item in the pattern.
For example, consider the following coverage pattern: “customers who purchase item A also tend to
purchase item B within the next month.” If the support for this pattern is 0.1%, that means that 0.1% of all
transactions contain the pattern. If the confidence for this pattern is 80%, that means that 80% of the
transactions that contain item A also contain item B.
Generally, a higher support and confidence value indicates a stronger pattern. However, analysts must be
careful to avoid overfitting, which is when a pattern is found that is too specific to the data and would not be
generalizable to other data sets.
5. Visual Inspection
This is perhaps the most common method, where the data miner simply looks at the patterns to see if they
make sense. In visual inspection, the data is plotted in a graphical format and the pattern is observed. This
method is used when the data is not too large and can be easily plotted. It is also used when the data is
categorical in nature. Visual inspection is a pattern evaluation method in data mining where the data is visually
inspected for patterns. This can be done by looking at a graph or plot of the data, or by looking at the raw data
itself. This method is often used to find outliers or unusual patterns.
6. Running Time
This measures how long it takes for the algorithm to find the patterns in the data. This is typically measured in
seconds or minutes. There are a few different ways to measure the performance of a machine learning
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
algorithm, but one of the most common is to simply measure the amount of time it takes to train the model
and make predictions. This is known as the running time pattern evaluation.
There are a few different things to keep in mind when measuring the running time of an algorithm. First, you
need to take into account the time it takes to load the data into memory. Second, you need to account for the
time it takes to pre-process the data if any. Finally, you need to account for the time it takes to train the model
and make predictions.
In general, the running time of an algorithm will increase as the number of data increases. This is because the
algorithm has to process more data in order to learn from it. However, there are some algorithms that are
more efficient than others and can scale to large datasets better. When comparing different algorithms, it is
important to keep in mind the specific dataset that is being used. Some algorithms may be better suited for
certain types of data than others. In addition, the running time can also be affected by the hardware that is
being used.
7. Support
The support of a pattern is the percentage of the total number of records that contain the pattern. Support
Pattern evaluation is a process of finding interesting and potentially useful patterns in data. The purpose of
support pattern evaluation is to identify interesting patterns that may be useful for decision -making. Support
pattern evaluation is typically used in data mining and machine learning applications.
There are a variety of ways to evaluate support patterns. One common approach is to use a support metric,
which measures the number of times a pattern occurs in a dataset. Another common approach is to use a lift
metric, which measures the ratio of the occurrence of a pattern to the expected occurrence of the pattern.
Support pattern evaluation can be used to find a variety of interesting patterns in data, including association
rules, sequential patterns, and co-occurrence patterns. Support pattern evaluation is an important part of
data mining and machine learning, and can be used to help make better decisions.
8. Confidence
The confidence of a pattern is the percentage of times that the pattern is found to be correct. Confidence
Pattern evaluation is a method of data mining that is used to assess the quality of patterns found in data. This
evaluation is typically performed by calculating the percentage of times a pattern is found in a data set and
comparing this percentage to the percentage of times the pattern is expected to be found based on the overall
distribution of data. If the percentage of times a pattern is found is significantly higher than the expected
percentage, then the pattern is said to be a strong confidence pattern.
9. Lift
The lift of a pattern is the ratio of the number of times that the pattern is found to be correct to the number of
times that the pattern is expected to be correct. Lift Pattern evaluation is a data mining technique that can be
used to evaluate the performance of a predictive model. The lift pattern is a graphical representation of the
model’s performance and can be used to identify potential problems with the model.
The lift pattern is a plot of the true positive rate (TPR) against the false positive rate (FPR). The TPR is the
percentage of positive instances that are correctly classified by the model, while the FPR is the percentage of
negative instances that are incorrectly classified as positive. Ideally, the TPR would be 100% and the FPR
would be 0%, but this is rarely the case in practice. The lift pattern can be used to evaluate how close the
model is to this ideal.
A good model will have a lifted pattern that is close to the diagonal line. This means that the TPR and FPR are
similar and that the model is correctly classifying a similar percentage of positive and negative instances. A
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
model with a lifted pattern that is far from the diagonal line is not performing as well. This can be caused by a
number of factors, including imbalanced data, poor feature selection, or overfitting.
The lift pattern can be a useful tool for identifying potential problems with a predictive model. It is important to
remember, however, that the lift pattern is only a graphical representation of the model’s performance, and
should be interpreted in conjunction with other evaluation measures.
10. Prediction
The prediction of a pattern is the percentage of times that the pattern is found to be correct. Prediction Pattern
evaluation is a data mining technique used to assess the accuracy of predictive models. It is used to
determine how well a model can predict future outcomes based on past data. Prediction Pattern evaluation
can be used to compare different models, or to evaluate the performance of a single model.
Prediction Pattern evaluation involves splitting the data set into two parts: a training set and a test set. The
training set is used to train the model, while the test set is used to assess the accuracy of the model. To
evaluate the accuracy of the model, the prediction error is calculated. Prediction Pattern evaluation can be
used to improve the accuracy of predictive models. By using a test set, predictive models can be fine -tuned to
better fit the data. This can be done by changing the model parameters or by adding new features to the data
set.
11. Precision
Precision Pattern Evaluation is a method for analyzing data that has been collected from a variety of sources.
This method can be used to identify patterns and trends in the data, and to evaluate the accuracy of data.
Precision Pattern Evaluation can be used to identify errors in the data, and to determine the cause of the
errors. This method can also be used to determine the impact of the errors on the overall accuracy of the data.

Precision Pattern Evaluation is a valuable tool for data mining and data analysis. This method can be used to
improve the accuracy of data, and to identify patterns and trends in the data.
12. Cross-Validation
This method involves partitioning the data into two sets, training the model on one set, and then testing it on
the other. This can be done multiple times, with different partitions, to get a more reliable estimate of the
model’s performance. Cross-validation is a model validation technique for assessing how the results of a data
mining analysis will generalize to an independent data set. It is mainly used in settings where the goal is
prediction, and one wants to estimate how accurately a predictive model will perform in practice. Cross -
validation is also referred to as out-of-sample testing.
Cross-validation is a pattern evaluation method that is used to assess the accuracy of a model. It does this by
splitting the data into a training set and a test set. The model is then fit on the training set and the accuracy is
measured on the test set. This process is then repeated a number of times, with the accuracy being averaged
over all the iterations.
13. Test Set
This method involves partitioning the data into two sets, training the model on the entire data set, and then
testing it on the held-out test set. This is more reliable than cross-validation but can be more expensive if the
data set is large. There are a number of ways to evaluate the performance of a model on a test set. The most
common is to simply compare the predicted labels to the true labels and compute the percentage of
instances that are correctly classified. This is called accuracy. Another popular metric is precision, which is
the number of true positives divided by the sum of true positives and false positives. The recall is the number
of true positives divided by the sum of true positives and false negatives. These metrics can be combined into
the F1 score, which is the harmonic mean of precision and recall.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
14. Bootstrapping
This method involves randomly sampling the data with replacement, training the model on the sampled data,
and then testing it on the original data. This can be used to get a distribution of the model’s performance,
which can be useful for understanding how robust the model is. Bootstrapping is a resampling technique used
to estimate the accuracy of a model. It involves randomly selecting a sample of data from the original dataset
and then training the model on this sample. The model is then tested on another sample of data that is not
used in training. This process is repeated a number of times, and the average accuracy of the model is
calculated.

Apriori Algorithm in Data Mining

Apriori Property

The Apriori property is a fundamental property of frequent itemsets used in the Apriori algorithm. In other words,
if an itemset appears frequently enough in the dataset to be considered significant, then all of its subsets must
also appear frequently enough to be significant. For example, if the itemset {A, B, C} frequently appears in a
dataset, then the subsets {A, B}, {A, C}, {B, C}, {A}, {B}, and {C} must also appear frequently in the dataset.

The Apriori property allows the Apriori algorithm in data mining to efficiently search for frequent itemsets by
eliminating candidate itemsets containing infrequent subsets, as they cannot be frequent. This search space
pruning reduces the time and memory required to find frequent itemsets in large datasets.

Apriori Algorithm Components

Before getting into the steps involved in the Apriori algorithm, let’s understand the various terminologies used in
the Apriori algorithm.

Support

In the Apriori algorithm, support refers to the frequency or occurrence of an item set in a dataset. It is defined as
the proportion of transactions in the dataset that contain the itemset. For example, let's consider a dataset of
sales transactions in a retail store that contains the following items - milk, bread, cheese, eggs, butter, and
yogurt. To calculate the support of an itemset, we count the number of transactions in which the itemset
appears and divide it by the total number of transactions in the dataset. For instance, if the itemset {milk, bread}
appears in 5 transactions out of 10 transactions in the dataset, then its support is 5/10=0.55/10=0.5, or 50%.

In the Apriori algorithm, itemsets with a support value above the minimum defined support threshold are
considered frequent and are used to generate candidate itemsets for the next iteration of the algorithm.

Support(A)=Number of Transactions in which A occurs// /

Number of all TransactionsNumber
Lift

Lift measures the strength of the association between two items. It is defined as the ratio of the support of the
two items occurring together to the support of the individual items multiplied together. Lift for any two items can
be calculated using the below formula -
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Lift(A→B)= Support(A and B)/ Support(A)∗Support(B)

If the lift value is greater than 1, then it indicates a positive association between the two items, which means that
the two items are more likely to be bought together. A lift value of exactly 1 indicates that the two items are
independent and there is no association between the two items, while a value less than 1 indicates a negative
association, meaning that two items are more likely to be bought separately.

Confidence

In the Apriori algorithm, confidence is also a measure of the strength of the association between two items in an
itemset. It is defined as the conditional probability that item B appears in a transaction, given that another item A
appears in the same transaction. Support for two items can be calculated using the below formula.

confidence(A⇒B)=P(B/A)= sup(A∪B)/// sup(A)

If the confidence value exceeds a specified threshold, it indicates that item B is likely to be purchased with item
A. For instance, if the confidence of the association between "bread" and "butter" is 0.8, it means that when a
customer buys "bread", there is an 80% chance that they will also buy "butter". This can be useful in
recommending to customers or optimizing product placement in a store.

Steps in Apriori Algorithm

Here are the steps involved in implementing the Apriori algorithm in data mining -

1. Define minimum support threshold - This is the minimum number of times an item set must appear in
the dataset to be considered as frequent. The support threshold is usually set by the user based on the
size of the dataset and the domain knowledge.
2. Generate a list of frequent 1-item sets - Scan the entire dataset to identify the items that meet the
minimum support threshold. These item sets are known as frequent 1-item sets.
3. Generate candidate item sets - In this step, the algorithm generates a list of candidate item sets of
length k+1 from the frequent k-item sets identified in the previous step.
4. Count the support of each candidate item set - Scan the dataset again to count the number of times
each candidate item set appears in the dataset.
5. Prune the candidate item sets - Remove the item sets that do not meet the minimum support threshold.
6. Repeat steps 3-5 until no more frequent item sets can be generated.
7. Generate association rules - Once the frequent item sets have been identified, the algorithm generates
association rules from them. Association rules are rules of form A -> B, where A and B are item sets. The
rule indicates that if a transaction contains A, it is also likely to contain B.
8. Evaluate the association rules - Finally, the association rules are evaluated based on metrics such as
confidence and lift.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

Apriori Algorithm Example

Let’s try to understand the Apriori algorithm implementation using an example. In this example, we will use a
minimum support threshold of 3. This means an item set must appear in at least three transactions to be
considered frequent.

• Let’s consider the transaction dataset of a retail store as shown in the below table.

TID Items

T1 {milk, bread}

T2 {bread, sugar}

T3 {bread, butter}

T4 {milk, bread, sugar}

T5 {milk, bread, butter}

T6 {milk, bread, butter}

T7 {milk, sugar}

T8 {milk, sugar}

T9 {sugar, butter}

T10 {milk, sugar, butter}

UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
TID Items

T11 {milk, bread, butter}

• Let’s calculate support for each item present in the dataset. As shown in the below table, support for all
items is greater than 3. It means that all items are considered as frequent 1-itemsets and will be used to
generate candidates for 2-itemsets.

Item Support (Frequency)

milk 8

bread 7

sugar 5

butter 7

• Below table represents all candidates generated from frequent 1-itemsets identified from the previous
step and their support value.

Candidate Item Sets Support (Frequency)

{milk, bread} 5

{milk, sugar} 3

{milk, butter} 5

{bread, sugar} 2

{bread, butter} 3

{sugar, butter} 2

• Now remove candidate item sets that do not meet the minimum support threshold of 3. After this step,
frequent 2-itemsets would be - {milk, bread}, {milk, sugar}, {milk, butter}, and {bread, butter}. In the next
step, let’s generate candidates for 3-itemsets and calculate their respective support values. It is shown
in the below table.

Candidate Item Sets Support (Frequency)

{milk, bread, sugar} 1

{milk, bread, butter} 3

{milk, sugar, butter} 1

UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
• As we can see in the above table, only one candidate itemset exceeds the minimum defined support
threshold - {milk, bread, butter}. As there is only one 3-itemset exceeding minimum support, we can’t
generate candidates for 4-itemsets. So, in the next step, we can write the association rules and their
respective metrics, as shown in the below table.

Candidate Item Sets Support (Frequency)

{milk, bread} {butter} (Confidence - 60%)1

{bread, butter} {milk} (Confidence - 100%)

{milk, butter} {bread} (Confidence - 60%)

• Based on association rules mentioned in the above table, we can recommend products to the customer
or optimize product placement in retail stores.

Advantages and Limitations of Apriori Algorithm

Here are some of the advantages of the Apriori algorithm in data mining -

• Apriori algorithm is simple and easy to implement, making it accessible even to those without a deep
understanding of data mining or machine learning.
• Apriori algorithm can handle large datasets and run on distributed systems, making it scalable for large-
scale applications.
• Apriori algorithm is one of the most widely used algorithms for association rule mining and is supported
by many popular data mining tools.

Below are some of the limitations of the Apriori algorithm in data mining -

• Apriori algorithm can be computationally expensive, especially for large datasets with many itemsets. For
example, if a dataset contains 104104 from frequent 1- itemsets, it will generate more than 107107 2-
length candidates, which makes this algorithm computationally expensive.
• Apriori algorithm can generate a large number of rules, making it difficult to sift through and identify the
most important ones.
• The algorithm requires multiple database scans to generate frequent itemsets, which can be a limitation
in systems where data access is slow or expensive.
• Apriori algorithm is sensitive to data sparsity, meaning it may not perform well on datasets with a low
frequency of itemsets.

FP Growth Algorithm in Data Mining

FP Growth in Data Mining

The FP Growth algorithm is a popular method for frequent pattern mining in data mining. It works by
constructing a frequent pattern tree (FP-tree) from the input dataset. The FP-tree is a compressed
representation of the dataset that captures the frequency and association information of the items in the data.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
The algorithm first scans the dataset and maps each transaction to a path in the tree. Items are ordered in each
transaction based on their frequency, with the most frequent items appearing first. Once the FP tree is
constructed, frequent itemsets can be generated by recursively mining the tree. This is done by starting at the
bottom of the tree and working upwards, finding all combinations of itemsets that satisfy the minimum support
threshold.

The FP Growth algorithm in data mining has several advantages over other frequent pattern mining algorithms,
such as Apriori. The Apriori algorithm is not suitable for handling large datasets because it generates a large
number of candidates and requires multiple scans of the database to my frequent items. In comparison, the FP
Growth algorithm requires only a single scan of the data and a small amount of memory to construct the FP tree.
It can also be parallelized to improve performance.

Working on FP Growth Algorithm

The working of the FP Growth algorithm in data mining can be summarized in the following steps:

• Scan the database:

In this step, the algorithm scans the input dataset to determine the frequency of each item. This
determines the order in which items are added to the FP tree, with the most frequent items added first.
• Sort items:
In this step, the items in the dataset are sorted in descending order of frequency. The infrequent items
that do not meet the minimum support threshold are removed from the dataset. This helps to reduce the
dataset's size and improve the algorithm's efficiency.
• Construct the FP-tree:
In this step, the FP-tree is constructed. The FP-tree is a compact data structure that stores the frequent
itemsets and their support counts.
• Generate frequent itemsets:
Once the FP-tree has been constructed, frequent itemsets can be generated by recursively mining the
tree. Starting at the bottom of the tree, the algorithm finds all combinations of frequent item sets that
satisfy the minimum support threshold.
• Generate association rules:
Once all frequent item sets have been generated, the algorithm post-processes the generated frequent
item sets to generate association rules, which can be used to identify interesting relationships between
the items in the dataset.

FP Tree

The FP-tree (Frequent Pattern tree) is a data structure used in the FP Growth algorithm for frequent pattern
mining. It represents the frequent itemsets in the input dataset compactly and efficiently. The FP tree consists of
the following components:

• Root Node:
The root node of the FP-tree represents an empty set. It has no associated item but a pointer to the first
node of each item in the tree.
• Item Node:
Each item node in the FP-tree represents a unique item in the dataset. It stores the item name and the
frequency count of the item in the dataset.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
• Header Table:
The header table lists all the unique items in the dataset, along with their frequency count. It is used to
track each item's location in the FP tree.
• Child Node:
Each child node of an item node represents an item that co-occurs with the item the parent node
represents in at least one transaction in the dataset.
• Node Link:
The node-link is a pointer that connects each item in the header table to the first node of that item in the
FP-tree. It is used to traverse the conditional pattern base of each item during the mining process.

The FP tree is constructed by scanning the input dataset and inserting each transaction into the tree one at a
time. For each transaction, the items are sorted in descending order of frequency count and then added to the
tree in that order. If an item exists in the tree, its frequency count is incremented, and a new path is created from
the existing node. If an item does not exist in the tree, a new node is created for that item, and a new path is
added to the tree. We will understand in detail how FP-tree is constructed in the next section.

Algorithm by Han

Let’s understand with an example how the FP Growth algorithm in data mining can be used to mine frequent
itemsets. Suppose we have a dataset of transactions as shown below:

Transaction ID Items

T1 {M, N, O, E, K, Y}

T2 {D, O, E, N, Y, K}

T3 {K, A, M, E}

T4 {M, C, U, Y, K}

T5 {C, O, K, O, E, I}

Let’s scan the above database and compute the frequency of each item as shown in the below table.

Item Frequency

A 1

C 2

D 1

E 4

I 1

K 5

M 3
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
N 2

O 3

U 1

Y 3

Let’s consider minimum support as 3. After removing all the items below minimum support in the above table,
we would remain with these items - {K: 5, E: 4, M : 3, O : 3, Y : 3}. Let’s re-order the transaction database based
on the items above minimum support. In this step, in each transaction, we will remove infrequent items and re-
order them in the descending order of their frequency, as shown in the table below.

Transaction ID Items Ordered Itemset

T1 {M, N, O, E, K, Y} {K, E, M, O, Y}

T2 {D, O, E, N, Y, K} {K, E, O, Y}

T3 {K, A, M, E} {K, E, M}

T4 {M, C, U, Y, K} {K, M, Y}

T5 {C, O, K, O, E, I} {K, E, O}

Now we will use the ordered itemset in each transaction to build the FP tree. Each transaction will be inserted
individually to build the FP tree, as shown below -

• First Transaction {K, E, M, O, Y}:

In this transaction, all items are simply linked, and their support count is initialized as 1.

• Second Transaction {K, E, O, Y}:

In this transaction, we will increase the support count of K and E in the tree to 2. As no direct link is
available from E to O, we will insert a new path for O and Y and initialize their support count as 1.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

• Third Transaction {K, E, M}:

After inserting this transaction, the tree will look as shown below. We will increase the support count
for K and E to 3 and for M to 2.

• Fourth Transaction {K, M, Y} and Fifth Transaction {K, E, O}:

After inserting the last two transactions, the FP-tree will look like as shown below:
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Now we will create a Conditional Pattern Base for all the items. The conditional pattern base is the path in the
tree ending at the given frequent item. For example, for item O, the paths {K, E, M} and {K, E} will result in item O.
The conditional pattern base for all items will look like as shown below table:

Item Conditional Pattern Base

Y {K, E, M, O : 1}, {K, E, O : 1}, {K, M : 1}

O {K, E, M : 1}, {K, E : 2}

M {K, E : 2}, {K : 1}

E {K : 4}

Now for each item, we will build a conditional frequent pattern tree. It is computed by identifying the set of
elements common in all the paths in the conditional pattern base of a given frequent item and computing its
support count by summing the support counts of all the paths in the conditional pattern base. The conditional
frequent pattern tree will look like this as shown below table:

Item Conditional Pattern Base Conditional FP Tree

Y {K, E, M, O : 1}, {K, E, O : 1}, {K, M : 1} {K : 3}

O {K, E, M : 1}, {K, E : 2} {K, E : 3}

M {K, E : 2}, {K: 1} {K : 3}

E {K: 4} {K: 4}

From the above conditional FP tree, we will generate the frequent itemsets as shown in the below table:

Item Frequent Patterns

Y {K, Y - 3}

O {K, O - 3}, {E, O - 3}, {K, E, O - 3}

M {K, M - 3}

E {K, E - 4}

Difference between FP-growth and Aprori algorithm

Here's a tabular comparison between the FP Growth algorithm and the Apriori algorithm:
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Factor FP Growth Algorithm Apriori Algorithm

FP Growth uses FP-tree to

Working Apriori algorithm mines frequent items in an iterative manner.
mine.

Generates frequent itemsets by

Candidate constructing the FP-Tree and
Generates candidate itemsets by joining and pruning.
Generation recursively generating
conditional pattern bases.

Scans the database only twice

to construct the FP-Tree and
Data Scanning Scans the database multiple times for frequent itemsets.
generate conditional pattern
bases.

Requires less memory than

Apriori as it constructs the FP-
Memory Usage Requires a large amount of memory to store candidate itemsets.
Tree, which compresses the
database

Faster due to efficient data

Speed compression and generation of Slower due to multiple database scans and candidate generation.
frequent itemsets.

Performs well on large datasets

due to efficient data
Scalability Performs poorly on large datasets due to a large number of candidate
compression and generation of
frequent itemsets.

Advantages of FP Growth Algorithm

The FP Growth algorithm in data mining has several advantages over other frequent itemset mining algorithms,
as mentioned below:

• Efficiency:
FP Growth algorithm is faster and more memory-efficient than other frequent itemset mining algorithms
such as Apriori, especially on large datasets with high dimensionality. This is because it generates
frequent itemsets by constructing the FP-Tree, which compresses the database and requires only two
scans.
• Scalability:
FP Growth algorithm scales well with increasing database size and itemset dimensionality, making it
suitable for mining frequent itemsets in large datasets.
• Resistant to noise:
FP Growth algorithm is more resistant to noise in the data than other frequent itemset mining algorithms,
as it generates only frequent itemsets and ignores infrequent itemsets that may be caused by noise.
• Parallelization:
FP Growth algorithm can be easily parallelized, making it suitable for distributed computing environments
and allowing it to take advantage of multi-core processors.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Disadvantages of FP Growth Algorithm

While the FP Growth algorithm in data mining has several advantages, it also has some limitations and
disadvantages, as mentioned below:

• Memory consumption:
Although the FP Growth algorithm is more memory-efficient than other frequent itemset mining
algorithms, storing the FP-Tree and the conditional pattern bases can still require a significant amount of
memory, especially for large datasets.
• Complex implementation:
The FP Growth algorithm is more complex than other frequent itemset mining algorithms, making it more
difficult to understand and implement.

A data mining technique that is used to uncover purchase patterns in any retail setting is known as Market
Basket Analysis. In simple terms Basically, Market basket analysis in data mining is to analyze the
combination of products which been bought together.
This is a technique that gives the careful study of purchases done by a customer in a supermarket. This
concept identifies the pattern of frequent purchase items by customers. This analysis can help to promote
deals, offers, sale by the companies, and data mining techniques helps to achieve this analysis task. Example:
•Data mining concepts are in use for Sales and marketing to provide better customer service, to
improve cross-selling opportunities, to increase direct mail response rates.
• Customer Retention in the form of pattern identification and prediction of likely defections is
possible by Data mining.
• Risk Assessment and Fraud area also use the data-mining concept for identifying inappropriate or
unusual behavior etc.
Market basket analysis mainly works with the ASSOCIATION RULE {IF} -> {THEN}.
• IF means Antecedent: An antecedent is an item found within the data
• THEN means Consequent: A consequent is an item found in combination with the antecedent.

Let’s see ASSOCIATION RULE {IF} -> {THEN} rules used in Market Basket Analysis in Data Mining. For
example, customers buying a domain means they definitely need extra plugins/extensions to make it easier for
the users.
Like we said above Antecedent is the item sets that are available in data. By formulating from the rules
means {if} component and from the example is the domain.
Same as Consequent is the item that is found with the combination of Antecedents. By formulating from the
rules means {THEN} component and from the example is extra plugins/extensions.
With the help of these, we are able to predict customer behavioral patterns. From this, we are able to make
certain combinations with offers that customers will probably buy those products. That will automatically
increase the sales and revenue of the company.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
With the help of the Apriori Algorithm, we can further classify and simplify the item sets which are frequently
bought by the consumer.
There are three components in APRIORI ALGORITHM:
• SUPPORT
• CONFIDENCE
• LIFT
Now take an example, suppose 5000 transactions have been made through a popular eCommerce website.
Now they want to calculate the support, confidence, and lift for the two products, let’s say pen and notebook
for example out of 5000 transactions, 500 transactions for pen, 700 transactions for notebook, and 1000
transactions for both.
SUPPORT: It is been calculated with the number of transactions divided by the total number of transactions
made,

support(pen) = transactions related to pen/total transactions

i.e support -> 500/5000=10 percent
CONFIDENCE: It is been calculated for whether the product sales are popular on individual sales or through
combined sales. That is calculated with combined transactions/individual transactions.

Confidence = combine transactions/individual transactions

i.e confidence-> 1000/500=20 percent
LIFT: Lift is calculated for knowing the ratio for the sales.

Lift-> 20/10=2
When the Lift value is below 1 means the combination is not so frequently bought by consumers. But in this
case, it shows that the probability of buying both the things together is high when compared to the transaction
for the individual items sold.
With this, we come to an overall view of the Market Basket Analysis in Data Mining and how to calculate the
sales for combination products.

Types of Market Basket Analysis

There are three types of Market Basket Analysis. They are as follow:
1. Descriptive market basket analysis: This sort of analysis looks for patterns and connections in
the data that exist between the components of a market basket. This kind of study is mostly used to
understand consumer behavior, including what products are purchased in combination and what
the most typical item combinations. Retailers can place products in their stores more profitably by
understanding which products are frequently bought together with the aid of descriptive market
basket analysis.
2. Predictive Market Basket Analysis: Market basket analysis that predicts future purchases based
on past purchasing patterns is known as predictive market basket analysis. Large volumes of data
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
are analyzed using machine learning algorithms in this sort of analysis in order to create predictions
about which products are most likely to be bought together in the future. Retailers may make data-
driven decisions about which products to carry, how to price them, and how to optimize shop
layouts with the use of predictive market basket research.
3. Differential Market Basket Analysis: Differential market basket analysis analyses two sets of
market basket data to identify variations between them. Comparing the behavior of various client
segments or the behavior of customers over time is a common usage for this kind of study.
Retailers can respond to shifting consumer behavior by modifying their marketing and sales tactics
with the help of differential market basket analysis.

Benefits of Market Basket Analysis

1. Enhanced Customer Understanding: Market basket research offers insights into customer
behavior, including what products they buy together and which products they buy the most
frequently. Retailers can use this information to better understand their customers and make
informed decisions.
2. Improved Inventory Management: By examining market basket data, retailers can determine
which products are sluggish sellers and which ones are commonly bought together. Retailers can
use this information to make well-informed choices about what products to stock and how to
manage their inventory most effectively.
3. Better Pricing Strategies: A better understanding of the connection between product prices and
consumer behavior might help merchants develop better pricing strategies. Using this knowledge,
pricing plans that boost sales and profitability can be created.
4. Sales Growth: Market basket analysis can assist businesses in determining which products are
most frequently bought together and where they should be positioned in the store to grow sales.
Retailers may boost revenue and enhance customer shopping experiences by improving store
layouts and product positioning.

Applications of Market Basket Analysis

1. Retail: Market basket research is frequently used in the retail sector to examine consumer buying
patterns and inform decisions about product placement, inventory management, and pricing
tactics. Retailers can utilize market basket research to identify which items are sluggish sellers and
which ones are commonly bought together, and then modify their inventory management strategy
accordingly.
2. E-commerce: Market basket analysis can help online merchants better understand the customer
buying habits and make data-driven decisions about product recommendations and targeted
advertising campaigns. The behaviour of visitors to a website can be examined using market basket
analysis to pinpoint problem areas.
3. Finance: Market basket analysis can be used to evaluate investor behaviour and forecast the types
of investment items that investors will likely buy in the future. The performance of investment
portfolios can be enhanced by using this information to create tailored investment strategies.
4. Telecommunications: To evaluate consumer behaviour and make data-driven decisions about
which goods and services to provide, the telecommunications business might employ market
basket analysis. The usage of this data can enhance client happiness and the shopping experience.
5. Manufacturing: To evaluate consumer behaviour and make data-driven decisions about which
products to produce and which materials to employ in the production process, the manufacturing
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
sector might use market basket analysis. Utilizing this knowledge will increase effectiveness and
cut costs.
Multilevel Association Rule :
Association rules created from mining information at different degrees of reflection are called various level or
staggered association rules.
Multilevel association rules can be mined effectively utilizing idea progressions under a help certainty system.
Rules at a high idea level may add to good judgment while rules at a low idea level may not be valuable
consistently.
Utilizing uniform least help for all levels :
• At the point when a uniform least help edge is utilized, the pursuit system is rearranged.
• The technique is likewise straightforward, in that clients are needed to indicate just a single least
help edge.
• A similar least help edge is utilized when mining at each degree of deliberation. (for example for
mining from “PC” down to “PC”). Both “PC” and “PC” discovered to be incessant, while “PC” isn’t.
Needs of Multidimensional Rule :
• Sometimes at the low data level, data does not show any significant pattern but there is useful
information hiding behind it.
• The aim is to find the hidden information in or between levels of abstraction.
Approaches to multilevel association rule mining :
1. Uniform Support(Using uniform minimum support for all level)
2. Reduced Support (Using reduced minimum support at lower levels)
3. Group-based Support(Using item or group based support)
Let’s discuss one by one.
1. Uniform Support –
At the point when a uniform least help edge is used, the search methodology is simplified. The
technique is likewise basic in that clients are needed to determine just a single least help
threshold. An advancement technique can be adopted, based on the information that a progenitor
is a superset of its descendant. the search keeps away from analyzing item sets containing
anything that doesn’t have minimum support. The uniform support approach however has some
difficulties. It is unlikely that items at lower levels of abstraction will occur as frequently as those at
higher levels of abstraction. If the minimum support threshold is set too high it could miss several
meaningful associations occurring at low abstraction levels. This provides the motivation for the
following approach.
2. Reduce Support –
For mining various level relationship with diminished support, there are various elective hunt
techniques as follows.
• Level-by-Level independence –
This is a full-broadness search, where no foundation information on regular item sets is
utilized for pruning. Each hub is examined, regardless of whether its parent hub is
discovered to be incessant.
• Level – cross-separating by single thing –
A thing at the I level is inspected if and just if its parent hub at the (I-1) level is regular .all
in all, we research a more explicit relationship from a more broad one. If a hub is
frequent, its kids will be examined; otherwise, its descendant is pruned from the inquiry.
• Level-cross separating by – K-itemset –
A-itemset at the I level is inspected if and just if it’s For mining various level relationship
with diminished support, there are various elective hunt techniques.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
• Level-by-Level independence –
This is a full-broadness search, where no foundation information on regular item sets is
utilized for pruning. Each hub is examined, regardless of whether its parent hub is
discovered to be incessant.
• Level – cross-separating by single thing –
A thing at the 1st level is inspected if and just if its parent hub at the (I-1) the level is
regular .all in all, we research a more explicit relationship from a more broad one. If a
hub is frequent, its kids will be examined otherwise, its descendant is pruned from the
inquiry.
• Level-cross separating by – K-item set –
A-item set at the I level is inspected if and just if its corresponding parents A item set (i -
1) level is frequent.
3. Group-based support –
The group-wise threshold value for support and confidence is input by the user or expert. The group
is selected based on a product price or item set because often expert has insight as to which
groups are more important than others.
Example –
For e.g. Experts are interested in purchase patterns of laptops or clothes in the non and electronic
category. Therefore low support threshold is set for this group to give attention to these items’
purchase patterns.
Classification Using Frequent Patterns in Data Mining
A data mining approach called frequent pattern mining is used to find recurring patterns in a dataset. It is a
kind of unsupervised machine-learning technique that looks for and identifies patterns in data using
algorithms. This method can be applied to find products that are frequently purchased together or to find
products that are more likely to be purchased by particular demographic groups. Numerous applications of
this method include client segmentation, fraud detection, and marketing analysis. Frequent pattern mining can
be utilized in classification tasks to identify the patterns that are most likely related to a particular class.
Frequent patterns refer to item sets, subsequences, or substructures that appear frequently in a data set.
It works by scanning a data collection for common patterns, or item sets, and then utilizing those patterns to
categorize previously undiscovered data items. Once learnt, the patterns may be utilized to categorize
previously unknown data items, such as new consumer purchases or new customer behaviours. This
categorization may be used for a number of purposes, including forecasting customer turnover and detecting
fraudulent activity.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

Examples of classification in real life:

1. Assume you have a dataset that contains information about consumer transactions at a shop. You might
detect patterns of things that are frequently purchased together using frequent pattern mining. These patterns
might then be used to train a classifier to anticipate which things a consumer is likely to buy based on recent
purchases.
Explanation:
Frequent pattern mining can be used to find patterns of products that are commonly bought together in a
consumer transaction dataset. For instance, frequent pattern mining can be used to find pairs of things (like
“milk and bread”) or bigger groups of items (like “milk, bread, and eggs”) that are frequently bought together if
the dataset has information on which items were purchased in each transaction. Once these patterns have
been discovered, a classifier can be trained using them. A classifier is a machine learning model that has been
trained to predict a specific outcome, in this case, the products that a customer is most likely to purchase
based on previous purchases. The classifier may learn which things are likely to be bought together by being
trained on the patterns discovered through frequent pattern mining.
2. A hospital is thinking about employing a new nurse. They are searching for someone with extensive patient
care expertise.
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS
Explanation:
The hospital would use the information on the new nurse’s credentials, experience, and prior performance to
train a model that determines whether or not they have “vast patient care knowledge.” To train the model, the
hospital would use a labelled dataset of previously employed nurses. The classifier’s predictions would be
relied upon to determine whether or not to hire the new nurse. To choose the finest applicant, additional
criteria including background checks, reference checks, and psychological assessments may also be taken
into consideration. The hospital would next determine whether or not to hire the new nurse based on the
classifier’s predictions. The hospital would probably be more likely to hire the new nurse if the classifier
forecasts that she has “significant patient care knowledge.”
3. A corporation is in the market for a new accountant. They seek someone who can grasp financial reports
fast and properly.
Explanation:
The company would use the information on the new accountant’s training, experience, and historical
performance to train a model that determines whether or not they can “understand financial reports quickly
and correctly.” The business would train the model using a labelled dataset of previous accountants. Whether
or not to hire the new accountant would be determined by the classifier’s predictions. To choose the finest
applicant, additional criteria including background checks, reference checks, and psychological assessments
may also be taken into consideration.
4. A supermarket is seeking a new cashier. They want someone who can handle a tremendous workload while
remaining cool under pressure.
Explanation:
The store would use the information on the new cashier’s credentials, experience, and prior performance to
develop a model in classification data mining that determines whether or not they can “manage a massive
workload while keeping cool under pressure.” The store would train the model using a labelled dataset of
previous cashiers. Whether or not to hire the new cashier would be determined by the classifier’s predictions.
To choose the finest applicant, additional criteria including background checks, reference checks, and
psychological assessments may also be taken into consideration.
5. A corporation is in the market for a new marketing director. They want someone who can design efficient
marketing initiatives.
Explanation:
The company would use the information on the incoming marketing director’s credentials, expertise, and
historical performance to train a model that determines whether or not they can “create efficient marketing
efforts.” The business would train the model using a labelled dataset of previous marketing directors. The
classifier’s predictions would be used to determine whether or not to hire the new marketing director. To
choose the finest applicant, additional criteria including background checks, reference checks, and
psychological assessments may also be taken into consideration.
There are a number of algorithms that can be used for classification using frequent pattern mining. Some
examples include:
UNIT 3 DATA MINING -FREQUENT PATTERN ANALYSIS

1. Apriori Algorithm:
The Apriori algorithm is an algorithm for finding frequent item sets in a given dataset. It is an unsupervised
learning technique that employs a “bottom-up” strategy to discover frequent itemsets in a dataset by first
recognizing individual items in the dataset and then looking for combinations of items that appear often
together. The Apriori technique may be used to identify the rules that govern the relationships between various
objects in a collection. It is frequently used in market basket analysis, which seeks to find goods that are
frequently purchased together.
2. FP-Growth Algorithm:
The FP-Growth (Common Pattern Growth) algorithm is a data mining technique that finds frequent patterns or
itemsets in a dataset. It operates by building an FP-Tree, which is a compact representation of the dataset.
The FP-Tree is then utilized to construct common patterns from the ground up. The FP-Growth technique is
very scalable and can effectively detect common patterns in huge datasets. It is also more efficient than
another common approach for mining frequent item sets, the Apriori algorithm.
3. Closed Frequent Itemset Mining:
Closed frequent itemset mining is a kind of frequent itemset mining in which all itemsets in a given dataset
with a frequency that meets or exceeds a predetermined threshold are discovered. The technique works by
first generating a list of all frequent item sets in the dataset, then iteratively evaluating each item set to
determine whether any supersets of the itemset have a frequency that meets or exceeds the stated threshold.
Any supersets that meet the criteria are added to the list of frequently occurring itemsets. This procedure is
continued until no further supersets are discovered.
4. Naive Bayesian Algorithm:
Naive Bayes is a form of supervised machine learning method that is used for classification and is based on
Bayes’ Theorem. It is a probabilistic method that predicts using the probability of each attribute belonging to
each class. The Naive Bayes method is based on the assumption that all qualities are independent of one
another. This streamlines the computation of probabilities, allowing the algorithm to easily forecast a class of
a new data point.

5 DM Association
No ratings yet
5 DM Association
27 pages
Data Mining UNIT 3 LECTURE NOTES
No ratings yet
Data Mining UNIT 3 LECTURE NOTES
13 pages
Frequent Pattern Mining Concepts
No ratings yet
Frequent Pattern Mining Concepts
56 pages
Data Mining Frequent Patterns
No ratings yet
Data Mining Frequent Patterns
22 pages
CS 412 Intro. To Data Mining
No ratings yet
CS 412 Intro. To Data Mining
55 pages
Mining Frequent Patterns Unit-3
No ratings yet
Mining Frequent Patterns Unit-3
13 pages
Note 1455181909
No ratings yet
Note 1455181909
30 pages
2 Unit DM K Raj Kuamr
No ratings yet
2 Unit DM K Raj Kuamr
26 pages
Unit-03 DW&DM Notes Ashish Singh PDF 11
No ratings yet
Unit-03 DW&DM Notes Ashish Singh PDF 11
8 pages
CH-4 Mining Association Rules
No ratings yet
CH-4 Mining Association Rules
35 pages
Mining Frequent Itemsets
No ratings yet
Mining Frequent Itemsets
4 pages
Unit - 3 Mining Frequent Patterns
No ratings yet
Unit - 3 Mining Frequent Patterns
10 pages
P8 FPBasic
No ratings yet
P8 FPBasic
53 pages
Mining Frequent Patterns, Association and Correlations - Basic Concepts and Methods
No ratings yet
Mining Frequent Patterns, Association and Correlations - Basic Concepts and Methods
55 pages
Association Rules
No ratings yet
Association Rules
48 pages
Association Rule Mining
No ratings yet
Association Rule Mining
10 pages
Fundamentals of Data Science Unit 5
No ratings yet
Fundamentals of Data Science Unit 5
25 pages
DWDM Unit III Notes
No ratings yet
DWDM Unit III Notes
23 pages
DM Unit - 2
No ratings yet
DM Unit - 2
14 pages
Week 3
No ratings yet
Week 3
56 pages
Unit 2 Material
No ratings yet
Unit 2 Material
17 pages
DM-BS-lec6-Mining Frequent Patterns
No ratings yet
DM-BS-lec6-Mining Frequent Patterns
37 pages
FDS Unit02
No ratings yet
FDS Unit02
16 pages
Chap4 PatternMiningBasic
No ratings yet
Chap4 PatternMiningBasic
52 pages
Association Rules
No ratings yet
Association Rules
20 pages
06 Association Rule Mining
No ratings yet
06 Association Rule Mining
20 pages
Pattern Mining Concepts and Methods
No ratings yet
Pattern Mining Concepts and Methods
52 pages
Data Mining: Frequent Patterns
No ratings yet
Data Mining: Frequent Patterns
40 pages
Association Rule Mining Guide
No ratings yet
Association Rule Mining Guide
16 pages
BCA Semester VI Data Mining Module 3 (Presentation Kind of N
No ratings yet
BCA Semester VI Data Mining Module 3 (Presentation Kind of N
108 pages
06apriori Edited v3
No ratings yet
06apriori Edited v3
29 pages
Association Rule Discovery Techniques
100% (1)
Association Rule Discovery Techniques
21 pages
Unit 3 Data Science
No ratings yet
Unit 3 Data Science
15 pages
Updated Module 3
No ratings yet
Updated Module 3
31 pages
Frequent Pattern Based Clustering Methods
No ratings yet
Frequent Pattern Based Clustering Methods
23 pages
Mining Frequent Patterns
No ratings yet
Mining Frequent Patterns
108 pages
Data Mining: Frequent Itemsets & Clustering
No ratings yet
Data Mining: Frequent Itemsets & Clustering
152 pages
Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods
No ratings yet
Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods
20 pages
Data Mining - : Dr. Mahmoud Mounir Mahmoud - Mounir@cis - Asu.edu - Eg
No ratings yet
Data Mining - : Dr. Mahmoud Mounir Mahmoud - Mounir@cis - Asu.edu - Eg
26 pages
Unit 3
No ratings yet
Unit 3
62 pages
Frequent Pattern Mining
No ratings yet
Frequent Pattern Mining
2 pages
Unit II
No ratings yet
Unit II
22 pages
Fptreehuffman
No ratings yet
Fptreehuffman
4 pages
Mining Frequent Patterns and Associations
No ratings yet
Mining Frequent Patterns and Associations
52 pages
06 Apriori
No ratings yet
06 Apriori
36 pages
Unit 5 DWDM - 2
No ratings yet
Unit 5 DWDM - 2
50 pages
Unit 3
No ratings yet
Unit 3
44 pages
04 FPbasic
No ratings yet
04 FPbasic
78 pages
Frequent Pattern Mining Techniques
No ratings yet
Frequent Pattern Mining Techniques
37 pages
DMDW U3
No ratings yet
DMDW U3
16 pages
Powerpoint Presentation On Somlething
No ratings yet
Powerpoint Presentation On Somlething
181 pages
Chapter 5 Data Mining: Dr. Huma Lone
No ratings yet
Chapter 5 Data Mining: Dr. Huma Lone
56 pages
Chapter 5
No ratings yet
Chapter 5
24 pages
CSE 385 - Data Mining and Business Intelligence - Lecture 02
No ratings yet
CSE 385 - Data Mining and Business Intelligence - Lecture 02
67 pages
DM Lect7
No ratings yet
DM Lect7
26 pages
Computer Science
No ratings yet
Computer Science
59 pages
DWDM - Unit - IV
No ratings yet
DWDM - Unit - IV
67 pages
Raheem
No ratings yet
Raheem
40 pages
SK 17
No ratings yet
SK 17
8 pages
Ooad 1,2
No ratings yet
Ooad 1,2
57 pages
Mobile Internet Protocol
No ratings yet
Mobile Internet Protocol
23 pages
Arrays Unit 3
No ratings yet
Arrays Unit 3
33 pages
Juspay
No ratings yet
Juspay
2 pages
SDT For Control Statements
No ratings yet
SDT For Control Statements
2 pages
MCS 231
No ratings yet
MCS 231
197 pages
ED IV-I Mid II QB (ME, EEE, CSE, CSD, CAI, IT, ECE)
No ratings yet
ED IV-I Mid II QB (ME, EEE, CSE, CSD, CAI, IT, ECE)
2 pages
MC Unit 2
No ratings yet
MC Unit 2
32 pages
MC Unit - I
No ratings yet
MC Unit - I
30 pages
What Is Data Structure Unit 1
No ratings yet
What Is Data Structure Unit 1
16 pages
Kits Communication Test Results
No ratings yet
Kits Communication Test Results
2 pages
Understanding Language Processors
No ratings yet
Understanding Language Processors
17 pages
Queue Introduction
No ratings yet
Queue Introduction
7 pages
Understanding Robotics Fundamentals
No ratings yet
Understanding Robotics Fundamentals
9 pages
What Is Syntax Directed Translation
No ratings yet
What Is Syntax Directed Translation
8 pages
Unit 2 - Final - Removed
No ratings yet
Unit 2 - Final - Removed
9 pages
Key Characteristics of IoT Explained
No ratings yet
Key Characteristics of IoT Explained
3 pages
Iot 1
No ratings yet
Iot 1
21 pages
Sampling Techniques in Psychology
No ratings yet
Sampling Techniques in Psychology
8 pages
Zuur 2010
No ratings yet
Zuur 2010
12 pages
Lynch and Slovin'S Formula
No ratings yet
Lynch and Slovin'S Formula
13 pages
Method Validation Training Overview
No ratings yet
Method Validation Training Overview
3 pages
Bending
No ratings yet
Bending
3 pages
Cut Off JKR - Kosong
No ratings yet
Cut Off JKR - Kosong
17 pages
How To Choose A Sampling Technique and Determine Sample Size For Research - A Simplified Guide For Resea
No ratings yet
How To Choose A Sampling Technique and Determine Sample Size For Research - A Simplified Guide For Resea
7 pages
Dadm Research
No ratings yet
Dadm Research
11 pages
Malhotra Mr05 PPT 19
100% (9)
Malhotra Mr05 PPT 19
40 pages
ANOVA vs ANCOVA vs MANOVA vs MANCOVA
No ratings yet
ANOVA vs ANCOVA vs MANOVA vs MANCOVA
1 page
Best Graphs for Titer Level Analysis
No ratings yet
Best Graphs for Titer Level Analysis
5 pages
SPSS Guide: Cronbach's Alpha Analysis
100% (1)
SPSS Guide: Cronbach's Alpha Analysis
6 pages
Kalabngg
No ratings yet
Kalabngg
28 pages
(Ebook) Statistics for managers using Microsoft Excel by Levine, David M;Stephan, David F;Szabat, Kathryn A ISBN 9780134173054, 9780134566672, 9781292156347, 0134173058, 013456667X, 1292156341 instant download
No ratings yet
(Ebook) Statistics for managers using Microsoft Excel by Levine, David M;Stephan, David F;Szabat, Kathryn A ISBN 9780134173054, 9780134566672, 9781292156347, 0134173058, 013456667X, 1292156341 instant download
306 pages
Slidesc53 3 2
100% (1)
Slidesc53 3 2
29 pages
Alfian Chaerul Kemal
No ratings yet
Alfian Chaerul Kemal
3 pages
Two-Sample Inference Techniques Explained
No ratings yet
Two-Sample Inference Techniques Explained
33 pages
Electricity Demand Forecasting Using A Novel Time Series Ensemble Technique
No ratings yet
Electricity Demand Forecasting Using A Novel Time Series Ensemble Technique
13 pages
Regression Stepwise (PIZZA)
No ratings yet
Regression Stepwise (PIZZA)
4 pages
Data Analytics Using R
No ratings yet
Data Analytics Using R
23 pages
LAV Regression: Recent Advances
No ratings yet
LAV Regression: Recent Advances
25 pages
In-House Validation of Analytical Methods: Procedures & Calculation Sheets
No ratings yet
In-House Validation of Analytical Methods: Procedures & Calculation Sheets
21 pages
1 s2.0 S1470160X16300279 Main
No ratings yet
1 s2.0 S1470160X16300279 Main
6 pages
Sleep and Physical Activity in University Students
No ratings yet
Sleep and Physical Activity in University Students
19 pages
Statistical Analysis for Coaches
No ratings yet
Statistical Analysis for Coaches
6 pages
Module 3 Regression Notes
100% (1)
Module 3 Regression Notes
3 pages
Hypothesis Testing - A Visual Introduction To Statistical Significance
100% (4)
Hypothesis Testing - A Visual Introduction To Statistical Significance
137 pages
Product Limit Estimator
No ratings yet
Product Limit Estimator
18 pages
Earnings Management (EM) Calculation Using the MJM
No ratings yet
Earnings Management (EM) Calculation Using the MJM
3 pages
BUDGETED-LESSON-PLAN-2nd SEMESTER
No ratings yet
BUDGETED-LESSON-PLAN-2nd SEMESTER
14 pages