Neural Networks in Data Mining Applications


Neural Network:

A neural network is an information-processing paradigm inspired by the human nervous system. Just as the human nervous system is built from biological neurons, a neural network is built from artificial neurons: mathematical functions modeled on biological neurons. The human brain is estimated to have around 10 billion neurons, each connected on average to 10,000 other neurons. Each neuron receives signals through synapses that control the effect of the signal on the neuron.
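The artificial neuron described above can be sketched as a weighted sum of inputs plus a bias, passed through an activation function. The following minimal Python sketch is illustrative only; the weights, bias, and choice of sigmoid activation are assumptions, not from the text:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the output into (0, 1)

# illustrative inputs and parameters
out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
```

Other activation functions (step, tanh, ReLU) can be swapped in without changing the structure.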

How Can Neural Networks Be Used for Data Mining?


Technology is growing day by day, and a large amount of data is produced every second. Analyzing this data has become very important because it helps us with tasks such as fraud detection and identifying spam e-mail. Data mining exists to help us find hidden patterns and discover knowledge in large datasets. In this article, we look at neural networks and how they are applied to data mining work.

Neural Network Architecture:


1. While researchers have created numerous neural network architectures, the most successful applications of neural networks in data mining have been multilayer feedforward networks. These are networks in which an input layer of nodes simply accepts the input values, and each successive layer consists of neurons (artificial neurons as described above). The outputs of the neurons in one layer are the inputs to the neurons in the next layer. The last layer is called the output layer; layers between the input and output layers are known as hidden layers.
2. Supervised learning comes in two types: regression and classification. In a regression problem the neural network predicts a numerical quantity, so there is one neuron in the output layer and its output is the prediction. In a classification problem there is typically one output neuron per class, and the class with the largest output is the prediction.
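The multilayer feedforward architecture described above can be sketched as a chain of fully connected layers. In the sketch below, the layer sizes, weights, and sigmoid activations are made-up illustrative values; it propagates two inputs through one hidden layer to a single regression-style output neuron:

```python
import math

def layer(inputs, weights, biases):
    # one fully connected layer: each neuron sees all outputs of the previous layer
    return [1.0 / (1.0 + math.exp(-(sum(x * w for x, w in zip(inputs, ws)) + b)))
            for ws, b in zip(weights, biases)]

def forward(x, network):
    # propagate activations layer by layer; the last layer is the output layer
    for weights, biases in network:
        x = layer(x, weights, biases)
    return x

# 2 inputs -> 3 hidden neurons -> 1 output neuron (regression-style)
net = [
    ([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.5, -0.3, 0.2]], [0.05]),                                # output layer
]
prediction = forward([1.0, 0.5], net)
```

Training (adjusting the weights, e.g. by backpropagation) is omitted; this only shows the forward flow of information from layer to layer.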

Why Use Neural Networks in Data Mining?


Neural networks help in mining large amounts of data in various sectors such as retail, banking (fraud detection), and bioinformatics (genome sequencing). Finding the useful information hidden in large data is very challenging and also very necessary. Data mining uses neural networks to harvest information from the large datasets of data-warehousing organizations, which helps the user in decision making.
Some of the applications of neural networks in data mining:
 Fraud Detection: Fraudsters have been exploiting businesses and banks for their own financial gain for many years, and the problem is growing because advancing technology makes fraud relatively easy to commit. On the other hand, technology also helps in fraud detection, and here neural networks help us a great deal.
 Healthcare: In healthcare, neural networks help us diagnose diseases. There are many diseases, and large datasets hold records of them; using neural networks with these records, we can diagnose diseases at an early stage.

Different Neural Network Methods in Data Mining


Neural network methods are used for classification, clustering, feature mining, prediction, and pattern recognition. The McCulloch-Pitts model is considered to be the first neural network, and the Hebbian learning rule is one of the earliest and simplest learning rules for neural networks. Neural networks used in data mining are commonly of the following three types:

 Feed-Forward Neural Networks: In a feedforward network there is a forward flow of information and no feedback between the layers. In simple words, the information moves in only one direction (forward), from the input nodes, through the hidden nodes (if any), to the output nodes. Such a network is known as a feedforward network.
 Feedback Neural Networks: Signals can travel in both directions in a feedback network. Feedback neural networks are very powerful and can become very complex. Feedback networks are dynamic: the "states" in such a network change constantly until an equilibrium point is reached, and they stay at equilibrium until the input changes and a new equilibrium must be found. Feedback architectures are also known as interactive or recurrent networks; feedback loops are allowed in such networks, and they are used for content-addressable memory.
 Self-Organizing Neural Networks: A Self-Organizing Neural Network (SONN) is a type of artificial neural network trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. A SONN is an unsupervised learning model, also termed a Self-Organizing Feature Map or Kohonen Map. It is used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional dataset while preserving the topological structure of the data.
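As a rough illustration of the competitive learning used by self-organizing networks, the sketch below trains a heavily simplified one-dimensional Kohonen map in pure Python. The unit count, crude neighborhood rule, and decay schedule are simplifying assumptions; real SOMs typically use a 2-D grid of units and a smooth (e.g. Gaussian) neighborhood function:

```python
import random

def bmu(x, weights):
    """Index of the best matching unit: the unit closest to x in squared distance."""
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))

def train_som(data, n_units, epochs=50, lr=0.5):
    random.seed(0)  # deterministic initial weights, for the sketch only
    weights = [[random.random() for _ in data[0]] for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays over time
        for x in data:
            winner = bmu(x, weights)
            for u in range(n_units):
                # crude neighborhood: full update for the winner, half for adjacent units
                influence = 1.0 if u == winner else (0.5 if abs(u - winner) == 1 else 0.0)
                weights[u] = [w + rate * influence * (v - w)
                              for w, v in zip(weights[u], x)]
    return weights

# two small clusters of 2-D points; after training, different units win each cluster
data = [[0.1, 0.1], [0.15, 0.2], [0.9, 0.8], [0.85, 0.9]]
units = train_som(data, n_units=3)
```

Note there is no error signal anywhere: units simply compete for each input, and the winner (plus its neighbors) moves toward it, which is what "competitive learning" means above.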
Parallel and Distributed Algorithms in Data Mining

Parallel and distributed algorithms are commonly used in data mining to speed up the processing
of large amounts of data. In data mining, parallel and distributed algorithms can be used for tasks
such as classification, clustering, and association rule mining.

Parallel algorithms in data mining can be used on a single computer with multiple processors or
cores. They work by breaking up the data into smaller chunks that can be processed simultaneously
by different processors. For example, if a classification task involves analyzing a large number of
images, each image can be processed by a different processor at the same time. The results from
each processor are combined at the end to produce the final output.

Distributed algorithms in data mining, on the other hand, are designed to run on multiple
computers connected by a network. In a distributed algorithm, the data is divided into smaller
chunks that are processed independently by different computers. The results from each computer
are then combined to produce the final output. This approach is useful for very large data sets that
cannot be processed on a single computer.

In both parallel and distributed algorithms, the data must be divided into smaller pieces that can
be processed independently. This can be challenging in data mining because the data may be
structured or unstructured, and different data mining tasks may require different approaches to data
partitioning. For example, in clustering, the data may be partitioned based on similarity between
data points, while in classification, the data may be partitioned based on features or attributes.
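The chunk-split, process, and combine pattern described above can be sketched as follows. A toy sum-of-squares task stands in for real mining work, and threads are used for brevity; CPU-bound Python mining would normally use processes (e.g. the multiprocessing module) instead:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # stand-in for real per-chunk mining work, e.g. classifying each record
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_workers=4):
    """Split data into chunks, process chunks concurrently, combine the partial results."""
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)  # combine step: merge the per-chunk results

result = parallel_sum_of_squares(list(range(10)))
```

A distributed version has the same shape, except each chunk is sent to a different machine over the network and the combine step gathers results from all of them.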
Decision Tree Induction in Data Mining
 Decision tree induction is a common technique in data mining that is used to generate a
predictive model from a dataset. This technique involves constructing a tree-like
structure, where each internal node represents a test on an attribute, each branch
represents the outcome of the test, and each leaf node represents a prediction. The goal
of decision tree induction is to build a model that can accurately predict the outcome of
a given event, based on the values of the attributes in the dataset.
 To build a decision tree, the algorithm first selects the attribute that best splits the data
into distinct classes. This is typically done using a measure of impurity, such as entropy
or the Gini index, which measures the degree of disorder in the data. The algorithm then
repeats this process for each branch of the tree, splitting the data into smaller and smaller
subsets until all of the data is classified.
 Decision tree induction is a popular technique in data mining because it is easy to
understand and interpret, and it can handle both numerical and categorical data.
Additionally, decision trees can handle large amounts of data, and they can be updated
with new data as it becomes available. However, decision trees can be prone to
overfitting, where the model becomes too complex and does not generalize well to new
data. As a result, data scientists often use techniques such as pruning to simplify the tree
and improve its performance.
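The impurity-based attribute selection described above can be sketched in pure Python. The Gini index and the toy weather-style dataset below are illustrative; the same structure works with entropy as the impurity measure:

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum(p_i^2) over class proportions; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, attribute_count):
    """Pick the attribute whose split gives the lowest weighted Gini impurity."""
    best_attr, best_score = None, float("inf")
    for attr in range(attribute_count):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)  # split rows by attribute value
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr, best_score

# toy dataset: attribute 0 separates the classes perfectly, attribute 1 does not
rows = [("sunny", "high"), ("sunny", "low"), ("rainy", "high"), ("rainy", "low")]
labels = ["no", "no", "yes", "yes"]
attr, score = best_split(rows, labels, attribute_count=2)
```

A full induction algorithm would recurse: split on the chosen attribute, then repeat `best_split` on each branch until the subsets are pure (or a pruning condition stops growth).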

Advantages of Decision Tree Induction


1. Easy to understand and interpret: Decision trees are a visual and intuitive model that can
be easily understood by both experts and non-experts.
2. Handle both numerical and categorical data: Decision trees can handle a mix of numerical
and categorical data, which makes them suitable for many different types of datasets.
3. Can handle large amounts of data: Decision trees can handle large amounts of data and
can be updated with new data as it becomes available.
4. Can be used for both classification and regression tasks: Decision trees can be used for
both classification, where the goal is to predict a discrete outcome, and regression, where
the goal is to predict a continuous outcome.

Disadvantages of Decision Tree Induction

1. Prone to overfitting: Decision trees can become too complex and may not generalize well to new data, which can lead to poor performance on unseen data.
2. Sensitive to small changes in the data: A small change in the data can result in a significantly different tree.
3. Biased towards attributes with many levels: Decision trees can be biased towards attributes with many levels and may not perform well on attributes with a small number of levels.
DATA MINING VERSUS KNOWLEDGE DISCOVERY IN DATABASES
DEFINITION
Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data. Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.
The KDD process consists of the following five steps:
1. Selection: The data needed for the data mining process may be obtained from many different and heterogeneous data sources. This first step obtains the data from various databases, files, and nonelectronic sources.
2. Preprocessing: The data to be used by the process may have incorrect or missing values, and there may be anomalous data from multiple sources involving different data types and metrics. Many different activities may be performed at this time: erroneous data may be corrected or removed, whereas missing data must be supplied or predicted (often using data mining tools).
3. Transformation: Data from different sources must be converted into a common format for processing. Some data may be encoded or transformed into more usable formats, and data reduction may be used to reduce the number of possible data values being considered.
4. Data mining: Based on the data mining task being performed, this step applies algorithms to the transformed data to generate the desired results.
5. Interpretation/evaluation: How the data mining results are presented to the users is extremely important because the usefulness of the results depends on it. Various visualization and GUI strategies are used at this last step.
Difference between KDD and Data Mining

Definition: KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data. Data mining refers to a process of extracting useful and valuable information or patterns from large data sets.

Objective: KDD aims to find useful knowledge from data; data mining aims to extract useful information from data.

Techniques used: KDD covers data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization. Data mining uses association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.

Output: KDD produces structured information, such as rules and models, that can be used to make decisions or predictions. Data mining produces patterns, associations, or insights that can be used to improve decision-making or understanding.

Focus: KDD focuses on the discovery of useful knowledge rather than simply finding patterns in data; data mining focuses on the discovery of patterns or relationships in data.

Role of domain expertise: Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results. It is less critical in data mining, as the algorithms are designed to identify patterns without relying on prior knowledge.
Explain various Clustering methods
Clustering methods group a dataset into clusters of similar objects. Partitioning methods divide a dataset into non-overlapping clusters or partitions by assigning each data point to a cluster based on its similarity to the other data points within that cluster; their goal is to minimize the sum of squared distances between the data points and their assigned cluster centers. Hierarchical and density-based methods take different approaches, described below.

Several popular clustering methods include:

K-Means Clustering: K-Means is one of the most widely used partitioning methods. It
works by randomly selecting K initial cluster centers, assigning each data point to its closest
cluster center, and then recalculating the cluster centers based on the mean of the data points
in each cluster. This process is repeated until the cluster centers no longer change.

Fuzzy C-Means Clustering: Fuzzy C-Means is a variant of K-Means that allows data points to belong to more than one cluster. This is done by assigning each data point a degree of membership in each cluster based on its similarity to the other data points in that cluster.

Hierarchical Clustering: Hierarchical clustering creates a hierarchy of clusters and can be either agglomerative or divisive. Agglomerative hierarchical clustering starts with each data point as its own cluster and iteratively merges the closest pairs of clusters until a single cluster is formed. Divisive hierarchical clustering starts with all the data points in a single cluster and iteratively splits the clusters into smaller subclusters.

Density-Based Clustering: Density-based clustering identifies clusters as areas of high density in the dataset. Data points that are close together in a high-density region are considered to belong to the same cluster.
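The K-Means procedure described above (assign points to the nearest center, then recompute centers from the means, repeat until stable) can be sketched in pure Python. The deterministic initialization from the first k points is a simplification; real implementations pick random initial centers, often with multiple restarts:

```python
def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to the nearest center, then recompute centers."""
    centers = points[:k]  # simplified deterministic start (real code picks randomly)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assignment step: nearest center by squared Euclidean distance
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # update step: each center becomes the mean of its cluster (kept if empty)
        centers = [tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                   if cluster else centers[i]
                   for i, cluster in enumerate(clusters)]
    return centers, clusters

# two obvious groups of 2-D points
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers, clusters = kmeans(points, k=2)
```

On this toy data the centers settle on the means of the two groups after a couple of iterations, which is exactly the "repeat until the cluster centers no longer change" condition described above.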
Various Types of Data in Clustering
In clustering, there are several types of data that can be used to group similar objects together.
The most common types of data used in clustering are:

Nominal data: Nominal data are categorical data that do not have any inherent order or
ranking. Examples of nominal data include colors, gender, and types of animals. In clustering,
nominal data can be used to group objects based on their categorical similarities.

Ordinal data: Ordinal data are categorical data that have an inherent order or ranking.
Examples of ordinal data include education levels, income brackets, and customer satisfaction
ratings. In clustering, ordinal data can be used to group objects based on their relative position
in the ranking.

Interval data: Interval data are numerical data that have a fixed interval between consecutive values but no true zero point. Examples of interval data include temperature in Celsius or Fahrenheit and calendar dates. In clustering, interval data can be used to group objects based on their numerical similarities.

Ratio data: Ratio data are numerical data that have a fixed zero point and a fixed interval
between consecutive values. Examples of ratio data include height, weight, and income. In
clustering, ratio data can be used to group objects based on their numerical similarities, with
particular attention paid to the magnitude of the values.

The choice of data type depends on the nature of the data being analyzed and the objectives of
the clustering analysis. For example, nominal data may be more appropriate for clustering
customer segments based on their preferences or behaviors, while interval or ratio data may be
more appropriate for clustering financial data based on performance metrics.
Types of Association Rules in Data Mining
Association rule learning is a machine learning technique used for discovering interesting relationships between variables in large databases. It is designed to detect strong rules in the database based on some interestingness metrics. For any given multi-item transaction, association rules aim to obtain rules that determine how or why certain items are linked. Association rules capture general if-then patterns using two key criteria: support, which shows how frequently an itemset appears in the data, and confidence, which is defined by the number of times an if-then statement is found to be true.
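Support and confidence can be computed directly from a transaction list; the toy basket data below is illustrative:

```python
# each transaction is the set of items bought together
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

s = support({"bread", "milk"}, transactions)       # how often bread and milk co-occur
c = confidence({"bread"}, {"milk"}, transactions)  # how often bread buyers also buy milk
```

Rule-mining algorithms such as Apriori then keep only rules whose support and confidence exceed user-chosen thresholds.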

 Types of Association Rules:


There are various types of association rules in data mining:
 Multi-relational association rules
 Generalized association rules
 Quantitative association rules
 Interval information association rules
1. Multi-relational association rules: Multi-Relation Association Rules (MRAR) are a class of association rules, different from primitive, simple, and even multi-relational association rules (usually extracted from multi-relational databases), in which each rule item consists of one entity but several relationships. These relationships represent indirect relationships between the entities.
Multilevel association rules: Association rules created from mining information at different levels of abstraction are called multilevel (or multiple-level) association rules. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework. Rules at a high concept level may add common-sense knowledge, while rules at a low concept level may not always be useful.
2. Generalized association rules: Generalized association rule extraction is a powerful tool
for getting a rough idea of interesting patterns hidden in data. However, since patterns are
extracted at each level of abstraction, the mined rule sets may be too large to be used
effectively for decision-making. Therefore, in order to discover valuable and interesting
knowledge, post-processing steps are often required. Generalized association rules should
have categorical (nominal or discrete) properties on both the left and right sides of the rule.
3. Quantitative association rules: Quantitative association rules are a special type of association rule. Unlike general association rules, where both the left and right sides of the rule must be categorical (nominal or discrete) attributes, at least one attribute (left or right) of a quantitative association rule must contain a numeric attribute.

Uses of Association Rules - Some of the uses of association rules in different fields are given
below:

 Medical Diagnosis: Association rules can be used in medical diagnosis to help doctors treat patients. Diagnosis is not easy, and errors can lead to unreliable results. Using multi-relational association rules, we can determine the probability of disease occurrence associated with various factors and symptoms.
 Market Basket Analysis: It is one of the most popular examples and uses of association
rule mining. Big retailers typically use this technique to determine the association
between items.
