
Data Mining: Priyanka Nemalikanti

KDD is the process of discovering meaningful patterns and knowledge from large amounts of data. It involves cleaning, transforming, and modeling the data to extract useful insights. Data mining is a key part of KDD as it uses algorithms to identify patterns. Traditional data analysis methods struggle with modern data challenges like high dimensionality. Descriptive tasks analyze data characteristics while predictive tasks induce patterns to make predictions about future data. Both are important approaches in data mining.

Data Mining

Priyanka Nemalikanti
Knowledge Discovery in Databases (KDD)

Knowledge Discovery in Databases (KDD) is the process of finding, transforming, and refining meaningful data and patterns from a raw database so that they can be used in various applications or domains. KDD is a lengthy and complicated process that entails many steps and iterations. Within KDD, data mining is the analytical, automated step that models data retrieved from a database to extract valuable and applicable knowledge. Data mining is the backbone of KDD and is therefore crucial to the whole process.

KDD utilises a range of algorithms, most of them self-learning, to deduce important patterns from data that has been processed. A KDD process involves several steps: setting a goal and understanding the application; selecting and integrating data; cleaning and preprocessing the data; transforming the data; data mining; evaluating and interpreting patterns; and discovering and using the resulting knowledge (Tan et al., 2016).
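The steps listed above can be sketched as a small pipeline. This is only an illustrative sketch, not a method taken from Tan et al.: the records in `RAW`, the function names, and the threshold of 25.0 are all hypothetical, and a per-group mean stands in for a real data mining algorithm.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing reading.
RAW = [
    {"city": "A", "temp_c": 21.0, "humidity": 40},
    {"city": "A", "temp_c": None, "humidity": 42},
    {"city": "B", "temp_c": 30.5, "humidity": 70},
    {"city": "B", "temp_c": 29.5, "humidity": 75},
    {"city": "A", "temp_c": 19.0, "humidity": 38},
]


def select(records, fields):
    """Selection: keep only the attributes relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: group temperature readings by city."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["city"], []).append(r["temp_c"])
    return grouped


def mine(grouped):
    """Data mining (stand-in): summarise each group by its mean."""
    return {city: mean(temps) for city, temps in grouped.items()}


def evaluate(patterns, threshold=25.0):
    """Evaluation: keep only the patterns judged interesting (assumed rule)."""
    return {city: m for city, m in patterns.items() if m > threshold}


def kdd(records):
    """Run the steps in order and return the surviving patterns."""
    data = clean(select(records, ["city", "temp_c"]))
    return evaluate(mine(transform(data)))
```

Calling `kdd(RAW)` runs selection, cleaning, transformation, mining, and evaluation in sequence. In practice the process is iterative, so the result of the evaluation step would often send the analyst back to an earlier step.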

Motivating challenges

Traditional data analysis methods have run into practical difficulties in meeting the challenges posed by new kinds of data sets. These challenges include scalability, high dimensionality, heterogeneous and complex data, data ownership and distribution, and non-traditional analysis. High dimensionality is one specific challenge that has motivated the development of data mining. It is now common to encounter data sets with many attributes, rather than the handful that was common a few decades ago. Progress in microarray technology, for instance, has produced gene expression data involving more than a thousand features. Data sets that have a spatial or temporal component also tend to have high dimensionality. Consider, for example, a data set of temperature measurements taken at different locations. If the measurements are repeated over an extended period, the number of features grows in proportion to the number of measurements taken. Data analysis methods developed for low-dimensional data often do not work well on high-dimensional data. In addition, for some data analysis algorithms the computational complexity increases rapidly as the dimensionality increases (Tan et al., 2016).
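The temperature example can be made concrete with a short sketch. The function names, the number of points, and the random seed below are assumptions made for illustration. The second function shows one commonly cited reason why distance-based methods degrade in high dimensions: for random points, the nearest and farthest neighbours end up at almost the same distance. This is an illustration of the general problem, not an argument taken from the essay's source.

```python
import math
import random


def feature_count(n_locations, n_readings):
    """One feature per (location, time step) pair."""
    return n_locations * n_readings


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query.

    Points are drawn uniformly from the unit hypercube. A small value
    means that all points look roughly equally far from the query.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# Ten stations read hourly: one day versus one year of features.
per_day = feature_count(10, 24)
per_year = feature_count(10, 24 * 365)

# Contrast shrinks sharply as the dimensionality grows.
low_dim = relative_contrast(2)
high_dim = relative_contrast(1000)
```

Here `per_day` is 240 and `per_year` is 87600, which shows how quickly repeated measurements inflate the number of features. Comparing `low_dim` with `high_dim` shows the loss of contrast between near and far points.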

Note how data mining integrates with statistics, AI, ML, and pattern recognition.

Data mining, statistics, AI, and machine learning are all data-driven disciplines that help a company make better decisions and so contribute to the organisation's growth. The disciplines overlap heavily and are often treated as nearly the same, differing mainly in terminology and notation, which is why they are sometimes described as identical twins. Data mining finds hidden patterns stored in large data warehouses by drawing on statistics, artificial intelligence, machine learning, and pattern recognition, together with database techniques that make it practical to mine large databases.

Difference between predictive and descriptive tasks and the importance of each.

Descriptive tasks describe the characteristics of the data in a target data set. In contrast, predictive tasks perform induction over past and current data to make predictions. Descriptive results are generally more accurate and precise than predictive ones, because they summarise data that has already been observed, whereas predictions carry uncertainty. Predictive analysis supports a proactive approach, in which situations are anticipated and acted on before they arise, while descriptive analysis is reactive and only responds to a situation that has already occurred. It is important to note that descriptive mining tasks typically employ unsupervised learning, while predictive tasks typically use supervised learning (Tan et al., 2016). Predictive tasks are important because they help forecast future results rather than only describing current behaviour, while descriptive tasks are important because they uncover regularities in the data and reveal patterns.
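The contrast between the two kinds of task can be sketched as follows. The data set and the function names are hypothetical, summary statistics stand in for a descriptive task, and a least-squares line stands in for a predictive model. Neither choice comes from the essay's source.

```python
from statistics import mean, pstdev


def describe(values):
    """Descriptive task: summarise the data that has been observed."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "spread": pstdev(values),
    }


def fit_line(xs, ys):
    """Predictive task, step 1: learn slope and intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Predictive task, step 2: apply the learned line to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical daily sales for five past days.
days = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

summary = describe(sales)                     # says what has happened
forecast = predict(fit_line(days, sales), 6)  # estimates day 6
```

The summary is exact for the observed data, while the forecast is only as good as the assumption that the past trend continues. The fitted line also depends on labelled examples (each day paired with its sales figure), which is what makes it a supervised technique.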


References

Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson Education India.
