Data Mining: Priyanka Nemalikanti
Knowledge Discovery in Databases (KDD)
Knowledge Discovery in Databases (KDD) is the process of finding, transforming, and refining
raw data into meaningful information and patterns that can be used in various applications or
domains. KDD is a lengthy, iterative process made up of many steps. In the context of data
mining, it is an analytical, programmatic approach to modelling data retrieved from a database
in order to extract valuable and applicable knowledge. Data mining is the core step of KDD and
is therefore crucial to the whole process.
KDD relies on a range of algorithms, many of them self-learning, to deduce important
patterns from processed data. The KDD process involves several steps.
These include setting a goal and understanding the application, selecting and integrating data,
data cleaning and preprocessing, the transformation of data, data mining, pattern evaluation and
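
The steps named above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the toy shopping records in RAW, the function names, and the 0.6 support threshold are all hypothetical and do not come from the KDD literature. Counting frequent item pairs stands in for the data mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; record 3 has a missing value on purpose.
RAW = [
    {"id": 1, "items": ["Bread", "milk "], "store": "A"},
    {"id": 2, "items": ["bread", "Milk", "eggs"], "store": "A"},
    {"id": 3, "items": None, "store": "A"},
    {"id": 4, "items": ["milk", "eggs"], "store": "B"},
    {"id": 5, "items": ["bread", "milk"], "store": "A"},
]


def select(records, store):
    """Selection: keep only the records relevant to the goal."""
    return [r for r in records if r["store"] == store]


def clean(records):
    """Cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]


def transform(records):
    """Transformation: normalise item names and turn each record into a set."""
    return [frozenset(i.strip().lower() for i in r["items"]) for r in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def kdd(records, store="A", min_support=0.6):
    """Run the steps in order and return the surviving patterns."""
    baskets = transform(clean(select(records, store)))
    return evaluate(mine(baskets), len(baskets), min_support)


if __name__ == "__main__":
    print(kdd(RAW))
```

In a real project each step is usually revisited several times, which is why KDD is described as iterative rather than as a single pass.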
Motivating challenges
Traditional data analysis methods have run into practical difficulties in meeting
the challenges posed by new kinds of data sets. These challenges include
scalability, high dimensionality, heterogeneous and complex data, data ownership and
distribution, and non-traditional analysis. High dimensionality is one of the specific challenges
that motivated the development of data mining. It is now common to encounter data sets with many
attributes, rather than the handful that was typical a few decades ago. For example, progress in
microarray technology has produced gene expression data involving more than a thousand features.
Data sets that have a spatial or temporal component also tend to have high dimensionality. For
example, consider a data set that contains temperature measurements taken at different locations.
If the measurements are taken repeatedly over an extended period, then the number of dimensions
grows in proportion to the number of measurements taken.
Traditional data analysis methods developed for low-dimensional data often do not work well for
high-dimensional data. In addition, for some data analysis algorithms, the computational
complexity increases rapidly as the dimensionality increases (Tan et al., 2016).
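
To make the effect of dimensionality concrete, here is a minimal sketch that is not taken from Tan et al. It assumes each attribute is split into a fixed number of bins and counts the resulting grid cells. The function names and the choice of 10 bins and 1,000 points are illustrative assumptions only.

```python
def grid_cells(bins_per_dim: int, dims: int) -> int:
    """Number of cells when each of `dims` attributes is split into bins."""
    return bins_per_dim ** dims


def max_occupied_fraction(n_points: int, bins_per_dim: int, dims: int) -> float:
    """Upper bound on the share of cells that `n_points` points can occupy."""
    return min(1.0, n_points / grid_cells(bins_per_dim, dims))


if __name__ == "__main__":
    for dims in (1, 2, 5, 10):
        cells = grid_cells(10, dims)
        share = max_occupied_fraction(1000, 10, dims)
        print(f"dims={dims:2d}  cells={cells:>14,}  max occupied={share:.6%}")
```

With 1,000 points and 10 bins per attribute, two attributes give 100 cells, so every cell can hold data. Five attributes give 100,000 cells, so at most 1% of them can be occupied. Any method that has to examine the cells does exponentially more work, and the data becomes too sparse to support it.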
Note how data mining integrates with the components of statistics and AI, ML, and Pattern
Recognition.
Data mining, statistics, AI, and machine learning are all data-driven
disciplines that help an organisation make better decisions and so support its
growth. They overlap heavily and differ mainly in emphasis, terminology, and
notation, which is why they are sometimes likened to identical twins.
Data mining is used to uncover hidden patterns in large data warehouses, and it does so by
drawing on statistics, artificial intelligence, machine learning, and pattern recognition.
Data mining mainly uses machine learning, statistics, and database techniques to mine large
databases and hence
Difference between predictive and descriptive tasks and the importance of each.
Descriptive tasks describe the characteristics of the data in a target data set. In contrast,
predictive tasks perform induction on past and current data in order to make predictions.
Descriptive results tend to be more accurate and precise than predictive ones, because they
summarise data that has already been observed, whereas predictions carry uncertainty. Predictive
analysis is proactive, anticipating situations so that they can be controlled, while descriptive
analysis is reactive and only responds to a situation after it has occurred. It is important to
note that descriptive mining tasks typically employ unsupervised learning, while predictive tasks
typically use supervised learning (Tan et al., 2016). Predictive tasks are important because they
help predict future results rather than current behaviour, while descriptive tasks help determine
the data
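
The contrast between the two kinds of task can be sketched on one small data set. This is a hedged illustration: the monthly sales figures are invented, the function names are hypothetical, and a least-squares line is just one simple choice of predictive model.

```python
from statistics import mean


def describe(values):
    """Descriptive task: summarise data that has already been observed."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive task, step 1: learn slope and intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Predictive task, step 2: apply the learned line to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Invented example data: sales observed in months 1 to 5.
MONTHS = [1, 2, 3, 4, 5]
SALES = [10.0, 12.0, 14.0, 16.0, 18.0]

if __name__ == "__main__":
    print(describe(SALES))                         # what the data looks like
    print(predict(fit_line(MONTHS, SALES), 6))     # estimate for month 6
```

The descriptive summary needs no target attribute, which matches the unsupervised setting. The predictive model is fitted against a known target (sales) and then used on an input it has not seen, which matches the supervised setting.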
Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson Education
India.