Knowledge Discovery in Databases (KDD Process)
KDD (Knowledge Discovery in Databases) is a field of computer science that includes the tools and
theories that help humans extract useful and previously unknown information (i.e. knowledge) from
large collections of digitized data. KDD consists of several steps, and Data Mining is one of them. Data
Mining is the application of a specific algorithm in order to extract patterns from data. Nonetheless, the
terms KDD and Data Mining are often used interchangeably.
KDD has become a very important process for converting the large wealth of digitized data into business
intelligence, as manual extraction of patterns has become practically impossible over the past few
decades. For example, it is currently being used in applications such as social network
analysis, fraud detection, science, investment, manufacturing, telecommunications, data cleaning,
sports, information retrieval and, most of all, marketing. KDD is typically used to answer questions such as
"What are the main products that might help Wal-Mart obtain a high profit next year?" This process
has several steps. It starts with developing an understanding of the application domain and the goal,
and then creating a target dataset. This is followed by cleaning, preprocessing, reduction and
projection of data.
What is the difference between KDD and Data Mining?
KDD is the overall process of extracting knowledge from data while Data Mining is
a step inside the KDD process, which deals with identifying patterns in data.
The process of finding and interpreting patterns from data involves the repeated
application of the following steps:
a. Data Integration
First, the data is collected and integrated from all the different sources.
b. Data Selection
We may not need all the data collected in the first step. In this step, we select only
the data that we think is useful for data mining.
c. Data Cleaning
The data we have collected is usually not clean and may contain errors, missing values, noise or
inconsistencies. We therefore need to apply different techniques to get rid of such anomalies.
d. Data Transformation
Even after cleaning, the data is not ready for mining. We need to transform it into forms
appropriate for mining. The techniques used to do this include smoothing, aggregation and normalization.
e. Data Mining
In this step, we are ready to apply data mining techniques to the data in order to discover
interesting patterns. Clustering and association analysis are among the many different techniques
used for data mining.
f. Pattern Evaluation and Knowledge Presentation
This step includes visualization, transformation and the removal of redundant patterns from the patterns we
generated.
g. Decisions / Use of Discovered Knowledge
In this final step, we use the knowledge acquired to make better decisions.
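
The steps above can be illustrated with a small sketch in Python. It is only one possible way to arrange the steps and is not a standard implementation: the two data sources, the field names (customer, items, amount, cashier), the function names and the minimum support threshold are all assumptions made up for this example. Cleaning is reduced to dropping incomplete and duplicate records, transformation to min-max normalization, and mining to counting how often pairs of items are bought together, which is a very simple form of association analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data from two sources; every value here is invented.
store_a = [
    {"customer": "c1", "items": ["bread", "milk"], "amount": 12.0, "cashier": "x"},
    {"customer": "c2", "items": ["bread", "butter", "milk"], "amount": 20.0, "cashier": "y"},
    {"customer": "c3", "items": ["milk"], "amount": None, "cashier": "x"},  # missing value
]
store_b = [
    {"customer": "c4", "items": ["bread", "butter"], "amount": 15.0, "cashier": "z"},
    {"customer": "c5", "items": ["bread", "milk"], "amount": 10.0, "cashier": "z"},
    {"customer": "c5", "items": ["bread", "milk"], "amount": 10.0, "cashier": "z"},  # duplicate
]

def integrate(sources):
    """a. Data Integration: combine the records of all sources into one list."""
    return [rec for src in sources for rec in src]

def select(records, fields=("customer", "items", "amount")):
    """b. Data Selection: keep only the fields assumed to be useful for mining."""
    return [{f: rec[f] for f in fields} for rec in records]

def clean(records):
    """c. Data Cleaning: drop records with missing values and exact duplicates."""
    seen, out = set(), []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue
        key = (rec["customer"], tuple(sorted(rec["items"])), rec["amount"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

def transform(records):
    """d. Data Transformation: min-max normalize the amount into [0, 1]."""
    if not records:
        return []
    amounts = [rec["amount"] for rec in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    return [{**rec, "amount": (rec["amount"] - lo) / span} for rec in records]

def mine(records):
    """e. Data Mining: support of each item pair (share of records containing it)."""
    counts = Counter()
    for rec in records:
        counts.update(combinations(sorted(set(rec["items"])), 2))
    total = len(records)
    return {pair: count / total for pair, count in counts.items()}

def evaluate(supports, min_support):
    """f. Pattern Evaluation: keep frequent pairs, highest support first."""
    kept = [(pair, s) for pair, s in supports.items() if s >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

def run_pipeline(sources, min_support=0.5):
    """Run steps a to f in order and return the surviving patterns."""
    data = transform(clean(select(integrate(sources))))
    return evaluate(mine(data), min_support)

if __name__ == "__main__":
    for pair, support in run_pipeline([store_a, store_b]):
        print(pair, support)
```

On this toy data the pair (bread, milk) has a support of 0.75 and the pair (bread, butter) a support of 0.5. Step g, the use of the discovered knowledge, is left to the reader: such a result might suggest, for example, placing or promoting these products together.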