Data Mining: Priyanka Nemalikanti

KDD is the process of discovering meaningful patterns and knowledge from large amounts of data. It involves cleaning, transforming, and modeling the data to extract useful insights. Data mining is a key part of KDD as it uses algorithms to identify patterns. Traditional data analysis methods struggle with modern data challenges like high dimensionality. Descriptive tasks analyze data characteristics while predictive tasks induce patterns to make predictions about future data. Both are important approaches in data mining.

Data Mining

Priyanka Nemalikanti

Knowledge Discovery in Databases (KDD)

Knowledge Discovery in Databases (KDD) is the process of finding, transforming, and refining raw data from a database into meaningful patterns and knowledge that can be used in various applications or domains. KDD is a lengthy, iterative process that involves many steps. Within KDD, data mining is the analytical, algorithm-driven step that models data retrieved from a database to extract valuable and applicable knowledge. Data mining is the backbone of KDD and is therefore crucial to the whole process.

KDD relies on algorithms, many of them self-learning, to deduce important patterns from processed data. The KDD process involves several steps: setting a goal and understanding the application; selecting and integrating data; cleaning and preprocessing the data; transforming the data; data mining; evaluating and interpreting patterns; and discovering and using the resulting knowledge (Tan et al., 2016).
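The steps listed above can be sketched as a chain of small functions. This is only an illustrative sketch: the sample records, the function names, and the 20-degree threshold are assumptions made for the example and do not come from Tan et al., and the "mining" step is deliberately trivial (a mean per city).

```python
from statistics import mean

# Hypothetical raw records: (city, temperature in Celsius as text, or None if missing).
RAW_RECORDS = [
    ("Oslo", "4.0"),
    ("Oslo", "6.0"),
    ("Cairo", "30.0"),
    ("Cairo", None),  # missing value, dropped during cleaning
    ("Cairo", "34.0"),
]


def select(records):
    """Selection and integration: keep the data relevant to the goal (here, all of it)."""
    return list(records)


def clean(records):
    """Cleaning and preprocessing: drop records with a missing measurement."""
    return [(city, value) for city, value in records if value is not None]


def transform(records):
    """Transformation: convert text to numbers and group the values by city."""
    grouped = {}
    for city, value in records:
        grouped.setdefault(city, []).append(float(value))
    return grouped


def mine(grouped):
    """Data mining: a deliberately simple 'pattern', the mean temperature per city."""
    return {city: mean(values) for city, values in grouped.items()}


def evaluate(patterns, threshold=20.0):
    """Pattern evaluation: keep only the patterns of interest (assumed: warm cities)."""
    return {city: avg for city, avg in patterns.items() if avg > threshold}


def run_kdd(records):
    """Run the steps in order; real KDD projects usually iterate over them."""
    return evaluate(mine(transform(clean(select(records)))))


knowledge = run_kdd(RAW_RECORDS)
print(knowledge)
```

In practice each step is revisited several times, which is why the process is described as iterative rather than as a single pass.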

Motivating challenges

Traditional data analysis methods have run into practical difficulties with the challenges posed by new kinds of data sets. These challenges include scalability, high dimensionality, heterogeneous and complex data, data ownership and distribution, and non-traditional analysis. High dimensionality is one of the specific challenges that motivated the development of data mining. It is now common to encounter data sets with many attributes, rather than the handful that was typical a few decades ago. Progress in microarray technology, for instance, has produced gene expression data involving more than a thousand features. Data sets that have a spatial or temporal component also tend to have high dimensionality. For example, consider a data set that contains temperature measurements taken at various locations. If the measurements are taken repeatedly over an extended period, the number of features increases in proportion to the number of measurements taken. Traditional data analysis methods developed for low-dimensional data often do not work well for high-dimensional data. It is also important to note that, for some data analysis algorithms, the computational complexity increases rapidly as the dimensionality increases (Tan et al., 2016).
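The temperature example can be made concrete with a little arithmetic. The sketch below assumes a hypothetical network of 100 stations reporting hourly, and uses the number of distinct entries in a covariance matrix, d(d + 1)/2 for d features, as one simple stand-in for work that grows faster than the number of features. Both the station count and the choice of covariance as the cost measure are assumptions made for illustration.

```python
def feature_count(n_locations: int, n_time_steps: int) -> int:
    """One feature per (location, time step) pair."""
    return n_locations * n_time_steps


def covariance_entries(n_features: int) -> int:
    """Distinct entries in a symmetric n x n covariance matrix."""
    return n_features * (n_features + 1) // 2


N_STATIONS = 100  # hypothetical number of measurement locations
READINGS_PER_DAY = 24  # hypothetical hourly readings

for days in (1, 30, 365):
    d = feature_count(N_STATIONS, READINGS_PER_DAY * days)
    print(f"{days:>3} days: {d:>7} features, {covariance_entries(d):>15} covariance entries")
```

The feature count grows linearly with the number of measurements, while the covariance cost grows quadratically, which is the kind of rapid growth the paragraph above refers to.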

Note how data mining integrates with statistics, AI, ML, and pattern recognition.

Data mining, statistics, AI, and machine learning are all data-driven disciplines that help a company make better decisions and so support the organisation's growth. They overlap heavily and are often treated as near-identical twins that differ mainly in terminology and notation. Data mining finds hidden patterns stored in large data warehouses by drawing on statistics, artificial intelligence, machine learning, and pattern recognition, together with the database techniques needed to mine large databases.

Difference between predictive and descriptive tasks and the importance of each.

Descriptive tasks describe the characteristics of the data in a target data set. In contrast, predictive tasks perform induction over past and current data to make predictions. Because descriptive tasks summarise data that has already been observed, their results tend to be more accurate and precise than those of predictive tasks, which carry the uncertainty of forecasting. Predictive analysis supports a proactive approach, anticipating situations and responding to them in advance, while descriptive analysis only responds to a situation after it has occurred. It is important to note that descriptive mining tasks typically employ unsupervised learning, while predictive tasks typically use supervised learning (Tan et al., 2016). Predictive tasks are important because they help forecast future results rather than only describing current behaviour, while descriptive tasks help identify regularities in the data and reveal patterns.
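The contrast can be illustrated on a single small data set. In the sketch below, the (advertising spend, sales) pairs and the function names are hypothetical. The descriptive step only summarises what was observed, while the predictive step fits a least-squares line to the observed pairs and uses it to estimate an unseen value. A straight-line fit is just one simple predictive technique chosen for brevity.

```python
from statistics import mean

# Hypothetical observations: (advertising spend, sales).
OBSERVED = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def describe(pairs):
    """Descriptive task: summarise characteristics of the data already collected."""
    spend = [x for x, _ in pairs]
    sales = [y for _, y in pairs]
    return {
        "mean_spend": mean(spend),
        "mean_sales": mean(sales),
        "max_sales": max(sales),
    }


def fit_line(pairs):
    """Predictive task, training: least-squares slope and intercept."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Predictive task, inference: estimate sales for a spend not yet observed."""
    slope, intercept = model
    return slope * x + intercept


summary = describe(OBSERVED)
model = fit_line(OBSERVED)
forecast = predict(model, 5.0)
print(summary)
print(forecast)
```

The summary is exact for the data at hand, whereas the forecast is only as good as the assumption that the observed trend continues.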

References

Tan, P.-N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson Education India.
