Presented to :
Dr. Rabie
By :
Amr Abd EL Latief Abd El Al
Data Mining Definition
 Definition:
 Data mining is the extraction of interesting patterns or
knowledge from huge amounts of data.
Also known by other names:
 knowledge discovery (mining) in databases (KDD)
 knowledge extraction
 data/pattern analysis
 data archaeology
 data dredging
 information harvesting
 business intelligence, and others. [1]
What is Data Mining
 Data Mining enables data exploration, data analysis,
and data visualization of huge databases at a high level
of abstraction, without a specific hypothesis in mind.
 Data mining works through modeling: a model is built
from the data and then used to make predictions.
Data Mining Technologies
 These include:
 artificial neural networks
 decision trees
 genetic algorithms
 machine learning
 evolutionary computing
 MOEA (multi-objective evolutionary computing)
Data Mining System Architecture
Data Mining Procedure
The Process of Data Mining
Classifications
 Data types: application
 Data types: data structure
 Functionality
Data Types Application S.V.
 Business transactions
 Scientific data
 Medical and personal data
 Surveillance video and pictures
 Satellite sensing
 Text reports and memos (e-mail messages)
 Most of the communications
 The World Wide Web repositories
types of data (Data Structure S.V.)
 Flat files
 Relational Databases
 Data Warehouses
 Transaction Databases
 Multimedia Databases
 Spatial Databases
 World Wide Web
FUNCTIONALITIES AND
CLASSIFICATIONS OF
DATA MINING
 Characterization
 Discrimination
 Association analysis
 Classification
 uses given class labels to order the objects in the data
 collection. Classification approaches normally use a
 training set where all objects are already associated with
 known class labels. The classification algorithm learns from
 the training set and builds a model. The model is then used to
 classify new objects.
 Prediction
Data Mining Systems
 specialized
 comprehensive
 Classification according to:
 the type of data source mined
 the data model drawn on
 the kind of knowledge discovered
 the mining techniques used
Classification according to the type
of data source mined
 This classification categorizes data mining systems
according to the type of data handled:
 spatial data
 multimedia data
 time-series data
 text data
 World Wide Web.
Classification according to the data
model drawn on
 This classification categorizes data mining systems
based on the data model involved:
 relational database
 object-oriented database
 data warehouse
 transactional database
 others
Classification according to the kind
of knowledge discovered
 This classification categorizes data mining systems
based on the kind of knowledge discovered or data
mining functionalities:
 Characterization
 discrimination
 Association
 classification
 clustering
 others
Classification according to mining
techniques used
 The classification categorizes data mining systems
according to the data analysis approach used:
 machine learning
 neural networks
 genetic algorithms
 statistics
 visualization
 database-oriented
 data warehouse-oriented
 others
Classification according to the degree of
user interaction involved in the
data mining process
 query-driven systems,
 interactive exploratory systems
 autonomous systems
Note:
 A comprehensive system would provide a wide variety
of data mining techniques to fit different situations
and options, and offer different degrees of user
interaction.
[2]
Papers
Data Mining Goals
 The two main goals of DM are:
 description
 prediction
 Standard tasks in the field of DM are: description,
clustering, association discovery, sequential pattern
analysis, classification, and regression.
 Description: can be obtained by characterization or by
discrimination.
 Characterization: a summarization of the general features
of a class.
 Discrimination: closely related to characterization. It
consists of characterizing a class by comparison with
another one.
Data Mining Goals
 Clustering: differs from classification in that it analyses
data objects without knowing their class.
 Association discovery: results in a set of association rules
which represent attribute-value conditions frequently
occurring in a given set of data.
 Sequential pattern analysis: consists of searching for
frequently occurring patterns related to time.
 Regression: uses existing values of some variables to
forecast the values of another, continuous variable.
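A minimal sketch of the regression task, assuming a single predictor variable and an ordinary least-squares line fit. The function name fit_line and the spend/sales figures are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: existing values of one variable (spend)
# and the continuous variable we want to forecast (sales).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)
forecast = slope * 5.0 + intercept  # forecast sales for an unseen spend of 5.0
print(slope, intercept, forecast)
```

The fitted line plays the role of the model: it is learned from known values and then evaluated at a value that was not in the data.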
Machine Learning
 An ML system uses a finite set of objects (examples)
that represent observations of the environment; the
learning algorithm learns a model from this set, which
is called the training set.
 Data sources for ML in DM include:
 databases
 data warehouses
 flat files
Classification in DM
 Classification:
is a form of data analysis that can be used to extract
models describing important classes or to predict future
trends.
 It represents:
a learning paradigm which consists of segmenting data by
assigning it to groups, or classes, that are already defined.
 The usual assumption is a small database size, but in data
mining the technique must be scalable.
Classification in DM
 Classes are represented by:
the values of a particular attribute, called the goal
attribute; the remaining attributes are called predicting
attributes.
 The resulting model is usually represented as:
a set of IF-THEN prediction rules, each of which
predicts a class from the predicting attributes.
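One possible way to picture a model made of IF-THEN prediction rules, as a sketch only. The attribute names (outlook, humidity, windy), the rules themselves, and the default class are hypothetical and chosen for illustration.

```python
# Each rule is (IF part, THEN part): conditions on predicting
# attributes, and the value predicted for the goal attribute.
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True}, "no"),
    ({"outlook": "overcast"}, "yes"),
]
DEFAULT_CLASS = "yes"  # assumed fallback when no rule fires


def predict(example, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose IF part matches the example."""
    for conditions, goal_value in rules:
        if all(example.get(attr) == value for attr, value in conditions.items()):
            return goal_value
    return default


print(predict({"outlook": "sunny", "humidity": "high", "windy": False}))
print(predict({"outlook": "rainy", "windy": False}))
```

Because each rule is a readable condition over attribute values, a person can inspect why a given class was predicted.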
ML in Classification
 Procedure:
 Algorithms are first applied to the training set,
which contains training examples with a known class, to
discover rules.
 The model is then used to classify a separate set of
examples, called the test set.
 The predictive accuracy of the model is evaluated on the
test set.
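The procedure above can be sketched as follows, assuming a deliberately simple learner that predicts the majority class for each value of one attribute. The learner, the attribute name, and both data sets are illustrative assumptions, not the method used in the slides.

```python
from collections import Counter, defaultdict


def train(training_set, attribute):
    """Learn one rule per attribute value: predict its majority class."""
    counts = defaultdict(Counter)
    for example, label in training_set:
        counts[example[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(model, test_set, attribute, default):
    """Fraction of test examples whose predicted class equals the known class."""
    correct = sum(
        1
        for example, label in test_set
        if model.get(example[attribute], default) == label
    )
    return correct / len(test_set)


# Hypothetical training examples with a known class.
training_set = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "yes"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rainy"}, "yes"),
    ({"outlook": "rainy"}, "yes"),
    ({"outlook": "rainy"}, "no"),
]
# Hypothetical test examples, kept separate from the training set.
test_set = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rainy"}, "no"),
    ({"outlook": "rainy"}, "yes"),
]

model = train(training_set, "outlook")
acc = accuracy(model, test_set, "outlook", default="yes")
print(model, acc)
```

Keeping the test set apart from the training set matters: accuracy measured on the examples the rules were learned from would overstate how well the model classifies new objects.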
Classification Methods
 Main classification methods are:
 decision tree induction
 scalability problem
 Bayesian classification
 neural network learning
 Drawbacks:
 time-consuming
 results are difficult for humans to interpret
ASSOCIATION ANALYSIS
 Association rules show relationships between attributes.
Their typical application domain is market basket and
transaction data analysis.
 Association Rules:
 An association rule is generally defined as an expression
 X => Y,
 where X and Y are sets of attribute-value terms.
ASSOCIATION ANALYSIS
 Rules do not have to be strictly correct in order
to be useful. It is generally enough to find
rules which are true to some degree only.
 X implies Y
 X tends to imply Y
 Support and confidence
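A short sketch of how support and confidence are commonly computed for a rule X => Y, assuming the standard definitions: support is the fraction of transactions containing an itemset, and confidence is support(X and Y together) divided by support(X). The shopping-basket transactions are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Estimated probability of Y given X: support(X u Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical market basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support({"bread", "milk"}, transactions)       # 2 of 4 baskets
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
print(s, c)
```

A confidence below 1 is exactly the "X tends to imply Y" case: the rule holds in most, but not all, transactions that contain X.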
Apriori Algorithm
 Depends on frequent occurrence
 Drawbacks:
 Large number of database scans
 Large size of generated intermediate sets
 Apriori mines only Boolean and single-dimensional
association rules.
 These rules are adapted to market basket analysis and can
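A compact sketch of level-wise frequent itemset mining in the Apriori style, assuming a minimum support threshold given as a fraction. The function name, the threshold value, and the transactions are illustrative assumptions; rule generation from the frequent itemsets is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Every level re-reads all transactions: this is the repeated
        # database scanning listed among the drawbacks.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    level = frequent({frozenset([item]) for t in transactions for item in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical market basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=0.5)
print(sorted((sorted(items), s) for items, s in result.items()))
```

The candidate sets built in the join step are the intermediate sets that can grow large on real data, which is the second drawback noted above.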
GA Advantages in Data Mining
 DM problems need: robustness of solutions and
scalability
 GA advantages:
 high ability to find patterns in very large spaces
 parallel implementation
 performs a kind of global search rather than local
hill-climbing
 the patterns produced are directly understandable
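A toy genetic algorithm, sketched to show the population-based search that the advantages above refer to. Everything here is an assumption for illustration: candidates are bit strings, the fitness function simply counts 1-bits as a stand-in for rule quality, and the parameter values are arbitrary.

```python
import random


def evolve(fitness, n_bits, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve bit strings toward higher fitness; returns (best, best-per-generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve(sum, n_bits=16)
print(best, history[0], history[-1])
```

Many candidates are explored at once and recombined, rather than a single candidate being improved step by step as in hill-climbing. Each fitness evaluation is independent of the others, which is what makes a parallel implementation natural.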
Search Challenges
 Scalability is an important research challenge too.
 MULTI-OBJECTIVE RULE EXTRACTION
 MOEA Issues
Apriori Example


Data mining concepts and work
