Motivation of Data Mining
Data explosion
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in
databases, data warehouses and other information repositories
From the Scientific Point of View,
* Data collected and stored at enormous speeds (GB/hour)
o remote sensors on a satellite
o telescopes scanning the skies
o microarrays generating gene expression data
o scientific simulations generating terabytes of data
* Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful)
patterns or knowledge from huge amounts of data.
* Strong patterns can be used to make non-trivial predictions on new data
* Data mining programs detect patterns and rules in the data
* Data mining is ready for application in the business & scientific community because it is supported by three
technologies that are now sufficiently mature:
o Massive data collection
o Powerful multiprocessor computers
o Data mining algorithms
Data mining is the discovery of knowledge by analyzing enormous sets of data: it extracts meaning from the data,
predicts future trends, and helps companies make sound decisions based on knowledge and
information. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze
data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast
and growing amounts of data in different formats and different databases. This includes:
* operational or transactional data, such as sales, cost, inventory, payroll, and accounting
* nonoperational data, such as industry sales, forecast data, and macroeconomic data
* metadata - data about the data itself, such as logical database design or data dictionary definitions
Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail
point of sale transaction data can yield information on which products are selling and when.
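As a rough illustration of turning raw data into information, the sketch below totals units sold per product and hour. The records, field layout, and function names are hypothetical and are not taken from any particular point-of-sale system.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (product, hour of day, units sold).
transactions = [
    ("milk", 9, 2),
    ("bread", 9, 1),
    ("milk", 17, 3),
    ("beer", 17, 6),
    ("bread", 17, 2),
]


def units_by_product_and_hour(records):
    """Sum the units sold for each (product, hour) pair."""
    totals = defaultdict(int)
    for product, hour, units in records:
        totals[(product, hour)] += units
    return dict(totals)


def peak_hour(records, product):
    """Return the hour in which the given product sold the most units."""
    totals = units_by_product_and_hour(records)
    by_hour = {hour: units for (name, hour), units in totals.items() if name == product}
    return max(by_hour, key=by_hour.get)


totals = units_by_product_and_hour(transactions)
print(totals)                            # which products are selling, and when
print(peak_hour(transactions, "milk"))   # busiest hour for one product
```

The raw records are data; the per-product, per-hour totals are the kind of information the paragraph above describes.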
Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary
information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of
consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to
promotional efforts.
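One simple way to estimate which items respond to promotion is to compare average sales in promoted and non-promoted weeks. The sketch below assumes hypothetical weekly summaries and a plain ratio of means; a real analysis would also need to account for seasonality, price changes, and other factors.

```python
# Hypothetical weekly summaries: (item, on_promotion, units_sold).
weekly_sales = [
    ("cola", False, 100), ("cola", True, 180),
    ("cola", False, 110), ("cola", True, 170),
    ("rice", False, 80), ("rice", True, 84),
    ("rice", False, 80), ("rice", True, 92),
]


def promotion_lift(records):
    """Ratio of mean promoted sales to mean non-promoted sales, per item."""
    sums = {}
    for item, promoted, units in records:
        # For each item keep [total units, number of weeks] for both cases.
        entry = sums.setdefault(item, {True: [0, 0], False: [0, 0]})
        entry[promoted][0] += units
        entry[promoted][1] += 1

    lift = {}
    for item, entry in sums.items():
        promo_total, promo_weeks = entry[True]
        base_total, base_weeks = entry[False]
        # Skip items that lack either kind of week or have no baseline sales.
        if promo_weeks and base_weeks and base_total:
            lift[item] = (promo_total / promo_weeks) / (base_total / base_weeks)
    return lift


lift = promotion_lift(weekly_sales)
most_susceptible = max(lift, key=lift.get)
print(lift, most_susceptible)
```

A ratio well above 1 suggests the item sells noticeably more when promoted, under the assumptions stated above.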
* Data Warehousing
* (Deductive) query processing
o SQL/ Reporting
* Software Agents
* Expert Systems
* Online Analytical Processing (OLAP)
* Statistical Analysis Tools
* Data visualization
Data warehouse
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling
organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of
centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the
concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository
of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological
advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software
are allowing users to access this data freely. The data analysis software is what supports data mining.
* Relational databases
* Data warehouses
* Transactional databases
* Advanced DB and information repositories
o Object-oriented and object-relational databases
o Spatial databases
o Time-series data and temporal data
o Text databases and multimedia databases
o Heterogeneous and legacy databases
• BANK AGENT:
• PERSONNEL MANAGER:
* Classification: infers the defining characteristics of a certain group (such as customers who have been lost to
competitors).
* Clustering: identifies groups of items that share a particular characteristic. (Clustering differs from classification in
that no predefining characteristic is given in clustering.)
* Association: identifies relationships between events that occur at one time (such as the contents of a shopping
basket).
* Sequencing: similar to association, except that the relationship exists over a period of time (such as repeat visits to a
supermarket or use of a financial planning product).
* Forecasting: estimates future values based on patterns within large sets of data (such as demand forecasting).
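To make the association task from the list above concrete, the sketch below computes support and confidence, two measures commonly used to score shopping-basket rules. The notes themselves do not name these measures, and the baskets and function names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of `consequent` in baskets that contain `antecedent`."""
    antecedent_support = support(baskets, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never applies in this data
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / antecedent_support


# Score the rule {diapers} -> {beer}.
rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Sequencing could be sketched in a similar way by attaching a customer and a timestamp to each basket and counting ordered pairs of purchases instead of co-occurrences.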
Conclusion
Data mining is an evolving technology that undergoes continuous modification and enhancement. Mining tasks and
techniques use algorithms that are often refined versions of older, tested algorithms. Although mining
technologies are still in their infancy, they are increasingly being used in business organizations to
increase business efficiency and efficacy.