Motivation of Data Mining

Data mining involves extracting useful patterns from large amounts of data. It has become possible due to increased data collection through automated tools, more powerful computers, and mature data mining algorithms. Data mining can help companies make better decisions by analyzing patterns in data related to areas like web usage, customer purchases, and financial transactions. It allows classification of data to identify useful groups and relationships that can provide insights and predictions. The process of data mining involves cleaning, integrating, selecting, transforming, mining, evaluating, and presenting data patterns and knowledge.

MOTIVATION OF DATA MINING:

“Necessity is the Mother of Invention”


“We are drowning in data, but starving for knowledge!”

Data explosion

Automated data collection tools and mature database technology lead to tremendous amounts of data stored in
databases, data warehouses and other information repositories.

From the Commercial Point of View,

* Lots of data is being collected and warehoused


o Web data, e-commerce
o Purchases at Department/Grocery Stores
o Bank/Credit Card Transactions
* Computers have become cheaper and more powerful
* Society and everyone: news, digital cameras, etc.

From the Scientific Point of View,

* Data collected and stored at enormous speeds (GB/hour)
o remote sensors on a satellite
o telescopes scanning the skies
o microarrays generating gene expression data
o scientific simulations generating terabytes of data

EVOLUTION OF DATABASE TECHNOLOGIES


WHAT IS DATA MINING?

* Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge
from huge amounts of data.
* Strong patterns can be used to make non-trivial predictions on new data
* Programs that detect patterns and rules in the data
* Data mining is ready for application in the business & scientific community because it is supported by three
technologies that are now sufficiently mature:
o Massive data collection
o Powerful multiprocessor computers
o Data mining algorithms

Data mining is the discovery of knowledge by analyzing enormous sets of data: it extracts the meaning of the data,
predicts future trends, and helps companies make sound decisions based on knowledge and
information. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze
data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast
and growing amounts of data in different formats and different databases. This includes:

* operational or transactional data, such as sales, cost, inventory, payroll, and accounting

* nonoperational data, such as industry sales, forecast data, and macroeconomic data

* metadata - data about the data itself, such as logical database design or data dictionary definitions

Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail
point of sale transaction data can yield information on which products are selling and when.
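As a minimal sketch of the retail example above, the snippet below turns raw point-of-sale records into information about which products are selling and when. The records and the names `sales`, `units_by_product`, and `units_by_hour` are hypothetical, invented for illustration only.

```python
from collections import Counter

# Hypothetical point-of-sale records: (product, hour of day of the sale).
sales = [
    ("bread", 9), ("milk", 9), ("bread", 10),
    ("milk", 18), ("bread", 18), ("beer", 18),
]

# Which products are selling: count sales per product.
units_by_product = Counter(product for product, _ in sales)

# When they are selling: count sales per hour of the day.
units_by_hour = Counter(hour for _, hour in sales)

# The most frequent product and the busiest hour in this toy data.
best_product, best_count = units_by_product.most_common(1)[0]
busiest_hour, busiest_count = units_by_hour.most_common(1)[0]

print(f"Top product: {best_product} ({best_count} sales)")
print(f"Busiest hour: {busiest_hour}:00 ({busiest_count} sales)")
```

The raw tuples are the data; the two counts are the information derived from them.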
Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary
information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of
consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to
promotional efforts.

What is NOT Data Mining?

* Searching a phone number in a phone book


* Searching a keyword on Google
* Generating histograms of salaries for different age groups
* Issuing SQL query to a database and reading the reply

Data Mining is NOT

* Data Warehousing
* (Deductive) query processing
o SQL/ Reporting
* Software Agents
* Expert Systems
* Online Analytical Processing (OLAP)
* Statistical Analysis Tool
* Data visualization

Data warehouse

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling
organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of
centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the
concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository
of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological
advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software
are allowing users to access this data freely. The data analysis software is what supports data mining.

Data Mining: On What Kind of Data?

* Relational databases
* Data warehouses
* Transactional databases
* Advanced DB and information repositories
o Object-oriented and object-relational databases
o Spatial databases
o Time-series data and temporal data
o Text databases and multimedia databases
o Heterogeneous and legacy databases

Data Mining: Confluence of Multiple Disciplines

Examples where it can be used

• BANK AGENT:

o Should I grant a mortgage to this customer?

• PERSONNEL MANAGER:

o What kind of employees do I have?

• TRADER in a RETAIL COMPANY:

o How many flat TVs do we expect to sell next month?

Steps involved in Data Mining:

* Data cleaning (to remove noise and inconsistent data);


* Data integration (where multiple data sources may be combined);
* Data selection (where data relevant to the analysis task are retrieved from the database);
* Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing
summary or aggregation operations, for instance);
* Data mining (an essential process where intelligent methods are applied in order to extract data patterns);
* Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness
measures); and
* Knowledge presentation (where visualization and knowledge representation techniques are used to present the
mined knowledge to the user).
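The seven steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two data sources, the field names, and the "above-average spender" rule standing in for a real mining algorithm are all hypothetical and do not come from the notes.

```python
from statistics import mean

# Two hypothetical sources with noisy, inconsistent records.
store_a = [{"customer": "ann", "amount": "20.0"},
           {"customer": "bob", "amount": None}]
store_b = [{"customer": "Ann ", "amount": "30.0"},
           {"customer": "cy", "amount": "5.0"}]

def clean(records):
    """Data cleaning: drop records with missing amounts, normalise names."""
    return [
        {"customer": r["customer"].strip().lower(), "amount": float(r["amount"])}
        for r in records
        if r["amount"] is not None
    ]

def integrate(*sources):
    """Data integration: combine several cleaned sources into one list."""
    return [r for source in sources for r in source]

def select(records, min_amount):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r["amount"] >= min_amount]

def transform(records):
    """Data transformation: aggregate the amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining (stand-in rule): customers spending above the average."""
    threshold = mean(totals.values())
    return {c: t for c, t in totals.items() if t > threshold}

def evaluate(patterns, min_total):
    """Pattern evaluation: keep patterns passing a simple interestingness cut."""
    return {c: t for c, t in patterns.items() if t >= min_total}

def present(patterns):
    """Knowledge presentation: render the result as readable lines."""
    return [f"{c}: total spend {t:.2f}" for c, t in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
totals = transform(select(records, min_amount=1.0))
report = present(evaluate(mine(totals), min_total=10.0))
print("\n".join(report))
```

Keeping each step as its own function mirrors the list: any stage can be swapped out (for example, a real clustering or classification algorithm in place of `mine`) without touching the others.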

Data mining has five main functions:

* Classification: infers the defining characteristics of a certain group (such as customers who have been lost to
competitors).

* Clustering: identifies groups of items that share a particular characteristic. (Clustering differs from classification in
that no predefined characteristic is given in clustering; the groups are discovered from the data itself.)

* Association: identifies relationships between events that occur at one time (such as the contents of a shopping
basket).

* Sequencing: similar to association, except that the relationship exists over a period of time (such as repeat visits to a
supermarket or use of a financial planning product).

* Forecasting: estimates future values based on patterns within large sets of data (such as demand forecasting).
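To make the association function more concrete, here is a small sketch that counts which items appear together in shopping baskets. It uses support and confidence, two standard association measures that the notes themselves do not name. The baskets and the helper names `support` and `confidence` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# How often each item, and each (alphabetically sorted) pair, occurs.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(pair):
    """Fraction of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

print(support(("bread", "milk")))      # share of baskets with both items
print(confidence("butter", "bread"))   # how often butter buyers also buy bread
```

Sequencing could be sketched the same way by counting ordered pairs of events per customer over time instead of unordered pairs within one basket.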

Conclusion

Data mining is an evolving technology going through continuous modifications and enhancements. Mining tasks and
techniques use algorithms that are often refined versions of tested older algorithms. Although mining
technologies are still in their infancy, they are increasingly being used in business organizations to
increase business efficiency and efficacy.
