solved DM questions
Data mining refers to the process of extracting or mining interesting knowledge or patterns from
large amounts of data.
(a) No, data mining is not just another hype. "We are living in the information age" is a popular
saying; however, we are actually living in the data age. Terabytes or petabytes of data pour
into our computer networks, the World Wide Web (WWW), and various data storage
devices every day from business, society, science and engineering, medicine, and almost
every other aspect of daily life. Powerful and versatile tools are badly needed to
automatically uncover valuable information from the tremendous amounts of data and to
transform such data into organized knowledge. This necessity has led to the birth of data
mining.
(b) No. Data mining is not a simple transformation of technology developed from databases,
statistics, and machine learning. Instead, it involves an integration, rather than a simple
transformation, of techniques from multiple disciplines such as database technology,
statistics, machine learning, high-performance computing, pattern recognition, neural
networks, data visualization, information retrieval, and so on.
(d) Steps involved in data mining when viewed as a knowledge discovery process:
Data Cleaning - a process that removes or transforms noise and inconsistent data.
Data Integration - where data from heterogeneous sources is combined for mining
purposes.
Data Selection - where data relevant to the analysis task are retrieved from the database.
Data Transformation - where data is transformed or consolidated into forms suitable for
mining.
Data Mining - an essential process in which intelligent and efficient methods are applied
to extract patterns.
Pattern Evaluation - a process that identifies the truly interesting patterns representing
knowledge, based on some interestingness measures.
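The steps above can be sketched as a toy, dependency-free Python pipeline. Everything in it is an illustrative assumption, not a prescribed method: the two in-memory "sources" and their field layouts are made up, frequent item-pair counting stands in for the mining method, and minimum support stands in for the interestingness measure. Integration runs before cleaning here so that a single cleaning function can handle both sources.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source A: point-of-sale baskets as (customer_id, item list).
SOURCE_A = [
    ("c1", ["Milk", "Bread"]),
    ("c2", ["milk ", "bread", "eggs"]),
    (None, ["milk"]),  # noise: missing customer id
]
# Hypothetical source B: web orders keyed by customer id, items comma-separated.
SOURCE_B = {"c3": "bread,milk", "c4": "eggs,bread", "c5": ""}


def integrate(a, b):
    """Data integration: convert source B to A's (id, items) layout and combine."""
    return a + [(cid, text.split(",")) for cid, text in b.items()]


def clean(rows):
    """Data cleaning: drop records with no id; trim, lowercase and dedupe items."""
    return [
        (cid, sorted({i.strip().lower() for i in items if i.strip()}))
        for cid, items in rows
        if cid is not None
    ]


def select(rows):
    """Data selection: keep only what the task needs (non-empty baskets, no ids)."""
    return [items for _, items in rows if items]


def transform(baskets):
    """Data transformation: represent each basket by its item pairs."""
    return [list(combinations(b, 2)) for b in baskets]


def mine(pair_lists):
    """Data mining: count how many baskets contain each item pair."""
    return Counter(p for pairs in pair_lists for p in pairs)


def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    support = {p: c / n_baskets for p, c in counts.items()}
    return {p: s for p, s in support.items() if s >= min_support}


if __name__ == "__main__":
    baskets = select(clean(integrate(SOURCE_A, SOURCE_B)))
    print(evaluate(mine(transform(baskets)), len(baskets)))
```

Running it prints `{('bread', 'milk'): 0.75, ('bread', 'eggs'): 0.5}`: four usable baskets survive cleaning and selection, and the pair (eggs, milk), seen in only one of them, is filtered out during pattern evaluation.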
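Database vs. Data Warehouse (rows 4 and 8 of the comparison):
4. Database: data is balanced within the scope of this one system.
   Data Warehouse: data must be integrated and balanced from multiple systems.
8. Database: ER based.
   Data Warehouse: Star/Snowflake schema.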
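Row 8 contrasts ER-based design with the star/snowflake schema used in warehouses. The following is a minimal sketch of a star schema using Python's built-in `sqlite3` module: a central fact table of sales whose foreign keys point to two dimension tables, followed by a typical join-and-aggregate query. All table names, column names and values are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to slice the facts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table at the centre of the star: a measure plus foreign keys.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'dairy'), (2, 'bakery');
    INSERT INTO dim_date    VALUES (1, 2022), (2, 2023);
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (2, 1, 4.0), (1, 2, 12.0), (2, 2, 6.0);
""")

# Typical warehouse query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

for row in rows:
    print(row)
```

In a snowflake schema the dimension tables are normalized further, for example by moving `category` out of `dim_product` into its own table referenced by a foreign key.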
What is a Database?
A database is a collection of related data that represents some aspects of the
real world. It is designed to be built and populated with data for a specific task. It
is also a building block of your data solution.
What is a Data Warehouse?
A data warehouse is an information system that stores historical and
cumulative data from single or multiple sources. It is designed to analyze,
report on, and integrate transaction data from different sources.
2 - Cross Dataset: The same as cross-validation, but using different datasets.
3 - Tuning your model: This basically means changing the parameters you use to train your classification model (I don't know
which classification algorithm you're using, so it's hard to help more).
4 - Improve, or use (if you're not using it yet) the normalization process: Find out which techniques will give you more
concise data to use in training.
5 - Understand the problem you're tackling better... Try to implement other methods to solve the same problem.
There is always more than one way to solve a problem, and you may not be using the best approach.
6 - More Data: More variety and more volume of training data usually give better results.
7 - Ensemble Methods: (Probably the easiest and most interesting) Combining multiple weak
models into a strong model with better predictions, since the models compensate for each other's errors.
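Items 3, 4 and 7 can be illustrated with a small, dependency-free Python sketch. It rests on assumptions the answer above does not make: a k-nearest-neighbours classifier (no algorithm is named), z-score standardization as the normalization technique, a held-out validation split for choosing k, and a plain majority vote over k-NN models with different k as a minimal stand-in for ensemble methods such as bagging or boosting. The toy data are made up; feature 2 is deliberately on a much larger scale than feature 1.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev

# Hypothetical toy data: feature 2 is on a far larger scale than feature 1.
X_TRAIN = [[1.0, 100.0], [1.2, 900.0], [3.0, 150.0], [3.2, 950.0]]
Y_TRAIN = ["a", "a", "b", "b"]
X_VAL = [[1.1, 500.0], [3.1, 500.0]]
Y_VAL = ["a", "b"]


def zscore_fit(X):
    """Item 4 (normalization): per-feature mean and std, fitted on training data only."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]  # avoid division by 0


def zscore_apply(X, mu, sd):
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]


def knn_predict(X, y, x, k):
    """k-NN: majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def accuracy(X, y, X_eval, y_eval, k):
    hits = sum(knn_predict(X, y, x, k) == t for x, t in zip(X_eval, y_eval))
    return hits / len(y_eval)


def tune_k(X, y, X_val, y_val, candidates=(1, 2, 3)):
    """Item 3 (tuning): keep the k with the best validation accuracy (first one on ties)."""
    return max(candidates, key=lambda k: accuracy(X, y, X_val, y_val, k))


def vote(X, y, x, ks=(1, 2, 3)):
    """Item 7 (ensemble): majority vote over several k-NN models; an odd count limits ties."""
    return Counter(knn_predict(X, y, x, k) for k in ks).most_common(1)[0][0]


if __name__ == "__main__":
    mu, sd = zscore_fit(X_TRAIN)
    Xn, Vn = zscore_apply(X_TRAIN, mu, sd), zscore_apply(X_VAL, mu, sd)
    print("raw k=1 accuracy:", accuracy(X_TRAIN, Y_TRAIN, X_VAL, Y_VAL, 1))
    print("normalized k=1 accuracy:", accuracy(Xn, Y_TRAIN, Vn, Y_VAL, 1))
    print("tuned k:", tune_k(Xn, Y_TRAIN, Vn, Y_VAL))
    print("ensemble:", [vote(Xn, Y_TRAIN, v) for v in Vn])
```

On this toy data the unnormalized k=1 model scores 0.5 on the validation points while the standardized one scores 1.0, because without normalization the large-scale feature dominates the Euclidean distance. The normalization parameters are fitted on the training split only and then reused for the validation split, so no information leaks from the evaluation data.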