Data Warehouse and Data Mining (2 marks questions and answers)
What is Fact Table?
A fact table is the central table in a data warehouse that stores quantitative data (measures)
such as sales amount, profit, quantity, etc., and links to dimension tables through foreign
keys.
What is Dimension Table?
A dimension table contains descriptive attributes (dimensions) related to facts, such as
customer name, product details, time, or location. They provide context for facts.
What are Data Marts?
A data mart is a subset of a data warehouse designed for a specific business area (e.g., sales,
finance, HR).
What is Cube?
A cube (OLAP cube) is a multidimensional data structure that allows data to be modeled and
viewed across multiple dimensions for fast analysis.
What is Regression?
Regression is a statistical method in data mining used to predict a continuous value (e.g.,
predicting house prices based on size, location, etc.).
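A small illustrative sketch of simple linear regression (ordinary least squares) fitting price against size; the data points and units are made up:

```python
# Fit price = a*size + b by ordinary least squares (toy data, made up).
sizes = [50, 80, 100, 120, 150]        # house size in sq. metres
prices = [150, 220, 280, 330, 410]     # price in thousands

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
# slope a = cov(x, y) / var(x); intercept b = mean_y - a*mean_x
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
den = sum((x - mean_x) ** 2 for x in sizes)
a = num / den
b = mean_y - a * mean_x

predicted = a * 110 + b   # predict price for a 110 sq. metre house
print(round(a, 3), round(b, 3), round(predicted, 1))   # 2.621 15.931 304.2
```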
What is SVM?
SVM (Support Vector Machine) is a supervised machine learning algorithm used for
classification and regression by finding the best separating hyperplane.
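In one dimension the separating hyperplane is just a threshold, so a hard-margin SVM reduces to placing that threshold midway between the closest points of the two classes (those closest points are the support vectors). A toy sketch with made-up data:

```python
# 1-D hard-margin SVM sketch: the max-margin "hyperplane" is the
# midpoint between the closest opposite-class points (toy data).
neg = [1.0, 2.0, 2.5]   # class -1
pos = [4.0, 5.5, 6.0]   # class +1

sv_neg = max(neg)       # support vector of the negative class
sv_pos = min(pos)       # support vector of the positive class
threshold = (sv_neg + sv_pos) / 2   # maximum-margin separating point
margin = (sv_pos - sv_neg) / 2

print(threshold, margin)   # 3.25 0.75
```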
What is Schema?
A schema defines the logical structure of a database, including tables, relationships, and
constraints (e.g., star schema, snowflake schema in data warehousing).
What is Classification?
Classification is a supervised learning technique in which data is categorized into
predefined classes (e.g., spam or not spam).
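A minimal classification sketch using a 1-nearest-neighbour rule; the feature vectors and labels are invented for illustration:

```python
# 1-nearest-neighbour classification: assign the label of the closest
# training example (toy data, made up).
train = [((1.0, 1.0), "not spam"), ((1.2, 0.8), "not spam"),
         ((4.0, 4.5), "spam"), ((4.2, 4.0), "spam")]

def classify(point):
    """Return the label of the nearest training example."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    return min(train, key=lambda ex: dist(ex[0], point))[1]

print(classify((4.1, 4.2)))   # spam
print(classify((0.9, 1.1)))   # not spam
```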
What is Outlier Analysis in Data Mining?
Outlier analysis is the process of identifying data points that deviate significantly from the
majority, which may indicate fraud, errors, or novel events.
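A simple sketch of one common rule: flag values whose z-score exceeds a threshold (the threshold of 2 and the data are illustrative):

```python
# Z-score outlier detection: flag values more than 2 population
# standard deviations from the mean (toy data).
import statistics

values = [10, 12, 11, 13, 12, 11, 95]   # 95 looks anomalous
mean = statistics.mean(values)
std = statistics.pstdev(values)

outliers = [v for v in values if abs(v - mean) / std > 2]
print(outliers)   # [95]
```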
Define Data Warehouse.
A data warehouse is a centralized repository that stores integrated, historical, and subject-
oriented data from multiple sources for decision-making and analysis.
What is Meant by Clustering?
Clustering is an unsupervised learning technique that groups data into clusters so that
objects in the same cluster are more similar to each other than to objects in other clusters.
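A minimal k-means sketch on 1-D data with k = 2; the points and starting centres are made up:

```python
# Tiny k-means (k = 2) on 1-D data: alternate between assigning points
# to the nearest centre and recomputing centres (toy data).
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centres = [1.0, 9.0]                      # initial cluster centres

for _ in range(10):                       # a few refinement passes
    clusters = [[], []]
    for p in points:                      # assign to nearest centre
        clusters[min((0, 1), key=lambda i: abs(p - centres[i]))].append(p)
    centres = [sum(c) / len(c) for c in clusters]   # recompute centres

print(centres)   # [1.5, 8.5]
```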
What is Roll-Up Operation?
Roll-up is an OLAP operation that summarizes data by climbing up a hierarchy (e.g., from
days → months → years).
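A sketch of a roll-up as a plain aggregation: daily sales summarized up to the month level (the rows are made up):

```python
# Roll-up sketch: climb the time hierarchy from day to month by
# aggregating daily sales per month (toy data).
from collections import defaultdict

daily = [("2024-01-05", 100), ("2024-01-20", 150),
         ("2024-02-03", 200), ("2024-02-28", 50)]

monthly = defaultdict(int)
for date, amount in daily:
    monthly[date[:7]] += amount          # "YYYY-MM-DD" -> "YYYY-MM"

print(dict(monthly))   # {'2024-01': 250, '2024-02': 250}
```

Drill-down is simply the inverse: moving from the monthly totals back to the daily rows.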
What is Drill-Down Operation?
Drill-down is an OLAP operation that provides more detailed data by moving from higher-
level summary to lower-level detail (e.g., from year → month → day).
Define Closed Frequent Item Set.
A closed frequent item set is a frequent item set that has no superset with the same support
count.
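A brute-force sketch over a toy transaction list: enumerate the frequent itemsets, then keep only those with no superset of equal support (the transactions and minimum support are made up):

```python
# Closed frequent itemsets by brute force (fine for toy data).
from itertools import combinations

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"milk"}]
items = sorted(set().union(*transactions))
min_support = 2

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

frequent = [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support(set(c)) >= min_support]

# Closed = no proper frequent superset with the same support.
closed = [s for s in frequent
          if not any(s < t and support(t) == support(s) for t in frequent)]
print(sorted(sorted(s) for s in closed))   # [['bread', 'milk'], ['milk']]
```

Here {bread} is frequent but not closed, because its superset {bread, milk} has the same support count (2).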
What is Data Reduction?
Data reduction reduces the volume of data while maintaining its integrity for analysis (e.g.,
dimensionality reduction, aggregation, sampling).
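One of the simplest reduction techniques is sampling; a sketch that keeps a 20% random sample of the rows (the dataset and sample fraction are illustrative):

```python
# Data reduction by simple random sampling: analyse a 20% sample
# instead of the full dataset (toy data).
import random
random.seed(1)                           # reproducible sample

rows = list(range(1000))                 # stands in for a large dataset
sample = random.sample(rows, k=len(rows) // 5)

print(len(sample))   # 200
```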
What are Bagging and Boosting?
Bagging (Bootstrap Aggregating): Combines predictions of multiple models trained on
random subsets of data to reduce variance.
Boosting: Sequentially trains models, giving more weight to misclassified instances, to
reduce bias and improve accuracy.
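A sketch of bagging with a deliberately simple base model: each bootstrap sample trains a 1-D "stump" (a threshold at the midpoint of the two class means), and predictions are combined by majority vote. The data and the base model are illustrative, not a standard library implementation:

```python
# Bagging sketch: train one threshold "stump" per bootstrap sample,
# then predict by majority vote (toy 1-D data).
import random
random.seed(0)

data = [(1.0, 0), (1.5, 0), (2.0, 0), (8.0, 1), (8.5, 1), (9.0, 1)]

def train_stump(sample):
    """Threshold at the midpoint of the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:            # degenerate sample: fixed threshold
        return 5.0
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

# Bootstrap: sample with replacement, same size as the original data.
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def predict(x):                          # majority vote over all stumps
    votes = sum(x > t for t in stumps)
    return 1 if votes > len(stumps) / 2 else 0

print(predict(1.2), predict(8.7))   # 0 1
```

Boosting differs in that the models are trained one after another, each reweighting the examples the previous ones got wrong, rather than independently on random resamples.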
What is Web Data Mining?
Web data mining is the process of extracting knowledge from web content, web structure,
and web usage.
What is Text Data Mining?
Text data mining extracts useful patterns and knowledge from unstructured text
documents.
What is Multimedia Data Mining?
Multimedia data mining extracts patterns and knowledge from multimedia data such as
images, audio, video, and graphics.
What is Supervised Learning Technique?
A supervised learning technique trains models using labeled data (input-output pairs) to
predict outcomes for unseen data (e.g., classification, regression).
List out Data Mining Functionalities.
Classification, Prediction, Clustering, Association rule mining, Outlier detection,
Summarization.
What is the Purpose of Clustering in Data Mining?
The purpose is to discover natural groupings in data, understand structure, and identify
patterns without predefined labels.
What is Cluster Analysis?
Cluster analysis is the process of grouping a set of objects into clusters based on similarity
or distance metrics.
What are Association and Correlation?
Association: Identifies relationships between items in large datasets (e.g., market basket
analysis: milk → bread).
Correlation: Measures the statistical relationship between two variables.
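A sketch of the two basic association-rule metrics, support and confidence, for the rule milk → bread on a toy set of market baskets (the baskets are made up):

```python
# Support and confidence of the rule milk -> bread (toy baskets).
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"milk"}, {"bread"}, {"milk", "bread"}]

milk_count = sum("milk" in b for b in baskets)
both_count = sum({"milk", "bread"} <= b for b in baskets)

support = both_count / len(baskets)   # fraction of all baskets with both
confidence = both_count / milk_count  # P(bread | milk)

print(support, confidence)   # 0.6 0.75
```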
What is Data Mining?
Data mining is the process of discovering patterns, correlations, and knowledge from large
datasets using statistical, machine learning, and database techniques.
What is ETL?
ETL stands for Extract, Transform, Load – the process of extracting data from sources,
transforming it into a suitable format, and loading it into a data warehouse.
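A minimal in-memory ETL sketch: extract rows from a source list, transform them (trim names, convert currency strings to numbers), and load them into a target table. All names and values here are invented for illustration:

```python
# Toy ETL pipeline: extract -> transform -> load (made-up data).
source = [{"name": " alice ", "sales": "1,200"},
          {"name": "bob",     "sales": "950"}]

warehouse = []                       # stands in for the target store
for row in source:                   # extract
    transformed = {                  # transform: clean and convert
        "name": row["name"].strip().title(),
        "sales": int(row["sales"].replace(",", "")),
    }
    warehouse.append(transformed)    # load

print(warehouse)
```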
What is Data Cleaning/Integration/Transformation/Discretization?
Data Cleaning: Removing noise, errors, and inconsistencies.
Data Integration: Combining data from multiple sources.
Data Transformation: Converting data into suitable formats (e.g., normalization).
Data Discretization: Converting continuous data into categorical intervals.
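A sketch of the last step, discretization, as equal-width-style binning of a continuous attribute into labelled intervals (the ages and bin edges are made up):

```python
# Discretization sketch: map continuous ages to categorical intervals.
ages = [3, 17, 25, 40, 62, 78]
bins = [(0, 18, "young"), (18, 60, "adult"), (60, 120, "senior")]

def discretize(age):
    """Return the label of the interval containing the age."""
    for low, high, label in bins:
        if low <= age < high:
            return label

print([discretize(a) for a in ages])
# ['young', 'young', 'adult', 'adult', 'senior', 'senior']
```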
List out Data Warehouse Components.
Data sources, ETL tools, Staging area, Data warehouse storage, Metadata, OLAP engine,
Front-end tools (reporting, dashboards).
What is Data Preprocessing?
Data preprocessing is the initial step in data mining that prepares raw data by cleaning,
transforming, reducing, and integrating it to improve quality for analysis.
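One common preprocessing step is min-max normalization, which rescales a numeric attribute to [0, 1]; a sketch with made-up values:

```python
# Min-max normalization: rescale values to the [0, 1] range (toy data).
values = [10, 20, 15, 40]
lo, hi = min(values), max(values)

normalized = [round((v - lo) / (hi - lo), 2) for v in values]
print(normalized)   # [0.0, 0.33, 0.17, 1.0]
```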