DW 2marks 1&2

A data warehouse is a centralized system for storing and managing large amounts of historical data from various sources, aiding in analysis and decision-making. Key characteristics include being subject-oriented and non-volatile, with components like ETL tools and metadata. It supports OLAP for complex data analysis and decision-making, differentiating from OLTP systems that handle daily operations.

Uploaded by

swathitech86

UNIT-1

1. Define a Data Warehouse.

A data warehouse is a central storage system where large amounts of data from different
sources are collected, stored, and managed for analysis and decision-making. It stores
historical data and helps organizations study trends and patterns over time.
Example: An e-commerce company storing customer orders, payments, and delivery
details from all regions in one system for yearly sales analysis.

2. List two key characteristics of a data warehouse.

1. Subject-Oriented: Data is organized around key subjects such as sales, marketing, or
finance rather than around individual daily transactions.
2. Non-Volatile: Once loaded, data is not normally updated or deleted; new data is added
in periodic loads, so earlier records stay stable for analysis.
Example: A supermarket’s data warehouse might focus only on sales and stock
data, and once sales data is stored, it is not altered.

3. Differentiate between OLTP and OLAP.

• OLTP (Online Transaction Processing): Manages day-to-day operations, stores
current data, and is designed for fast insert, update, and delete operations.
• OLAP (Online Analytical Processing): Designed for analyzing large volumes of
historical data with complex queries, in support of decision-making.
Example: OLTP – Booking a train ticket. OLAP – Analyzing last year’s passenger
booking trends.
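The contrast above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration: the bookings table, its columns, and the values are assumptions, and in practice OLTP and OLAP workloads would usually run on separate systems rather than one database.

```python
import sqlite3

# Hypothetical ticket-booking table; all names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, route TEXT, year INTEGER, fare REAL)"
)

# OLTP-style work: short transactions that each write a single row.
for route, year, fare in [("A-B", 2024, 120.0), ("A-B", 2024, 80.0), ("A-C", 2024, 200.0)]:
    with conn:  # commits after each booking
        conn.execute(
            "INSERT INTO bookings (route, year, fare) VALUES (?, ?, ?)",
            (route, year, fare),
        )

# OLAP-style work: one read-only query that aggregates over many rows.
trend = conn.execute(
    "SELECT route, COUNT(*), SUM(fare) FROM bookings "
    "WHERE year = 2024 GROUP BY route ORDER BY route"
).fetchall()
print(trend)
```

The inserts touch one row at a time, while the final query reads the whole year and summarizes it, which is the pattern OLAP systems are tuned for.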

4. State the purpose of a data warehouse.

The purpose of a data warehouse is to provide a single, reliable source of historical data
for analysis and reporting, helping organizations make strategic decisions. It improves
data quality, speeds up queries, and supports business intelligence tools.
Example: A hospital uses its data warehouse to find which treatments were most
successful in the past five years.

5. Name any two components of a data warehouse.

• ETL Tools: Extract, transform, and load data into the warehouse.
• Metadata: Information about the stored data, such as its source, format, and
meaning.
Example: ETL software like Informatica; metadata showing “Customer_Age –
Integer format.”

6. Give the role of ETL in data warehousing.

ETL gathers data from different sources, cleans and formats it, and stores it in the data
warehouse for easy analysis.
Example: Sales data from different branches is extracted, corrected for errors, and loaded
into one warehouse database.

7. What is the function of metadata in a data warehouse?

Metadata describes the contents, structure, and meaning of data stored in the warehouse.
It acts like a “data dictionary” that tells where the data came from, what it means, and
how it is formatted.
Example: Metadata can explain that “Sale_Date” means the date when a transaction
occurred, stored in the DD-MM-YYYY format.
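As a rough illustration of the "data dictionary" idea, the sketch below keeps one metadata record per column. The ColumnMetadata class, the describe helper, and the source system names are hypothetical; only the two column names come from the examples in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    name: str        # column name in the warehouse
    source: str      # where the data came from (assumed system name)
    data_type: str   # how the value is formatted
    meaning: str     # what the value means to the business


# A tiny data dictionary keyed by column name.
data_dictionary = {
    "Sale_Date": ColumnMetadata(
        "Sale_Date", "billing_system", "DATE (DD-MM-YYYY)", "Date the transaction occurred"
    ),
    "Customer_Age": ColumnMetadata(
        "Customer_Age", "crm_system", "INTEGER", "Age of the customer in years"
    ),
}


def describe(column: str) -> str:
    """Return a one-line description of a column from the data dictionary."""
    meta = data_dictionary[column]
    return f"{meta.name}: {meta.meaning} [{meta.data_type}, from {meta.source}]"


print(describe("Sale_Date"))
```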

8. State the difference between a data mart and a data warehouse.

• Data Mart: Smaller storage system that focuses on one department’s needs, such as
sales or marketing.
• Data Warehouse: Larger system that stores organization-wide data for all
departments.
Example: Sales Data Mart only stores sales data; Data Warehouse stores sales,
HR, finance, and other data together.

9. List two differences between operational databases and data warehouses.

1. Purpose: Operational DB supports daily operations; Data Warehouse supports
decision-making.
2. Data Type: Operational DB has current, real-time data; Data Warehouse stores
historical, integrated data.
Example: Bank transaction system (Operational DB) vs. Bank’s 5-year loan trend
analysis (Data Warehouse).

10. What is meant by non-volatile in the context of a data warehouse?

Non-volatile means that once data is loaded into the data warehouse, it is not normally
updated or deleted; new data is added alongside it. This keeps the data consistent for
long-term analysis.
Example: A 2018 annual sales report in the warehouse will not change in 2025.

11. Mention one example each of operational database and data warehouse.

• Operational DB Example: Hospital’s patient registration system.
• Data Warehouse Example: Hospital’s 10-year patient treatment analysis system.

12. Compare data update frequency in operational DB vs data warehouse.

• Operational DB: Updated instantly with every transaction in real-time.
• Data Warehouse: Updated in batches periodically (daily, weekly, or monthly).
Example: An online store’s order database is updated the moment a customer buys
an item, but the warehouse is updated only once a night.
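The difference in update frequency can be sketched as follows. The NightlyLoader class and its method names are hypothetical, and plain Python lists stand in for the two databases.

```python
class NightlyLoader:
    """Toy model: orders hit the operational store at once, the warehouse in batches."""

    def __init__(self):
        self.operational_orders = []  # updated on every purchase
        self.warehouse_orders = []    # updated only when the batch job runs
        self._loaded_count = 0        # how many operational rows are already loaded

    def place_order(self, item, amount):
        self.operational_orders.append({"item": item, "amount": amount})

    def run_batch_load(self):
        """Copy rows added since the last run into the warehouse; return how many."""
        new_rows = self.operational_orders[self._loaded_count:]
        self.warehouse_orders.extend(new_rows)
        self._loaded_count = len(self.operational_orders)
        return len(new_rows)


store = NightlyLoader()
store.place_order("pen", 10)
store.place_order("book", 40)
pending_before = len(store.warehouse_orders)  # warehouse has not seen the orders yet
loaded = store.run_batch_load()               # the "once a night" step
print(pending_before, loaded)
```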

13. Name the three layers of data warehouse architecture.

1. Bottom Tier: Database server where data is stored.
2. Middle Tier: OLAP server that processes and manages data queries.
3. Top Tier: Front-end tools for reporting and analysis.
Example: Bottom – Oracle Database, Middle – OLAP server, Top – Power BI
Reports.

14. Role of the staging area in data warehousing

The staging area is a temporary place where data is kept before going into the main data
warehouse. Here, data from many sources is collected, cleaned, and changed into the
correct format. This makes sure only good quality data goes into the warehouse.
Example: Before adding online and shop sales data into the warehouse, errors like wrong
prices are fixed in the staging area.
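A minimal sketch of a staging step is shown below. The stage function and the row fields (channel, price) are assumptions, and this version sets invalid rows aside for review instead of correcting them, which is only one possible policy.

```python
def stage(rows):
    """Split incoming rows into clean rows (ready to load) and rejected rows."""
    clean, rejected = [], []
    for row in rows:
        price = row.get("price")
        if isinstance(price, (int, float)) and price > 0:
            # Standardize the format before the row goes to the warehouse.
            clean.append({"channel": row["channel"].strip().lower(), "price": float(price)})
        else:
            rejected.append(row)  # wrong or missing price: keep out of the warehouse
    return clean, rejected


incoming = [
    {"channel": " Online ", "price": 250},
    {"channel": "Shop", "price": -5},      # wrong price
    {"channel": "shop", "price": 99.5},
]
clean_rows, rejected_rows = stage(incoming)
print(clean_rows)
print(rejected_rows)
```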

15. Difference between centralized and distributed data warehouse architecture

• Centralized: All the organization’s data is stored in one single warehouse. It is
easier to manage but can become a bottleneck as data volume and user load grow.
• Distributed: Data is stored in several connected warehouses at different locations.
It can be faster for local access but is harder to manage and keep consistent.
Example: One big head office warehouse vs. small warehouses in each branch.

16. Define data warehouse architecture

Data warehouse architecture means the design or structure of a data warehouse that
shows how data will be collected, stored, processed, and shown to users. It includes
layers like storage, processing, and reporting tools.
Example: A system where the bottom layer stores data, the middle layer processes it, and
the top layer shows reports to managers.
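The layered flow in this example can be sketched with one function per layer. The function names and the sample rows are hypothetical; a real system would use a database server, an OLAP server, and a reporting tool in place of these functions.

```python
# Bottom layer: stored data (a list stands in for the warehouse database).
STORED_SALES = [("2024-01-05", 100), ("2024-01-20", 50), ("2024-02-03", 70)]


def bottom_tier_fetch():
    """Return the stored (date, amount) rows."""
    return list(STORED_SALES)


def middle_tier_monthly_totals(rows):
    """Process the rows: summarize daily amounts into monthly totals."""
    totals = {}
    for date, amount in rows:
        month = date[:7]  # "YYYY-MM"
        totals[month] = totals.get(month, 0) + amount
    return totals


def top_tier_report(totals):
    """Present the processed data as a simple text report."""
    return "\n".join(f"{month}: {total}" for month, total in sorted(totals.items()))


report = top_tier_report(middle_tier_monthly_totals(bottom_tier_fetch()))
print(report)
```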

17. Three tiers in a three-tier data warehouse architecture

1. Bottom Tier: Storage area for all collected data.
2. Middle Tier: Processes data using OLAP (Online Analytical Processing) tools.
3. Top Tier: Displays results to users using dashboards, graphs, and reports.

18. Function of the middle tier in three-tier architecture

The middle tier works as the brain of the warehouse. It takes stored data, processes it, and
makes it ready for analysis. It summarizes and organizes the data so that it can be easily
understood.
Example: Taking daily sales data and creating a monthly sales summary.

19. Tier that handles data presentation in three-tier architecture

The top tier is responsible for showing data to users. It uses reporting tools, graphs,
charts, and dashboards to make data easy to read and understand.
Example: A sales dashboard showing which products sold the most last month.

20. Difference between bottom and top tier of data warehouse architecture

• Bottom Tier: Stores all the raw and historical data in databases.
• Top Tier: Shows processed and analyzed data to users in a simple way.
Example: Bottom – storing all transaction records; Top – showing yearly sales
trends.

21. What is an autonomous data warehouse?

An autonomous data warehouse is a cloud-based data warehouse that largely manages
itself. It uses AI and automation to handle tasks such as tuning, backups, and security
without a person having to do them manually.
Example: Oracle Autonomous Data Warehouse, which takes care of maintenance
automatically.

22. Two benefits of using autonomous data warehouse

1. No Manual Work: It automatically updates, backs up, and keeps the system
healthy.
2. Flexible Scaling: Storage and processing can grow or shrink based on need.
Example: A shopping site can increase storage during a sale season without
downtime.

23. Compare ADW and Snowflake on scalability

• ADW (Oracle Autonomous Data Warehouse): Can automatically grow storage and
processing power, but only within configured limits.
• Snowflake: Can grow almost without limit because storage and compute are fully
separate and scale independently; compute is scaled by resizing or adding virtual
warehouses.

24. One difference between ADW and Snowflake in terms of architecture

• ADW: Based on Oracle’s database technology with AI automation built in.
• Snowflake: Uses a multi-cluster, shared-data design in which storage and
compute are fully separate.

UNIT-2

1. Define ETL in data warehousing.

ETL stands for Extract, Transform, Load. It is a process in data warehousing where
data is taken from different sources, cleaned and changed into the correct format, and
then loaded into the data warehouse for analysis.
Example: Collecting sales data from branches, fixing errors, and storing it in a central
warehouse.
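The three ETL steps can be sketched as three small Python functions. Everything here is illustrative: the branch data, the function names, and the use of an in-memory SQLite database as the "warehouse" are assumptions, not a prescribed tool.

```python
import sqlite3

# Hypothetical raw data from two branches, with inconsistent formatting and one bad value.
branch_a = [{"product": " Pen ", "amount": "10.5"}, {"product": "Book", "amount": "n/a"}]
branch_b = [{"product": "pen", "amount": "4.5"}]


def extract():
    """Extract: collect rows from every source."""
    return branch_a + branch_b


def transform(rows):
    """Transform: standardize product names and drop rows with non-numeric amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        cleaned.append((row["product"].strip().lower(), amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
pen_total = warehouse.execute(
    "SELECT SUM(amount) FROM sales WHERE product = 'pen'"
).fetchone()[0]
print(pen_total)
```

Note that the data is cleaned before it reaches the warehouse table, which is the defining order of ETL.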

2. What is the role of ETL in data processing?

ETL helps in collecting raw data, improving its quality by removing errors, and
organizing it so that it is ready for analysis in the warehouse. Without ETL, data may be
incomplete, incorrect, or in different formats.
Example: Changing all dates into the same format (DD/MM/YYYY) before analysis.
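A date clean-up of this kind might look like the sketch below. The list of accepted input formats and the function name to_ddmmyyyy are assumptions chosen for the illustration; real sources may need other formats.

```python
from datetime import datetime

# Input formats we assume the sources use; the first one that parses wins.
KNOWN_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")


def to_ddmmyyyy(text: str) -> str:
    """Convert a date string in any known format to DD/MM/YYYY."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")


print([to_ddmmyyyy(d) for d in ["2024-03-05", "05-03-2024", " 05/03/2024 "]])
```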

3. What does ELT stand for in data integration?

ELT stands for Extract, Load, Transform. Data is first collected (extracted), then
directly loaded into the warehouse, and later transformed within the warehouse.

4. Distinguish between ETL and ELT with one key difference.

• ETL: Data is transformed before loading into the warehouse.
• ELT: Data is transformed after loading into the warehouse.
Example: ETL cleans data before storing; ELT stores raw data and cleans it later.
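The ELT order can be sketched as follows: raw rows are loaded unchanged, and the clean-up then runs as SQL inside the database. The table names and data are hypothetical, an in-memory SQLite database stands in for the warehouse, and the GLOB filter is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows go into the warehouse exactly as received.
conn.execute("CREATE TABLE raw_sales (product TEXT, amount TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO raw_sales VALUES (?, ?)",
        [(" Pen ", "10.5"), ("Book", "n/a"), ("pen", "4.5")],
    )

# Transform: cleaning happens afterwards, inside the warehouse, using SQL.
conn.execute(
    """
    CREATE TABLE sales AS
    SELECT LOWER(TRIM(product)) AS product, CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount GLOB '[0-9]*'  -- keep only amounts that start with a digit
    """
)

cleaned = conn.execute("SELECT product, amount FROM sales ORDER BY amount").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(cleaned, raw_count)
```

The raw table is still available after the transformation, so the data can be cleaned again later under different rules.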

5. List any two types of data warehouses.

1. Enterprise Data Warehouse (EDW) – Stores data for the whole organization.
2. Data Mart – Stores data for a specific department or function.

6. Define the characteristics of a virtual data warehouse.

A virtual data warehouse doesn’t store data physically in one place. Instead, it provides a
view of data from multiple sources in real time, without moving the data.
Example: A system showing combined sales and inventory data from two different
databases instantly.

7. What is meant by star schema in data modeling?

Star schema is a way of organizing data in a warehouse where a central fact table is
connected to smaller dimension tables. The layout looks like a star.
Example: Sales fact table linked to product, customer, and time dimension tables.
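A small star schema along these lines can be sketched with Python's sqlite3 module. The table and column names (fact_sales, dim_product, and so on) and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one row per product, customer, and time period.
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Central fact table: one row per sale, pointing at each dimension.
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        time_id     INTEGER REFERENCES dim_time(time_id),
        amount      REAL
    );

    INSERT INTO dim_product  VALUES (1, 'Pen'), (2, 'Book');
    INSERT INTO dim_customer VALUES (1, 'Chennai');
    INSERT INTO dim_time     VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales   VALUES (1, 1, 1, 10.0), (2, 1, 1, 40.0), (1, 1, 2, 15.0);
    """
)

# Each dimension is one join away from the fact table.
sales_by_product = conn.execute(
    """
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(sales_by_product)
```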

8. Differentiate between star and snowflake schema in warehouse design.

• Star Schema: Dimension tables are simple and directly linked to the fact table.
• Snowflake Schema: Dimension tables are split into smaller related tables, making
them more complex.
Example: Star – “Product” table simple; Snowflake – “Product” table split into
“Product” and “Category.”
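The Product/Category split can be sketched as below, again with hypothetical table names and data in an in-memory SQLite database. The point to notice is that reaching the category now takes two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflake: the product dimension is normalized into product + category.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );

    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Books');
    INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Novel', 2);
    INSERT INTO fact_sales   VALUES (1, 10.0), (2, 5.0), (3, 40.0);
    """
)

# fact -> product -> category: one extra join compared with a star schema.
sales_by_category = conn.execute(
    """
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(sales_by_category)
```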

9. State what is meant by the delivery process in data warehousing.

The delivery process is how processed data from the warehouse is given to users in the
form of reports, dashboards, charts, or files.
Example: Monthly sales report sent to managers via email.

10. Summarize the steps involved in the delivery process of a data warehouse.

1. Prepare Data: Select and process data for reporting.
2. Format Data: Arrange it in tables, charts, or dashboards.
3. Deliver: Send or provide access to users through reports or BI tools.

11. Define OLAP.

OLAP stands for Online Analytical Processing. It is used for analyzing large amounts of
data quickly to help make business decisions.

12. Justify how OLAP supports decision-making.

OLAP allows users to view data from different angles, perform comparisons, and find
patterns. This helps in making informed decisions based on facts.
Example: Comparing last year’s and this year’s sales to decide marketing strategies.

13. Mention any two key characteristics of OLAP systems.

1. Multidimensional Analysis: Data can be viewed in many ways (by time, region,
product).
2. Fast Query Performance: Answers are given quickly even for large data.

14. Define the concept of multidimensional data in OLAP.

Multidimensional data means storing and viewing data in a cube-like structure with
multiple dimensions such as time, product, and location, allowing complex analysis.
Example: Checking sales of a product in a specific city during a particular month.

15. Write any one difference between OLTP and OLAP systems.

OLTP (Online Transaction Processing) systems are designed to handle a large number of
short, simple transactions such as insert, update, and delete operations. They store only
current data that is needed for day-to-day business activities.
OLAP (Online Analytical Processing) systems are designed for analysis and
decision-making. They store large volumes of historical data and allow complex queries for
trends, summaries, and comparisons.
Example: OLTP – A banking app recording deposits and withdrawals; OLAP –
Analyzing the last five years’ transaction patterns to detect fraud.

16. Compare OLTP and OLAP based on data storage structure.

In OLTP systems, data is stored in a normalized structure to reduce redundancy and
ensure fast transaction processing. This means data is split into multiple related tables.
In OLAP systems, data is usually stored in a denormalized structure, combining
information into fewer tables or multidimensional cubes, which allows faster reading and
analysis of data but requires more storage.

17. What is the roll-up operation in OLAP?

Roll-up is an OLAP operation that summarizes or aggregates data by climbing up the
hierarchy of a dimension. It moves from a more detailed level to a more general level of
data representation.
Example: In a sales database, you might roll up from daily sales → monthly sales →
yearly sales. This helps in identifying higher-level trends without focusing on small
details.
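The daily → monthly → yearly roll-up can be sketched in plain Python. The roll_up function and the sample figures are hypothetical; dates are assumed to be "YYYY-MM-DD" strings, so a shorter prefix of the key gives the next level of the time hierarchy.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by "YYYY-MM-DD".
daily_sales = {"2023-12-31": 20, "2024-01-05": 100, "2024-01-20": 50, "2024-02-03": 70}


def roll_up(sales, key_length):
    """Aggregate amounts by the first key_length characters of each date key."""
    totals = defaultdict(int)
    for date, amount in sales.items():
        totals[date[:key_length]] += amount
    return dict(totals)


monthly_sales = roll_up(daily_sales, 7)   # "YYYY-MM": day level -> month level
yearly_sales = roll_up(monthly_sales, 4)  # "YYYY":    month level -> year level
print(monthly_sales)
print(yearly_sales)
```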

18. State the difference between drill-down and roll-up with an example.

• Drill-Down: Moves from summarized data to more detailed data, providing a
deeper view.
• Roll-Up: Moves from detailed data to more summarized data, providing a broader
view.
Example: If a company’s yearly sales report shows total sales of ₹50 crore, drill-
down can show monthly or daily sales figures. Roll-up can aggregate daily sales
into monthly, then yearly totals.
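Drill-down can be sketched as the reverse lookup: start from one yearly figure and list the monthly figures behind it. The drill_down function and the numbers are assumptions for the illustration, with months stored as "YYYY-MM" keys.

```python
# Hypothetical monthly sales; the yearly figure is the sum of its months.
monthly_sales = {"2023-12": 20, "2024-01": 150, "2024-02": 70}


def drill_down(sales, prefix):
    """Return the more detailed entries that sit under a summarized key."""
    return {key: amount for key, amount in sales.items() if key.startswith(prefix)}


detail_2024 = drill_down(monthly_sales, "2024")  # year level -> month level
total_2024 = sum(detail_2024.values())           # rolling back up gives the yearly total
print(detail_2024, total_2024)
```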

19. List the three major types of OLAP systems.

1. ROLAP (Relational OLAP): Uses relational databases to store data in tables.
2. MOLAP (Multidimensional OLAP): Stores data in multidimensional cubes for
fast retrieval.
3. HOLAP (Hybrid OLAP): Combines both ROLAP and MOLAP advantages.

20. What is the basic idea behind Hybrid OLAP (HOLAP)?

Hybrid OLAP (HOLAP) combines the large storage capacity of relational databases
(from ROLAP) with the fast query performance of multidimensional cubes (from
MOLAP). It stores detailed data in relational tables and summary data in cubes, enabling
both large data handling and quick analysis.
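The HOLAP split can be imitated in miniature: detail rows live in a relational table, and a precomputed summary is kept in memory for fast answers. This is only a loose analogy under stated assumptions: the table name, the functions, and the use of a Python dict as the "cube" are all hypothetical.

```python
import sqlite3

# Detailed data: kept in a relational table (the ROLAP side).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (month TEXT, product TEXT, amount REAL)")
with conn:
    conn.executemany(
        "INSERT INTO sales_detail VALUES (?, ?, ?)",
        [("2024-01", "pen", 10.0), ("2024-01", "book", 40.0), ("2024-02", "pen", 15.0)],
    )

# Summary data: aggregated once and held in memory (standing in for the MOLAP cube).
summary_cube = dict(
    conn.execute("SELECT month, SUM(amount) FROM sales_detail GROUP BY month")
)


def monthly_total(month):
    """Summary question: answered from the precomputed cube, no table scan."""
    return summary_cube[month]


def monthly_detail(month):
    """Detail question: answered from the relational table."""
    return conn.execute(
        "SELECT product, amount FROM sales_detail WHERE month = ? ORDER BY product",
        (month,),
    ).fetchall()


print(monthly_total("2024-01"), monthly_detail("2024-01"))
```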

21. State one difference between ROLAP and MOLAP.

• ROLAP: Stores data in relational tables, is good for large datasets, but can be
slower for complex queries.
• MOLAP: Stores data in multidimensional cubes, is much faster for analysis, but
can require more storage and time for cube building.

22. Contrast ROLAP, MOLAP, and HOLAP in terms of data storage.

• ROLAP: All data is stored in relational databases using tables and SQL queries.
• MOLAP: All data is stored in multidimensional cubes for faster data retrieval and
analysis.
• HOLAP: Detailed data is stored in relational tables, while aggregated/summarized
data is stored in cubes for fast performance.

23. Identify any one OLAP operation and define it briefly.

Slice Operation: A slice operation selects a single value for one dimension, reducing the
cube’s dimensionality.
Example: Viewing sales for the year 2024 only, ignoring other years, to focus on that
specific data.
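A slice can be sketched on a tiny cube stored as a Python dict. The cube layout (keys of year, product, city), the slice_year function, and the figures are assumptions made for the illustration.

```python
# Hypothetical 3-dimensional cube: (year, product, city) -> sales amount.
cube = {
    (2023, "Pen", "Chennai"): 30,
    (2024, "Pen", "Chennai"): 40,
    (2024, "Pen", "Delhi"): 25,
    (2024, "Book", "Delhi"): 60,
}


def slice_year(cube, year):
    """Fix the year dimension to one value, leaving a 2-D (product, city) view."""
    return {
        (product, city): amount
        for (cell_year, product, city), amount in cube.items()
        if cell_year == year
    }


sales_2024 = slice_year(cube, 2024)
print(sales_2024)
```

The result has two dimensions instead of three, which is the drop in dimensionality the definition describes.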

24. Interpret how a data warehouse supports OLAP functionality.

A data warehouse stores large amounts of historical, cleaned, and integrated data from
different sources. This organized data is the foundation for OLAP analysis. OLAP tools
use the warehouse to run complex queries, generate reports, perform trend analysis, and
support strategic decision-making. Without a data warehouse, OLAP would lack a
structured, reliable dataset for fast and accurate analysis.
