Data Warehousing Insights

This document compares Oracle and Netezza data warehouses. It notes that IBM will end support for Netezza in June 2019. It lists advantages of Oracle over Netezza including support for third party tools, foreign keys, triggers, programming languages, and operating systems. Netezza advantages over Oracle include support for MapReduce and partitioning methods. The document also provides definitions and comparisons of data warehouse concepts like dimension tables, fact tables, OLTP vs OLAP, and ROLAP vs MOLAP.

Uploaded by

Hirak

Why Oracle and not Netezza?

IBM has publicly announced that support for Netezza will end by June 2019.

Oracle advantages over Netezza:

1. Third-party tools:
   1) Navicat for Oracle: a streamlined working environment intended to improve the efficiency and productivity of Oracle developers and administrators.
   2) Dremio: accelerates analytical queries against Oracle (the vendor claims up to 1,000x).
   3) DBHawk: web-based Oracle database management and self-service BI software.
   4) SQL Parser: adds SQL parsing, decoding, analysis and rewriting capability to your products.
2. Foreign keys
3. Immediate consistency
4. Triggers
5. Support for a wide range of programming languages
6. XML support
7. Server operating systems: Oracle runs on all major platforms, whereas Netezza runs on Linux only

Netezza advantages over Oracle:

1) MapReduce support
2) Partitioning methods: Netezza uses sharding, whereas Oracle uses horizontal partitioning

DATA WAREHOUSE INTERVIEW QUESTIONS:

Dimension table: contains the attributes of the measurements stored in fact tables. It consists of hierarchies, categories and logic that can be used to traverse the nodes.
Fact table: contains the measurements of business processes, together with foreign keys to the dimension tables.
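
The relationship between the two table types can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table names dim_product and fact_sales and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical tables: dim_product (dimension) and fact_sales (fact).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT,
        category    TEXT            -- descriptive attribute / hierarchy level
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),  -- foreign key
        quantity    INTEGER,        -- measurement
        amount      REAL            -- measurement
    );
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery'), (3, 'Mug', 'Kitchen');
    INSERT INTO fact_sales VALUES (1, 10, 15.0), (2, 2, 8.0), (3, 1, 6.5), (1, 4, 6.0);
""")

# The dimension attribute (category) gives context to the summed measurements.
rows = conn.execute("""
    SELECT d.category, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
```

The fact table carries only keys and numbers; everything a report would label or group by lives in the dimension table.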

Stages of data warehousing (4):

1) Offline Operational Database
2) Offline Data Warehouse
3) Real-Time Data Warehouse
4) Integrated Data Warehouse

Data mining is the process of analyzing data from different dimensions or perspectives and summarizing it into useful information. The data can be queried and retrieved from the database in its own format.

OLTP (Online Transaction Processing): an application that modifies data as soon as it is received and has a large number of simultaneous users, e.g. an ATM.

OLAP (Online Analytical Processing): a system that collects, manages and processes multi-dimensional data for analysis and management purposes, e.g. financial reporting and forecasting.

OLTP                                     | OLAP
Data comes from the original data source | Data comes from various data sources
Simple queries by users                  | Complex queries by the system
Normalized, small database (3NF)         | De-normalized, large database
Fundamental business tasks               | Multi-dimensional business tasks
Must maintain data integrity constraints | Data integrity is not affected, as the data is not frequently modified

ROLAP                                                      | MOLAP
Relational Online Analytical Processing                    | Multidimensional Online Analytical Processing
Data is stored in and fetched from the main data warehouse | Data is stored in and fetched from proprietary multidimensional databases (MDDBs)
Data is stored in the form of relational tables            | Data is stored in large multidimensional arrays made of data cubes
Large data volumes                                         | Limited, summarized data is kept in MDDBs
Uses complex SQL queries                                   | The MOLAP engine creates pre-calculated, prefabricated data cubes for multidimensional data views; sparse matrix technology is used to manage data sparsity
Creates a multidimensional view of data dynamically        | Already stores the static multidimensional view of data in MDDBs
Slower access                                              | Faster access
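
The difference in access pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular OLAP product works: the sales data and the function names rolap_total and molap_total are hypothetical, SQLite stands in for the relational warehouse, and a plain Python dictionary stands in for an MDDB cube.

```python
import sqlite3
from collections import defaultdict

sales = [  # (region, year, amount): hypothetical sample data
    ("East", 2018, 100), ("East", 2019, 150),
    ("West", 2018, 80), ("West", 2019, 120), ("East", 2019, 50),
]

# ROLAP style: data stays in relational tables; each view is computed by SQL on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)

def rolap_total(region, year):
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND year = ?",
        (region, year),
    ).fetchone()
    return row[0]

# MOLAP style: totals are pre-calculated once into a cube keyed by dimension members.
# Only cells that actually have data are stored (a crude form of sparse storage).
cube = defaultdict(int)
for region, year, amount in sales:
    cube[(region, year)] += amount

def molap_total(region, year):
    return cube.get((region, year), 0)   # plain lookup, no aggregation at query time

print(rolap_total("East", 2019), molap_total("East", 2019))
```

Both calls return the same total; the ROLAP call aggregates at query time, while the MOLAP call only reads a pre-computed cell.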

Aggregate tables contain existing warehouse data that has been grouped to a certain level of the dimensions. It is easier to retrieve data from an aggregate table than from the original table, which has a larger number of records. Aggregate tables reduce the load on the database server and improve query performance.
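
As a rough sketch of the idea, the example below builds a monthly aggregate from a detail-level fact table. It assumes SQLite (through Python's sqlite3 module) in place of Oracle, and the names fact_sales and agg_sales_month and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, store TEXT, amount INTEGER);
    INSERT INTO fact_sales VALUES
        ('2019-01-03', 'A', 10), ('2019-01-17', 'A', 20),
        ('2019-01-09', 'B', 5),  ('2019-02-02', 'A', 7);

    -- Aggregate table: the same data grouped to the (month, store) level.
    CREATE TABLE agg_sales_month AS
        SELECT substr(sale_date, 1, 7) AS month, store,
               SUM(amount) AS total_amount, COUNT(*) AS sale_count
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), store;
""")

detail_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM agg_sales_month").fetchone()[0]
jan_a = conn.execute(
    "SELECT total_amount FROM agg_sales_month WHERE month = '2019-01' AND store = 'A'"
).fetchone()[0]
print(detail_rows, agg_rows, jan_a)
```

A monthly report can now read the smaller aggregate table instead of scanning every detail row.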

A factless fact table is a fact table that does not contain any numeric fact column.
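
A common illustration is an attendance table, sketched below under the assumption of a hypothetical fact_attendance table in SQLite. The table records only which dimension keys occurred together; analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Factless fact table: only dimension keys, no numeric measure column.
    CREATE TABLE fact_attendance (date_key TEXT, student_key INTEGER, class_key INTEGER);
    INSERT INTO fact_attendance VALUES
        ('2019-03-01', 1, 10), ('2019-03-01', 2, 10),
        ('2019-03-01', 1, 20), ('2019-03-02', 2, 10);
""")

# The "measure" is simply the number of events recorded per class.
attendance = conn.execute("""
    SELECT class_key, COUNT(*)
    FROM fact_attendance
    GROUP BY class_key
    ORDER BY class_key
""").fetchall()
print(attendance)
```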

Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table. If the dimensions change, the same facts can still be useful.
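
A ratio such as profit margin is a typical non-additive fact. The short sketch below uses made-up store figures to show why summing it gives a meaningless number, and how the overall value is instead recomputed from the additive components.

```python
# Hypothetical rows: (store, revenue, profit). Margin = profit / revenue.
rows = [("A", 100.0, 20.0), ("B", 300.0, 30.0)]

margins = [profit / revenue for _, revenue, profit in rows]   # 0.2 and 0.1

# Summing the ratios produces a number with no business meaning.
wrong_total = sum(margins)

# Correct approach: sum the additive facts first, then derive the ratio.
total_revenue = sum(revenue for _, revenue, _ in rows)
total_profit = sum(profit for _, _, profit in rows)
overall_margin = total_profit / total_revenue

print(wrong_total, overall_margin)
```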

A conformed fact is a table that can be used across multiple data marts in combination with multiple fact tables.

ODS (Operational Data Store): a repository of real-time operational data rather than long-term trend data.

A data mart is a specialized version of a data warehouse. It contains a snapshot of operational data that helps business people make decisions based on the analysis of past trends and experiences. A data mart emphasizes easy access to relevant information.

A data warehouse is the place where the whole of the data is stored for analysis, whereas OLAP is used for analyzing the data, managing aggregations and partitioning information into lower-level detail.

SCD stands for slowly changing dimension; it applies to cases where a record changes over time.
SCD 1 – The new record replaces the original record (no history is kept)
SCD 2 – A new record is added to the existing customer dimension table (history is kept as separate rows)
SCD 3 – The original record is modified to include the new data (for example, with an extra column holding the previous value)
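
The three types can be sketched side by side as follows. This is a simplified illustration, assuming SQLite in place of Oracle; the dim_customer layout, the is_current flag, the prev_city column and the function names are all hypothetical choices, and real SCD 2 designs often add effective-date columns as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT,               -- natural key from the source system
        city         TEXT,
        prev_city    TEXT,               -- used by SCD 3 only
        is_current   INTEGER DEFAULT 1   -- used by SCD 2 only
    );
    INSERT INTO dim_customer (customer_id, city)
    VALUES ('C1', 'Pune'), ('C2', 'Pune'), ('C3', 'Pune');
""")

def scd1(customer_id, new_city):
    # SCD 1: overwrite in place; the old value is lost.
    conn.execute("UPDATE dim_customer SET city = ? WHERE customer_id = ?",
                 (new_city, customer_id))

def scd2(customer_id, new_city):
    # SCD 2: retire the current row and add a new row; history is kept.
    conn.execute("UPDATE dim_customer SET is_current = 0 "
                 "WHERE customer_id = ? AND is_current = 1", (customer_id,))
    conn.execute("INSERT INTO dim_customer (customer_id, city) VALUES (?, ?)",
                 (customer_id, new_city))

def scd3(customer_id, new_city):
    # SCD 3: keep the previous value in an extra column of the same row.
    conn.execute("UPDATE dim_customer SET prev_city = city, city = ? "
                 "WHERE customer_id = ?", (new_city, customer_id))

scd1("C1", "Delhi")
scd2("C2", "Delhi")
scd3("C3", "Delhi")

result = conn.execute(
    "SELECT customer_id, city, prev_city, is_current "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
for row in result:
    print(row)
```

After the three changes, C1 has a single overwritten row, C2 has an old and a current row, and C3 has one row that remembers its previous city.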

Key Differences between View and Materialized View

1. Views are not stored physically on disk, whereas materialized views are.
2. A view is a virtual table created as the result of a query expression, whereas a materialized view is a physical copy, picture or snapshot of the base table.
3. A view is always up to date, because the query that defines it is executed each time the view is used. A materialized view, on the other hand, is updated manually or by applying triggers to it.
4. A materialized view responds faster than a view because it is pre-computed.
5. A materialized view uses storage space because it is stored on disk, whereas a view is just a stored query and needs no storage for its data.
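
The behavioural difference can be sketched as follows. Note the assumptions: SQLite has no materialized views, so a snapshot table (mv_total) with a hypothetical refresh_mv helper is swapped in to imitate one; Oracle's own CREATE MATERIALIZED VIEW and its refresh options are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO orders (amount) VALUES (10), (20);

    -- Ordinary view: only the query text is stored.
    CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM orders;

    -- Stand-in for a materialized view: the query result is stored as a table.
    CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM orders;
""")

def refresh_mv():
    # Manual, complete refresh of the stored snapshot.
    conn.executescript("""
        DELETE FROM mv_total;
        INSERT INTO mv_total SELECT SUM(amount) FROM orders;
    """)

conn.execute("INSERT INTO orders (amount) VALUES (5)")

view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]    # re-runs the query
stale_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]  # old snapshot
refresh_mv()
fresh_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]

print(view_total, stale_total, fresh_total)
```

The view reflects the new order immediately, while the snapshot only catches up after it is refreshed.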

BUS schema: consists of a suite of conformed dimensions and standardized definitions for the facts in the fact tables.

Star schema: a way of organizing tables so that results can be retrieved from the database quickly in a data warehouse environment.

Snowflake schema: has a primary dimension table to which one or more further dimension tables can be joined. The primary dimension table is the only table that can be joined to the fact table.

A star schema does not use normalization, whereas a snowflake schema uses normalization to eliminate redundant data.

                                 | STAR SCHEMA               | SNOWFLAKE SCHEMA
Tables                           | Fact and dimension tables | Fact, dimension and sub-dimension tables
Normalization                    | Doesn't use normalization | Uses normalization and denormalization
Data model                       | Top-down                  | Bottom-up
Query complexity                 | Low                       | High
Foreign key joins used           | Fewer                     | Larger in number
Space usage                      | More                      | Less
Time consumed in query execution | Less                      | More, comparatively, due to the larger number of joins
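
The trade-off in the comparison above can be sketched with one small data set modelled both ways. This is an illustration only, assuming SQLite in place of Oracle; all table names (star_product, snow_product, snow_category, fact_sales) and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (product_key INTEGER, amount INTEGER);
    INSERT INTO fact_sales VALUES (1, 10), (2, 20), (3, 5);

    -- Star: one denormalized dimension; the category name repeats per product.
    CREATE TABLE star_product (product_key INTEGER PRIMARY KEY, name TEXT, category_name TEXT);
    INSERT INTO star_product VALUES
        (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery'), (3, 'Mug', 'Kitchen');

    -- Snowflake: the category is normalized out into a sub-dimension table.
    CREATE TABLE snow_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE snow_product (
        product_key  INTEGER PRIMARY KEY,
        name         TEXT,
        category_key INTEGER REFERENCES snow_category(category_key)
    );
    INSERT INTO snow_category VALUES (1, 'Stationery'), (2, 'Kitchen');
    INSERT INTO snow_product VALUES (1, 'Pen', 1), (2, 'Ink', 1), (3, 'Mug', 2);
""")

# Star schema query: a single join from fact to dimension.
star = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN star_product p ON p.product_key = f.product_key
    GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

# Snowflake schema query: one extra join to reach the sub-dimension.
snowflake = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN snow_product p ON p.product_key = f.product_key
    JOIN snow_category c ON c.category_key = p.category_key
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

print(star, snowflake)
```

Both queries return the same totals; the snowflake version stores each category name once but needs the additional join.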

30. What is a core dimension?

A core dimension is a dimension table that is dedicated to a single fact table or data mart.

31. What is data cleaning?

As the name implies, data cleaning is the cleaning up of orphan records, data that breaches business rules, inconsistent data and missing information in a database.

32. What is metadata?

Metadata is defined as data about the data. It contains information such as the number of columns used, fixed width and limited width, the ordering of fields and the data types of the fields.

33. What are loops in data warehousing?

In data warehousing, loops can exist between tables. If there is a loop between tables, query generation takes more time and creates ambiguity. It is advisable to avoid loops between tables.

34. Can a dimension table have numeric values?

Yes, a dimension table can have numeric values, as they are descriptive elements of the business.

35. What is the definition of a cube in data warehousing?

Cubes are a logical representation of multidimensional data. The edges of the cube hold the dimension members, and the body of the cube contains the data values.

36. What is dimensional modeling?

Dimensional modeling is a concept that data warehouse designers can use to build their own data warehouse. The model is stored in two types of tables: fact tables and dimension tables.

A fact table holds the facts and measurements of the business, and a dimension table holds the context of those measurements.

37. What are the types of dimensional modeling?

There are three types of dimensional modeling:

Conceptual Modeling
Logical Modeling
Physical Modeling

38. What is a surrogate key?

A surrogate key is a substitute for the natural primary key. It is a unique identifier for each row and can be used as the primary key of a table.
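
A minimal sketch of a surrogate key, assuming SQLite in place of Oracle (where a sequence or identity column would typically generate the value). The dim_employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_employee (
        employee_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no business meaning
        employee_no  TEXT NOT NULL,                      -- natural key from the source system
        name         TEXT
    );
""")

# The loader never supplies employee_key; the database generates it.
for emp_no, name in [("E-100", "Asha"), ("E-200", "Ravi")]:
    conn.execute(
        "INSERT INTO dim_employee (employee_no, name) VALUES (?, ?)", (emp_no, name)
    )

keys = conn.execute(
    "SELECT employee_key, employee_no FROM dim_employee ORDER BY employee_key"
).fetchall()
print(keys)
```

Fact tables would reference employee_key, so a change in the source system's numbering scheme does not ripple through the warehouse.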

39. What is the difference between ER modeling and dimensional modeling?

ER modeling has both a logical and a physical model, whereas dimensional modeling has only a physical model.

ER modeling is used for normalizing the OLTP database design, whereas dimensional modeling is used for de-normalizing the ROLAP and MOLAP design.

Differentiate between %ROWTYPE and TYPE RECORD.

%ROWTYPE is used when a query returns an entire row of a table or view.
TYPE RECORD, on the other hand, is used when a query returns columns of different tables or views.
E.g.:
TYPE r_emp IS RECORD (sno smp.smpno%TYPE, sname smp.sname%TYPE);
e_rec smp%ROWTYPE;
CURSOR c1 IS SELECT smpno, dept FROM smp;
c_rec c1%ROWTYPE;

Cursor: a cursor is a named private area in SQL from which information can be accessed. Cursors are required to process rows individually for queries that return multiple rows.

Raise_application_error: a procedure of the package DBMS_STANDARD that allows user-defined error messages to be issued from a database trigger or stored sub-program.

Explain the two virtual tables available at the time of database trigger execution.
The table columns are referred to as OLD.column_name and NEW.column_name.
For INSERT triggers, only NEW.column_name values are available.
For DELETE triggers, only OLD.column_name values are available.
For UPDATE triggers, both sets of column values are available.

What are the rules to be applied to NULLs whilst doing comparisons?


1) NULL is never TRUE or FALSE
2) NULL cannot be equal or unequal to other values
3) If a value in an expression is NULL, then the expression itself evaluates to NULL except for concatenation operator (||)
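
The first two rules and the general part of the third can be checked with a quick sketch. It assumes SQLite (via Python's sqlite3 module) rather than Oracle, and the evaluate helper is hypothetical. The concatenation exception in rule 3 is Oracle behaviour that SQLite does not share, so it is deliberately left out of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    # Returns None when the SQL expression evaluates to NULL.
    return conn.execute(f"SELECT {expression}").fetchone()[0]

null_equals_null = evaluate("NULL = NULL")   # NULL: neither TRUE nor FALSE
null_not_equal = evaluate("NULL <> 1")       # NULL: cannot be unequal either
null_in_arithmetic = evaluate("1 + NULL")    # NULL propagates through the expression
is_null_test = evaluate("NULL IS NULL")      # IS NULL is the reliable way to test

print(null_equals_null, null_not_equal, null_in_arithmetic, is_null_test)
```

This is why a WHERE clause written as column = NULL never matches any row, and IS NULL has to be used instead.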

How is PL/SQL code compiled?

The compilation process includes syntax checking, binding and p-code generation.
Syntax checking checks the PL/SQL code for compilation errors. Once all errors are corrected, a storage address is assigned to the variables that hold data; this is called binding. P-code is a list of instructions for the PL/SQL engine. For named blocks, p-code is stored in the database and is used the next time the block is executed.

Mutating table error:

It occurs when a trigger tries to query or update the table that is currently being modified by the statement that fired it. It can be fixed by using views or temporary tables, so that the database selects from one and updates the other.
The most likely cause of a mutating table error is the misuse of triggers. Here is a typical example:
(1) You insert a row into table A.
(2) A trigger on table A (for each row) executes a query on table A, for example to compute a summary column.
(3) Oracle throws ORA-04091: table A is mutating, trigger/function may not see it.
This is expected and normal behaviour; Oracle wants to protect you from yourself, since Oracle guarantees:
(i) That each statement is atomic (it will either fail or succeed completely)
(ii) That each statement sees a consistent view of the data
When you write this kind of trigger, you most likely expect the query in (2) to see the row inserted in (1). This would contradict both points above, since the statement has not finished yet (there could be more rows to be inserted).
Oracle could return a result consistent with a point in time just before the beginning of the statement, but in most of the examples I have seen that try to implement this logic, people see a multi-row statement as a series of successive steps and expect the statement to see the changes made by the previous steps. Oracle cannot return the expected result and therefore throws the error.
Ways to resolve it:
1. Change the trigger to an AFTER trigger.
2. Change it from a row-level trigger to a statement-level trigger.
3. Convert it to a compound trigger.
4. Restructure the triggers to use a combination of row-level and statement-level triggers.
5. Make the trigger autonomous, with a commit in it.

SQLCODE and SQLERRM

SQLCODE returns the error number of the last error encountered, whereas SQLERRM returns the message for the last error.

When is a DECLARE statement required?

The DECLARE statement is used by PL/SQL anonymous blocks, such as standalone, non-stored procedures. If it is used, it must come first in a standalone file.

Sequences are used to generate sequence numbers without the overhead of locking. Their drawback is that a sequence number is lost if the transaction is rolled back.

How would you reference column values BEFORE and AFTER you have inserted and deleted triggers?
Using the keyword "new.column name", triggers can reference the new column values. Using the keyword "old.column name", they can reference the old column values.

Differentiate between anonymous blocks and sub-programs.

Anonymous blocks are unnamed blocks that are not stored anywhere and are compiled at runtime, whereas sub-programs are compiled and stored in the database.

DECODE vs CASE?
DECODE is a function that only compares for equality, so it does not allow decision-making statements in its place; CASE can evaluate arbitrary conditions.

An autonomous transaction is a transaction that is independent of the main or parent transaction. It is not nested if it is started by another transaction. Typical situations for using autonomous transactions include event logging and auditing.

MERGE is used to combine multiple DML statements into one.

MERGE INTO employees e
USING (SELECT * FROM hr_records WHERE start_date > ADD_MONTHS(SYSDATE, -1)) h  -- or simply: USING hr_records h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
  UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
  INSERT (id, address)
  VALUES (h.emp_id, h.address);
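
To try the same update-or-insert logic without an Oracle instance, the sketch below swaps in SQLite's INSERT ... ON CONFLICT DO UPDATE (an upsert, available from SQLite 3.24) for Oracle's MERGE. It is a different statement with similar effect, not Oracle syntax. The sample rows and the fixed cutoff date, which stands in for ADD_MONTHS(SYSDATE, -1), are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE hr_records (emp_id INTEGER, address TEXT, start_date TEXT);
    INSERT INTO employees VALUES (1, 'Old Street 1'), (2, 'Old Street 2');
    INSERT INTO hr_records VALUES
        (1, 'New Street 1', '2019-05-20'),   -- matched: update
        (3, 'New Street 3', '2019-05-25'),   -- not matched: insert
        (2, 'Ignored',      '2018-01-01');   -- filtered out by the date cutoff
""")

cutoff = "2019-05-01"   # assumed fixed date standing in for ADD_MONTHS(SYSDATE, -1)

# Rows whose id already exists are updated; the rest are inserted.
conn.execute("""
    INSERT INTO employees (id, address)
    SELECT emp_id, address FROM hr_records WHERE start_date > ?
    ON CONFLICT(id) DO UPDATE SET address = excluded.address
""", (cutoff,))

merged = conn.execute("SELECT id, address FROM employees ORDER BY id").fetchall()
print(merged)
```

Employee 1 is updated, employee 3 is inserted, and employee 2 is untouched because its source row falls outside the date filter.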

Can two queries be executed simultaneously in a distributed database system?

Yes, they can be executed simultaneously. In a distributed database system based on two-phase commit, one query is always independent of the other.
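
For readers unfamiliar with two-phase commit, the protocol can be sketched in a few lines. This is a toy in-memory simulation, not Oracle's implementation; the Participant class and the two_phase_commit function are hypothetical names, and failure handling such as timeouts and recovery is left out.

```python
class Participant:
    """Hypothetical database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


nodes_ok = [Participant("db1"), Participant("db2")]
nodes_fail = [Participant("db1"), Participant("db2", can_commit=False)]
print(two_phase_commit(nodes_ok), two_phase_commit(nodes_fail))
```

Either every node commits or every node rolls back, which is what keeps the distributed transaction consistent.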

What is an OUT parameter used for, even though a RETURN statement can also be used in PL/SQL?
OUT parameters allow more than one value to be returned to the calling program.

The SPOOL command writes the output of SQL statements to a file.

spool /tmp/sql_out.txt
select smp_name, smp_id from smp where dept='accounts';
spool off;

Tracing methods:
DBMS_APPLICATION_INFO
DBMS_TRACE
DBMS_SESSION and DBMS_MONITOR
trcsess and tkprof utilities

Data Warehouse:
A data warehouse is the electronic storage of a large amount of business information, designed for query and analysis rather than transaction processing. Data warehousing is the process of transforming data into information and making it available to users in a timely manner to make a difference.
It is an information system that stores historical and cumulative data from single or multiple sources, and it is designed to analyze, report on and integrate transaction data from different sources.
A data warehouse is also known by the following names:
Decision Support System (DSS)
Executive Information System
Management Information System
Business Intelligence Solution
Analytic Application

Types:
1. Enterprise Data Warehouse:
A centralized warehouse that provides decision support services across the enterprise. It offers a unified approach to organizing and representing data. It also provides the ability to classify data by subject and to grant access according to those divisions.
2. Operational Data Store:
A data store that is required when neither the data warehouse nor the OLTP systems support an organization's reporting needs. In an ODS, the data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities such as storing employee records.
3. Data mart:
A subset of the data warehouse, specially designed for a particular line of business, such as sales or finance. In an independent data mart, data can be collected directly from the sources.

4 components of data warehouses:

1) Load manager: the front component. It performs the extraction and loading of data into the warehouse. These operations include transformations that prepare the data for entry into the data warehouse.
2) Warehouse manager: performs the management of data in the warehouse. Its operations include analysis of data to ensure consistency, creation of indexes and views, generation of denormalizations and aggregations, transformation and merging of source data, and archiving and backing up data.
3) Query manager: the backend component. It performs the management of user queries. Its operations include directing queries to the appropriate tables and scheduling the execution of queries.
4) End-user access tools: these are categorized into 5 groups
   1. Data reporting
   2. Query tools
   3. Application development tools
   4. EIS tools
   5. OLAP tools and data mining tools

3-tier architecture
Bottom tier: usually a relational database system. Data is cleansed, transformed and loaded into this layer using back-end tools.
Middle tier: an OLAP server implemented using either the ROLAP or the MOLAP model. For the user, this application tier presents an abstracted view of the database. This layer also acts as a mediator between the end user and the database.
Top tier: the front-end client layer. It consists of the tools and APIs used to connect to the data warehouse and get data out of it, such as query tools, reporting tools, managed query tools, analysis tools and data mining tools.
