FAQs on Data Warehouse: What Is a Surrogate Key? Where Do We Use It? Explain with Examples
Different systems often hold data elements with the same or similar meaning. Metadata synchronization joins these differing elements together in the data warehouse to allow for easier access. In the context of a data warehouse, metadata means information about the data. This information is stored in the designer repository.
6. What is the main difference between a schema in an RDBMS and a schema in a data warehouse?
RDBMS Schema:
* Used for OLTP systems
* Traditional, older style of schema
* Normalized
* Difficult to understand and navigate
* Complex extraction queries are hard to solve
* Often poorly suited to analytical modelling

DWH Schema:
* Used for OLAP systems
* Newer generation of schema
* Denormalized
* Easy to understand and navigate
* Complex extraction queries can be solved easily
* Very good model for analysis
In dimensional modeling, data is stored in two kinds of tables: fact tables and dimension tables. The fact table contains the facts/measurements of the business, e.g. sales, revenue, profit. The dimension table contains the context of those measurements, i.e. the dimensions on which the facts are calculated, such as product ID, product name, and product description. Dimensional modeling is a design concept used by many data warehouse designers to build their data warehouses.

Why is data modeling important? Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.

The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end users. The data model is also detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers.

A poorly designed database will require more time in the long term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.
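The fact/dimension split described above can be sketched as a minimal star schema. This is an illustrative example only; the table and column names (dim_product, fact_sales, etc.) are invented for the sketch, not taken from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: the context of the measurements (product attributes).
cur.execute("""
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        description  TEXT
    )
""")

# Fact table: the measurements themselves, keyed to the dimension.
cur.execute("""
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL,
        profit     REAL
    )
""")

cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'A basic widget')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 100.0, 30.0), (2, 1, 250.0, 80.0)])

# Facts are analyzed along dimensions, e.g. total revenue per product.
cur.execute("""
    SELECT p.product_name, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.product_name
""")
print(cur.fetchall())  # [('Widget', 350.0)]
```

Notice that the facts are additive measures, while the dimension supplies the labels users group and filter by.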
9. What is a VLDB?
The perception of what constitutes a VLDB (very large database) continues to grow. A one-terabyte database would normally be considered a VLDB. More informally, if a database is too large to back up within its allotted time frame, it is a VLDB.
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:
* It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
* It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
* In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.
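The first point above, that ER constructs transform easily into relational tables, can be sketched concretely. Assuming a hypothetical ER diagram with two entities (Student, Course) and a many-to-many "enrolls in" relationship, each entity becomes a table and the relationship becomes a third table whose key is the pair of participating entity keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    -- Each entity from the ER diagram becomes a table.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- The many-to-many relationship becomes its own table; its primary
    -- key is the combination of the two entity keys it connects.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
```

One-to-many relationships map even more simply: a single foreign key column on the "many" side, with no junction table needed.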
17. What are Normalization, First Normal Form, Second Normal Form, and Third Normal Form?
Normalization: the process of decomposing tables to eliminate data redundancy. It is a step-by-step process of removing redundancies and dependencies of attributes in a data structure; the condition of the data at the completion of each step is described as a normal form. In brief:
* 1NF: the table should contain only scalar (atomic) values.
* 2NF: the table should be in 1NF with no partial functional dependencies, i.e. a non-key attribute must not depend on a subset of the key (e.g. key {part, supplier} with attribute sup_address, which depends on supplier alone).
* 3NF: the table should be in 2NF with no transitive dependencies, i.e. a non-key attribute must not depend on another non-key attribute (e.g. key {part} with attributes warehouse_name and warehouse_addr, where the address depends on the warehouse name).
* 4NF and 5NF handle multi-valued dependencies (essentially, describing many-to-many relations).

Why normalize? Normalization improves database design, ensures minimum redundancy of data, reduces the need to reorganize data when the design is modified or enhanced, and removes anomalies from database activities.

First normal form: a table is in first normal form when it contains no repeating groups. The repeating columns or fields in an unnormalized table are removed from the table and put into tables of their own. Such a table becomes dependent on the parent table from which it is derived. The key to this table is called a concatenated key, with the key of the parent table forming a part of it.

Second normal form: a table is in second normal form if all its non-key fields are fully dependent on the whole key. This means that each field in the table must depend on the entire key. Fields that do not depend on the whole combination key are moved to another table, on whose key they do depend. Structures that do not contain combination keys are automatically in second normal form.

Third normal form: a table is in third normal form if all the non-key fields of the table are independent of all other non-key fields of the same table.
21. What is the difference between an OLTP database and a data warehouse?
The data warehouse and the OLTP database are both relational databases. However, the objectives of these databases are different. The OLTP database records transactions in real time and aims to automate the clerical data-entry processes of a business entity. Addition, modification, and deletion of data in the OLTP database are essential, and the semantics of the application used in the front end impact the organization of the data in the database. The data warehouse, on the other hand, does not cater to the real-time operational requirements of the enterprise. It is more a storehouse of current and historical data and may also contain data extracted from external data sources. The differences are summarized below.
Data Warehouse:
* Designed for analysis of business measures by categories and attributes.
* Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
* Loaded with consistent, valid data; requires no real-time validation.
* Supports few concurrent users relative to OLTP.

OLTP Database:
* Designed for real-time business operations.
* Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
* Optimized for validation of incoming data during transactions; uses validation data tables.
* Supports thousands of concurrent users.
However, the data warehouse supports the OLTP system by providing a place for the latter to offload data as it accumulates, and by providing services which would otherwise degrade the performance of the OLTP database.
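The workload contrast in the table above can be sketched in a few lines. The schema and values here are invented for illustration: an OLTP-style single-row transactional insert, versus a warehouse-style bulk load followed by a large aggregate scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")

# OLTP pattern: one validated row per transaction, committed immediately.
cur.execute("INSERT INTO orders (amount) VALUES (?)", (19.99,))
conn.commit()

# Warehouse pattern: bulk-load many rows at once, then scan many rows
# in a single analytical query.
cur.executemany("INSERT INTO orders (amount) VALUES (?)",
                [(float(i),) for i in range(1000)])
cur.execute("SELECT COUNT(*), SUM(amount) FROM orders")
print(cur.fetchone())
```

In a real system the two patterns would run against separate databases, with the warehouse loaded periodically from the OLTP side, which is exactly the offloading relationship described above.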
22. Question: What is the difference between a data warehouse and Online Analytical Processing (OLAP)?
Answer:
Bill Inmon defines the data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. Ralph Kimball, the co-founder of the data warehousing concept, has defined the data warehouse as "a copy of transaction data specifically structured for query and analysis." Both definitions highlight specific features of the data warehouse: the former focuses on the structure and organization of the data, and the latter on the usage of the data. However, a listing of the features of a data warehouse would necessarily include the aspects highlighted in both definitions.

Data warehouse and OLAP are terms which are often used interchangeably. Actually, they refer to two different components of a decision support system. While the data in a data warehouse is composed of the historical data of the organization, stored for end-user analysis, OLAP is a technology that enables a data warehouse to be used effectively for online analysis using complex analytical queries. The differences between the data warehouse and OLAP are tabulated below for ease of understanding.

Data Warehouse:
* Data from different data sources is stored in a relational database for end-use analysis.
* Data is organized in summarized, aggregated, subject-oriented, non-volatile patterns.
* Data in a data warehouse is a consolidated, flexible collection of data.
* Supports analysis of data but does not itself support online analysis of data.

Online Analytical Processing (OLAP):
* A tool to evaluate and analyze the data in the data warehouse using analytical queries.
* A tool which helps organize data in the data warehouse using multidimensional models of data aggregation and summarization.
* Supports the data analyst in real time and enables online analysis of data with speed and flexibility.