Database System Lect 07
Database Systems
for BS (IT)
Lecture 7:
Normalization
Hasan Raza
Lecturer CS & IT
Normalization
Process for evaluating and correcting table structures to minimize
data redundancies
Reduces data anomalies
Series of stages called normal forms:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Fourth normal form (4NF)
Database Tables and Normalization (cont’d.)
Normalization (continued)
2NF is better than 1NF; 3NF is better than 2NF
For most business database design purposes, 3NF is as high as
needed in normalization
Denormalization produces a lower normal form
Increased performance but greater data redundancy
Highest level of normalization is not always most desirable
The Need for Normalization
Example: Company which manages building projects.
Building projects
Project number
Project name
Employees assigned
…
Employee
Employee number
Employee name
Job classification
The Need for Normalization
emp_id  emp_name  emp_address  emp_dept
166     Omar      Islamabad    D004
The Normalization Process
Each table represents a single subject. For example, a course table will
contain only data that directly pertain to courses. Similarly, a student table
will contain only student data.
No data item will be unnecessarily stored in more than one table (in short,
tables have minimum controlled redundancy). The reason for this
requirement is to ensure that the data are updated in only one place.
All nonprime attributes in a table are dependent on the primary key—the
entire primary key and nothing but the primary key. The reason for this
requirement is to ensure that the data are uniquely identifiable by primary
key value.
Each table is void of insertion, update, or deletion anomalies. This is to
ensure the integrity and consistency of the data.
The Normalization Process (cont’d.)
Objective of normalization is to ensure that all tables are in at
least 3NF
Higher forms are not likely to be encountered in business
environment
Normalization works one relation at a time
Progressively breaks table into new set of relations based on
identified dependencies
The Normalization Process (cont’d.)
Partial dependency
Exists when there is a functional dependence in which the determinant is
only part of the primary key
Transitive dependency
Dependency of one nonprime attribute on another nonprime attribute
Repeating group
Group of multiple entries of same type can exist for any single key
attribute occurrence
Relational table must not contain repeating groups
Normalizing table structure will reduce data redundancies
Normalization is a three-step procedure
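The partial and transitive dependency definitions above can be expressed as a small classifier. This is only a sketch: the function name classify is hypothetical, the attribute names are borrowed from the project/employee example used in this lecture, and it assumes the primary key is the only candidate key (so "nonprime" simply means "not part of the primary key").

```python
def classify(lhs, rhs, primary_key):
    """Classify the dependency lhs -> rhs relative to the primary key.

    Assumes lhs is non-empty and the primary key is the only candidate key.
    """
    lhs, rhs, pk = set(lhs), set(rhs), set(primary_key)
    if lhs >= pk:
        return "full"        # determinant contains the entire primary key
    if lhs < pk:
        return "partial"     # determinant is only part of the primary key
    if lhs.isdisjoint(pk) and rhs.isdisjoint(pk):
        return "transitive"  # nonprime attribute determines nonprime attribute
    return "other"


PK = ("PROJ_NUM", "EMP_NUM")

print(classify(("PROJ_NUM", "EMP_NUM"), ("HOURS",), PK))
print(classify(("PROJ_NUM",), ("PROJ_NAME",), PK))
print(classify(("JOB_CLASS",), ("CHG_HOUR",), PK))
```

The three calls cover one dependency of each kind: on the whole key, on part of the key, and between two nonprime attributes.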
Conversion to First Normal Form (cont’d.)
For example, using the data shown in the table, if you know that PROJ_NUM = 15 and
EMP_NUM = 103, the entries for the attributes PROJ_NAME, EMP_NAME, JOB_CLASS,
CHG_HOUR, and HOURS must be Evergreen, June E. Arbough, Elect. Engineer, $84.50,
and 23.8, respectively.
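The statement above says that PROJ_NUM and EMP_NUM together determine every other attribute. A minimal sketch of how such a functional dependency could be checked against sample rows follows. The helper name fd_holds is hypothetical, and only the first row comes from the lecture; the other two rows are invented purely for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    # Row from the lecture example
    {"PROJ_NUM": 15, "EMP_NUM": 103, "PROJ_NAME": "Evergreen",
     "EMP_NAME": "June E. Arbough", "JOB_CLASS": "Elect. Engineer",
     "CHG_HOUR": 84.50, "HOURS": 23.8},
    # Hypothetical rows (not from the lecture), added so the check has something to compare
    {"PROJ_NUM": 15, "EMP_NUM": 900, "PROJ_NAME": "Evergreen",
     "EMP_NAME": "Employee A", "JOB_CLASS": "Job X",
     "CHG_HOUR": 10.0, "HOURS": 5.0},
    {"PROJ_NUM": 99, "EMP_NUM": 103, "PROJ_NAME": "Project B",
     "EMP_NAME": "June E. Arbough", "JOB_CLASS": "Elect. Engineer",
     "CHG_HOUR": 84.50, "HOURS": 1.0},
]

print(fd_holds(rows, ["PROJ_NUM", "EMP_NUM"], ["HOURS"]))  # whole key determines HOURS
print(fd_holds(rows, ["PROJ_NUM"], ["HOURS"]))             # part of the key does not
```

A check like this can only refute a dependency from the data at hand. Whether a dependency really holds is a business rule, not something sample rows can prove.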
Step 2 – Identify Primary key
PK: PROJ_NUM + EMP_NUM
Conversion to 1NF
Step 3 - Identify All Dependencies
Dependency diagram depicts all dependencies found within given table structure
Helpful in getting bird’s-eye view of all relationships among table’s
attributes
PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
PROJ_NUM → PROJ_NAME
EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR
JOB_CLASS → CHG_HOUR
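One way to confirm that PROJ_NUM and EMP_NUM together form a key is to compute the attribute closure under the dependencies listed above. The sketch below uses the standard closure algorithm; the names closure, FDS, and ALL_ATTRS are illustrative choices, not part of the lecture.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies identified in Step 3
FDS = [
    (("PROJ_NUM", "EMP_NUM"),
     ("PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")),
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
]
ALL_ATTRS = {"PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME",
             "JOB_CLASS", "CHG_HOUR", "HOURS"}

print(closure({"PROJ_NUM", "EMP_NUM"}, FDS) == ALL_ATTRS)  # composite key reaches everything
print(sorted(closure({"PROJ_NUM"}, FDS)))                  # PROJ_NUM alone reaches only PROJ_NAME
```

Neither PROJ_NUM nor EMP_NUM alone reaches all attributes, which is why the key has to be composite.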
Conversion to First Normal Form (cont’d.)
Dependency diagram:
Makes it less likely that you will overlook an important dependency
1NF Summarized
All key attributes defined
No repeating groups in table
All attributes dependent on primary key
Every column of the table should contain only single (atomic) values
Example (multiple values): for an airline
Not in 1NF (multi-valued column):
Flight  Weekdays
UA59    Mo We Fr
UA73    Mo Tu We Th Fr
In 1NF (one value per row):
Flight  Weekday
UA59    Mo
UA59    We
UA59    Fr
UA73    Mo
UA73    We
…       …
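The airline example can be converted mechanically: each multi-valued Weekdays entry is split into one row per weekday. A minimal sketch, assuming the weekdays are stored as a space-separated string (the function name to_1nf is hypothetical):

```python
def to_1nf(rows):
    """Split a space-separated multi-valued column into one row per value."""
    return [(flight, day) for flight, days in rows for day in days.split()]


non_1nf = [
    ("UA59", "Mo We Fr"),
    ("UA73", "Mo Tu We Th Fr"),
]

flat = to_1nf(non_1nf)
for flight, day in flat:
    print(flight, day)
```

After the split, the pair (Flight, Weekday) identifies each row and every column holds a single value.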
First Normal Form (1NF)
a) Insert Anomaly:
b) Delete Anomaly:
Deleting a student will unnecessarily delete course data.
c) Update Anomaly:
A course cannot be updated independently.
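Second Normal Form (2NF)
Table is in 2NF when:
It is in 1NF
It contains no partial dependencies
Third Normal Form (3NF)
Table is in 3NF when:
It is in 2NF
It contains no transitive dependencies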
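Applying these rules to the project example means moving each partial dependency into its own table, then doing the same for the transitive dependency JOB_CLASS → CHG_HOUR. The sketch below uses Python's built-in sqlite3 module. The table names PROJECT, JOB, EMPLOYEE, and ASSIGNMENT are assumptions made for illustration, and the single data row is the one quoted earlier in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PROJ_NUM -> PROJ_NAME (partial dependency removed)
CREATE TABLE PROJECT (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
-- JOB_CLASS -> CHG_HOUR (transitive dependency removed)
CREATE TABLE JOB (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
-- EMP_NUM -> EMP_NAME, JOB_CLASS (partial dependency removed)
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT,
    JOB_CLASS TEXT REFERENCES JOB(JOB_CLASS)
);
-- PROJ_NUM, EMP_NUM -> HOURS (depends on the whole key)
CREATE TABLE ASSIGNMENT (
    PROJ_NUM INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM  INTEGER REFERENCES EMPLOYEE(EMP_NUM),
    HOURS    REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
INSERT INTO PROJECT VALUES (15, 'Evergreen');
INSERT INTO JOB VALUES ('Elect. Engineer', 84.50);
INSERT INTO EMPLOYEE VALUES (103, 'June E. Arbough', 'Elect. Engineer');
INSERT INTO ASSIGNMENT VALUES (15, 103, 23.8);
""")

# Joining the four tables reconstructs the original wide row
row = conn.execute("""
SELECT a.PROJ_NUM, a.EMP_NUM, p.PROJ_NAME, e.EMP_NAME,
       e.JOB_CLASS, j.CHG_HOUR, a.HOURS
FROM ASSIGNMENT a
JOIN PROJECT  p ON p.PROJ_NUM  = a.PROJ_NUM
JOIN EMPLOYEE e ON e.EMP_NUM   = a.EMP_NUM
JOIN JOB      j ON j.JOB_CLASS = e.JOB_CLASS
""").fetchone()
print(row)
```

Each fact is now stored once (for example, the charge per hour lives only in JOB). The cost is the three joins needed to rebuild the original row.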
Improving the Design
Table structures should be cleaned up to eliminate initial partial and
transitive dependencies
Normalization cannot, by itself, be relied on to make good designs
It reduces data redundancy to a controlled minimum.
The higher the NF,
– the more entities one has,
– the more flexible the database will be,
– the more joins (and less efficiency) you have.
Improving the Design (cont’d.)
Issues to address, in order, to produce a good normalized set of
tables:
Evaluate PK Assignments
Evaluate Naming Conventions
Refine Attribute Atomicity
Identify New Attributes
Identify New Relationships
Refine Primary Keys as Required for Data Granularity
Maintain Historical Accuracy
Evaluate Using Derived Attributes
THE BOYCE-CODD NORMAL FORM
The Boyce-Codd Normal Form
Example: BCNF conversion
Decomposition into BCNF
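As a reminder of what the conversion checks for: a table is in BCNF when every determinant of a nontrivial functional dependency is a superkey (a set of attributes that determines all the others). The sketch below tests that condition. The relation R(A, B, C, D) and its dependencies are a hypothetical illustration, not the example from the slides, and bcnf_violations is an illustrative name.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial dependencies whose determinant is not a superkey."""
    attrs = set(attrs)
    bad = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and closure(lhs, fds) != attrs:
            bad.append((lhs, rhs))
    return bad


# Hypothetical relation R(A, B, C, D) with A,B -> C,D and C -> B
R = ("A", "B", "C", "D")
FDS = [(("A", "B"), ("C", "D")), (("C",), ("B",))]

print(bcnf_violations(R, FDS))  # C -> B is reported: C is not a superkey
```

Here C → B breaks BCNF because C alone does not determine A or D. Moving C and B into their own table, with C as its key, removes that violation.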
Normalization and Database Design
ER diagram
Identify relevant entities, their attributes, and their relationships
Identify additional entities and attributes
Normalization procedures
Focus on characteristics of specific entities
Micro view of entities within ER diagram
Denormalization
Reference book
Jeffrey Hoffer, “Modern Database Management ” Design, Implementation,
Management, 10th Edition”
Thomas Connolly, “Database Systems: A Practical Approach to Design,
Implementation and Management (6th Ed.)” (chapter 13)
Elmasri, “Fundamentals of Database Systems (7th Ed.)”