Database System Lect 07

This document covers the concept of normalization in database design, detailing its importance in minimizing data redundancy and anomalies. It explains various normal forms (1NF, 2NF, 3NF, BCNF, 4NF) and the process of transforming tables to achieve these forms. Additionally, it discusses the need for normalization through examples of data anomalies and the steps involved in converting tables to higher normal forms.

Uploaded by

Anmol Khan

IT - 304

Database Systems
for BS (IT)

Lecture 7:
Normalization

Hasan Raza
Lecturer CS & IT

Govt. Postgraduate College, Sheikhupura


Objectives

 In this chapter, students will learn:


 What normalization is and what role it plays in the database design
process
 About the normal forms 1NF, 2NF, 3NF, BCNF,
and 4NF
 How normal forms can be transformed from lower normal forms to
higher normal forms
 That normalization and ER modeling are used concurrently to
produce a good database design
 That some situations require denormalization to generate information
efficiently
Database Systems, 10th Edition
Database Tables and Normalization
 Table is the basic building block in database design
 Table’s structure is of great interest
 Two cases:
 A good database design may still contain poor table structures
 An existing database with poor table structures must be modified
 Normalization can help recognize a poor table structure and
convert it into tables with good structure
Database Tables and Normalization

 Normalization
 Process for evaluating and correcting table structures to minimize
data redundancies
 Reduces data anomalies
 Series of stages called normal forms:
 First normal form (1NF)
 Second normal form (2NF)
 Third normal form (3NF)
 Fourth normal form (4NF)
Database Tables and Normalization (cont’d.)

 Normalization (continued)
 2NF is better than 1NF; 3NF is better than 2NF
 For most business database design purposes, 3NF is as high as
needed in normalization
 Highest level of normalization is not always most desirable
 Denormalization produces a lower normal form
 Increased performance but greater data redundancy
The Need for Normalization
 Example: Company which manages building projects.
 Building projects
 Project number
 Project name
 Employees assigned
 …
 Employee
 Employee number
 Employee name
 Job classification
The Need for Normalization

 The business rules are:
 The company charges its clients by billing the hours spent on each contract
 The hourly billing rate depends on the employee’s position
 Periodically, a report is generated that contains information
such as that displayed in Table 6.1
Classic control-break report. A common type of report from a
database.
DESIRED OUTPUT
The Need for Normalization (cont’d.)
Figure Observation
 Structure of data set in Figure 6.1 does not handle data very
well
 Primary key – PROJ_NUM contains nulls
 Table entries invite data inconsistencies
 Table displays data redundancies which yield the following
anomalies
 Update
 Insertion
 Deletion
The Need for Normalization (cont’d.)
Figure Observation
 Report may yield different results depending on what data
anomaly has occurred
 Relational database environment is suited to help designer
avoid data integrity problems
The Need for Normalization (cont’d.)
 Insertion Anomaly
 It occurs when extra data beyond the desired data must be added to
the database. (New employee must be assigned project)
 Update Anomaly
 It occurs when it is necessary to change multiple rows to modify only
a single fact. (Modifying JOB_CLASS)
 Deletion Anomaly
 It occurs when deleting a row causes some unwanted deletions. (If an
employee is deleted, other vital data are lost)
Data Anomalies: Example
Suppose a manufacturing company stores employee details in a table
named employee that has four attributes: emp_id, emp_name, emp_address,
and emp_dept. At some point in time the table looks like this:
emp_id  emp_name  emp_address  emp_dept
101     Ali       Lahore       D001
101     Ali       Lahore       D002
123     Usman     Karachi      D890
166     Omar      Islamabad    D900
166     Omar      Islamabad    D004


Update Anomaly:
 There are two rows for employee Ali because he belongs to two departments of the
company. To update Ali's address, we must update it in both rows; otherwise
the data become inconsistent. If the correct address is updated in one
department's row but not in the other, then according to the database Ali has
two different addresses, which is incorrect.

emp_id  emp_name  emp_address  emp_dept
101     Ali       Lahore       D001
101     Ali       Lahore       D002
123     Usman     Karachi      D890
166     Omar      Islamabad    D900
166     Omar      Islamabad    D004


Insert Anomaly:
 Suppose a new employee joins the company who is under
training and not currently assigned to any department. We
would not be able to insert the data into the table if the emp_dept
field does not allow nulls.
emp_id  emp_name  emp_address  emp_dept
101     Ali       Lahore       D001
101     Ali       Lahore       D002
123     Usman     Karachi      D890
166     Omar      Islamabad    D900
166     Omar      Islamabad    D004


Delete Anomaly:
 Suppose that after some time the company closes department D890. Deleting
the rows that have emp_dept D890 would also delete the information about
employee Usman, since he is assigned only to this department.

emp_id  emp_name  emp_address  emp_dept
101     Ali       Lahore       D001
101     Ali       Lahore       D002
123     Usman     Karachi      D890
166     Omar      Islamabad    D900
166     Omar      Islamabad    D004
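The three anomalies can be reproduced on the employee table from this example. The following is only a minimal Python sketch: the table is modelled as a list of dictionaries, the helper names (addresses_of, insert) and the new address "Multan" are illustrative assumptions, and the not-null rule on emp_dept is imitated with a simple check.

```python
# Rows of the un-normalized employee table from the example.
employee = [
    {"emp_id": 101, "emp_name": "Ali", "emp_address": "Lahore", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Ali", "emp_address": "Lahore", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Usman", "emp_address": "Karachi", "emp_dept": "D890"},
    {"emp_id": 166, "emp_name": "Omar", "emp_address": "Islamabad", "emp_dept": "D900"},
    {"emp_id": 166, "emp_name": "Omar", "emp_address": "Islamabad", "emp_dept": "D004"},
]


def addresses_of(rows, emp_id):
    """Return the set of distinct addresses stored for one employee."""
    return {r["emp_address"] for r in rows if r["emp_id"] == emp_id}


# Update anomaly: changing Ali's address in only one of his two rows
# leaves two different addresses for the same employee.
updated = [dict(r) for r in employee]
updated[0]["emp_address"] = "Multan"  # hypothetical new address
inconsistent = addresses_of(updated, 101)

# Delete anomaly: removing the rows of department D890 also removes
# every trace of employee Usman (emp_id 123).
after_delete = [r for r in employee if r["emp_dept"] != "D890"]
usman_still_known = any(r["emp_id"] == 123 for r in after_delete)


# Insert anomaly: a trainee without a department cannot be stored
# when emp_dept is mandatory (imitated here with an explicit check).
def insert(rows, row):
    if row.get("emp_dept") is None:
        raise ValueError("emp_dept must not be null")
    return rows + [row]
```

After the partial update, inconsistent holds both "Lahore" and "Multan" for employee 101, and usman_still_known is False once the D890 rows are gone.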
The Normalization Process
 Each table represents a single subject. For example, a course table will
contain only data that directly pertain to courses. Similarly, a student table
will contain only student data.
 No data item will be unnecessarily stored in more than one table (in short,
tables have minimum controlled redundancy). The reason for this
requirement is to ensure that the data are updated in only one place.
 All nonprime attributes in a table are dependent on the primary key—the
entire primary key and nothing but the primary key. The reason for this
requirement is to ensure that the data are uniquely identifiable by primary
key value.
 Each table is void of insertion, update, or deletion anomalies. This is to
ensure the integrity and consistency of the data.
The Normalization Process (cont’d.)
 Objective of normalization is to ensure that all tables are in at
least 3NF
 Higher forms are not likely to be encountered in business
environment
 Normalization works one relation at a time
 Progressively breaks table into new set of relations based on
identified dependencies
The Normalization Process (cont’d.)
 Partial dependency
 Exists when there is a functional dependence in which the determinant is
only part of the primary key
 Transitive dependency
 Exists when one nonprime attribute is functionally dependent on another
nonprime attribute

Normalization starts by identifying the dependencies of a given relation and
progressively breaking up the relation (table) into a set of new relations (tables)
based on the identified dependencies.
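Identifying dependencies can be supported by testing candidate dependencies against sample data. The sketch below is an illustration, not part of the lecture: the function name fd_holds is an assumption, and the rows are taken from the employee example. Note that sample data can only refute a dependency; whether it really holds is decided by the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"emp_id": 101, "emp_name": "Ali", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Ali", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Usman", "emp_dept": "D890"},
]

name_depends_on_id = fd_holds(rows, ["emp_id"], ["emp_name"])
dept_depends_on_id = fd_holds(rows, ["emp_id"], ["emp_dept"])
```

Here emp_id -> emp_name is consistent with the data, while emp_id -> emp_dept is refuted because employee 101 appears with two departments.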
Conversion to First Normal Form

 Repeating group
 Group of multiple entries of same type can exist for any single key
attribute occurrence
 Relational table must not contain repeating groups
 Normalizing table structure will reduce data redundancies
 Conversion to 1NF is a three-step procedure
Conversion to First Normal Form (cont’d.)

 Step 1: Eliminate the Repeating Groups


 Eliminate nulls: each repeating group attribute contains an
appropriate data value
 Step 2: Identify the Primary Key
 Must uniquely identify attribute value
 New key must be composed
 Step 3: Identify All Dependencies
 Dependencies are depicted with a diagram
Conversion to 1NF
 Step 1: Eliminate the Repeating Groups
 A Repeating group is group of multiple entries of same type existing
for any single key attribute occurrence
 Present data in tabular format, where each cell has single value
and there are no repeating groups
 Eliminate repeating groups, eliminate nulls by making sure that each
repeating group attribute contains an appropriate data value
Repeating groups must be eliminated
A repeating group derives its name from the fact that a group of
multiple entries of the same type can exist for any single key
attribute occurrence.
Step 1: Eliminate the Repeating Groups
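Step 1 can be sketched as a fill-down over the report rows: blank cells in the repeating-group columns receive the value from the row above. This is a minimal illustration under assumptions: the function name fill_down is hypothetical, blanks are modelled as None, and only the values 15, Evergreen and 103 are quoted in this lecture; the remaining rows are made up.

```python
def fill_down(rows, columns):
    """Copy the last seen value into blank (None) cells of the given columns."""
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            if new_row[col] is None:
                new_row[col] = last[col]  # KeyError if the very first row is blank
            else:
                last[col] = new_row[col]
        filled.append(new_row)
    return filled


# Report-style rows: the project columns are blank on continuation lines.
report = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 101},   # hypothetical row
    {"PROJ_NUM": 18, "PROJ_NAME": "Project B", "EMP_NUM": 114},  # hypothetical row
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 118},   # hypothetical row
]

table = fill_down(report, ["PROJ_NUM", "PROJ_NAME"])
```

After the call every row of table carries its own PROJ_NUM and PROJ_NAME, so each cell holds a single value and no nulls remain in those columns.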
Conversion to 1NF
 Step 2 - Identify the Primary Key
 Review determination and attribute dependence
 All attribute values in the occurrence are ‘determined’ by the primary
key. The primary key must uniquely identify the attribute(s)
 Resulting composite key: PROJ_NUM and EMP_NUM

 For example, using the data shown in the table, if you know that PROJ_NUM = 15 and
EMP_NUM = 103, the entries for the attributes PROJ_NAME, EMP_NAME, JOB_CLASS,
CHG_HOUR, and HOURS must be Evergreen, June E. Arbough, Elect. Engineer, $84.50,
and 23.8, respectively
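Whether a proposed key identifies each row uniquely can be checked mechanically. The following sketch assumes a hypothetical helper is_unique_key; the pair (15, 103) is quoted above, the other rows are invented for illustration.

```python
from collections import Counter


def is_unique_key(rows, key):
    """Return True if no two rows share the same values for the key columns."""
    counts = Counter(tuple(row[a] for a in key) for row in rows)
    return all(n == 1 for n in counts.values())


assignments = [
    {"PROJ_NUM": 15, "EMP_NUM": 103},
    {"PROJ_NUM": 15, "EMP_NUM": 101},  # hypothetical row
    {"PROJ_NUM": 18, "EMP_NUM": 103},  # hypothetical row
]

proj_alone = is_unique_key(assignments, ["PROJ_NUM"])
emp_alone = is_unique_key(assignments, ["EMP_NUM"])
composite = is_unique_key(assignments, ["PROJ_NUM", "EMP_NUM"])
```

Neither PROJ_NUM nor EMP_NUM is unique on its own in this data, but the combination is, which is why a composite key is needed.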
Step 2: Identify the Primary Key (figure: PROJ_NUM and EMP_NUM are both marked PK)
Conversion to 1NF
 Step 3 - Identify All Dependencies
 Depicts all dependencies found within given table structure
 Helpful in getting bird’s-eye view of all relationships among table’s
attributes
 PROJ_NUM, EMP_NUM -> PROJ_NAME, EMP_NAME, JOB_CLASS,
CHG_HOUR, HOURS
 PROJ_NUM -> PROJ_NAME
 EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR
 JOB_CLASS -> CHG_HOUR
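The dependencies listed above can be labelled as full, partial or transitive by comparing each determinant with the primary key. This is a simplified sketch: the function name classify is an assumption, and it presumes that the primary key is the only candidate key, so every attribute outside it is nonprime.

```python
def classify(determinant, primary_key):
    """Label a dependency by how its determinant relates to the primary key."""
    lhs, pk = set(determinant), set(primary_key)
    if lhs == pk:
        return "full"        # depends on the entire primary key
    if lhs < pk:
        return "partial"     # depends on only part of the primary key
    if not (lhs & pk):
        return "transitive"  # determinant consists of nonprime attributes
    return "other"


PRIMARY_KEY = ("PROJ_NUM", "EMP_NUM")

DEPENDENCIES = [
    (("PROJ_NUM", "EMP_NUM"),
     ("PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")),
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
]

labels = [classify(lhs, PRIMARY_KEY) for lhs, _ in DEPENDENCIES]
```

For the four dependencies of this table the labels come out as full, partial, partial and transitive, in that order.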
Conversion to First Normal Form (cont’d.)

 Dependency diagram:
 Depicts all dependencies found within given table structure
 Helpful in getting bird’s-eye view of all relationships among table’s
attributes
 Makes it less likely that you will overlook an important dependency
1NF Summarized
 All key attributes defined
 No repeating groups in table
 All attributes dependent on primary key
 Every column of the table should contain only
single values

(Example: multiple values) For an airline

Before (multiple values per cell):
Flight  Weekdays
UA59    Mo We Fr
UA73    Mo Tu We Th Fr

After (one value per cell):
Flight  Weekday
UA59    Mo
UA59    We
UA59    Fr
UA73    Mo
UA73    We
…       …
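The airline example can be expressed in a few lines of Python. This is a minimal sketch; the function name to_1nf is an assumption, and the Weekdays cell is assumed to be a space-separated string as shown in the table.

```python
def to_1nf(rows):
    """Split each multi-valued Weekdays cell into one (Flight, Weekday) row per day."""
    return [(flight, day) for flight, weekdays in rows for day in weekdays.split()]


flights = [
    ("UA59", "Mo We Fr"),
    ("UA73", "Mo Tu We Th Fr"),
]

atomic = to_1nf(flights)
```

The two original rows become eight rows, each holding exactly one weekday, so every cell is atomic.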
First Normal Form (1NF)

 A relation is said to be in 1NF if it contains no repeating group (RG).
 An RG exists when an attribute is multivalued, or when more than one field
in a single table stores the same kind of information.
 To eliminate an RG, the value at the intersection of each row and column must be
atomic (having one value).
 If you developed a logical design by transforming an ER diagram into relations,
there should not be any multivalued attributes remaining.
 Consider the following relation:
Student (RegNo, Name, Program, C_Code, C_Title, C_Grade)
 This relation has a repeating group consisting of C_Code, C_Title, and C_Grade, and
therefore it suffers from insert, delete, and update anomalies.
 Multiple values create problems in performing operations such as select or join.
First Normal Form (1NF)
 The relation Student can be converted into 1NF using either of the following
methods:
a) Change the PK of the relation and define a composite key of RegNo and C_Code.
We fill the blanks by duplicating the non-repeating data. This approach is
commonly referred to as flattening the table.
b) Split the relation into two relations by placing the repeating data, along with a
copy of the original key attribute(s), in a separate relation. The new relation
will always have a concatenated (composite) key.
Student (RegNo, Name, Program)
Course (RegNo, C_Code, C_Title, C_Grade)
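Method (b) can be sketched as follows. The sample students, grades and the second course are hypothetical, and the function name split_repeating_group is an assumption; the repeating group is modelled as a list of course entries stored inside each student row.

```python
def split_repeating_group(students):
    """Move the repeating course data into its own relation (method b)."""
    student_rel, course_rel = [], []
    for s in students:
        student_rel.append((s["RegNo"], s["Name"], s["Program"]))
        for c_code, c_title, c_grade in s["Courses"]:
            # A copy of the original key (RegNo) goes with each course entry.
            course_rel.append((s["RegNo"], c_code, c_title, c_grade))
    return student_rel, course_rel


# Hypothetical data: each student row holds a repeating group of courses.
unnormalized = [
    {"RegNo": "R-01", "Name": "Ali", "Program": "BS (IT)",
     "Courses": [("IT-304", "Database Systems", "A"),
                 ("IT-305", "Networks", "B")]},
    {"RegNo": "R-02", "Name": "Omar", "Program": "BS (IT)",
     "Courses": [("IT-304", "Database Systems", "B")]},
]

student, course = split_repeating_group(unnormalized)
```

The resulting Course rows are identified by the concatenated key (RegNo, C_Code), while Student keeps RegNo as its key.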

Example 2: STD(stId, stName, stAdr, prName, bkId,bkTitle, i-date)


SECOND NORMAL FORM
Conversion to Second Normal Form
 Step 1: Make New Tables to Eliminate Partial Dependencies
 Write each key component on separate line, then write original
(composite) key on last line
 Each component will become key in new table
 Step 2: Reassign Corresponding Dependent Attributes
 Determine attributes that are dependent on other attributes
 At this point, most anomalies have been eliminated
Second Normal Form (2NF)

 A relation is in 2NF if:
 It is in 1NF
 Every non-key attribute is fully functionally dependent on the primary key
 A partial functional dependency arises when the PK of a relation is
composite and a non-key attribute is functionally dependent on part (but not
all) of the PK.
 Referring to the Course relation:
Course (RegNo, C_Code, C_Title, C_Grade)
 The functional dependencies are:
C_Code -> C_Title (Partial FD)
RegNo, C_Code -> C_Grade (Full FD)
Second Normal Form (2NF)
 Since not every non-key attribute is fully functionally dependent on the
PK (there is a partial functional dependency in the relation), the relation is
not in 2NF.
 The anomalies associated with the Course relation are:

a) Insert Anomaly:
 A course instance cannot be inserted without a student (RegNo).

b) Delete Anomaly:
 Deleting a student will unnecessarily delete course data.

c) Update Anomaly:
 A course cannot be updated independently; its title must be changed in
every row in which the course appears.
Second Normal Form (2NF)

 The relation
Course (RegNo, C_Code, C_Title, C_Grade)
can be converted into 2NF by decomposing it into the following
relations:
Course (C_Code, C_Title)
Result (RegNo, C_Code, C_Grade)
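The decomposition above can be tried on sample data. In this sketch the rows are hypothetical and the helpers project and natural_join are illustrative names; joining the two new relations on C_Code gives back the original rows, which suggests that no information is lost by the split.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples (first occurrence kept)."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(left, right, on):
    """Join two lists of dicts on the shared attributes listed in on."""
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in on)]


# Hypothetical rows of the 1NF relation Course (RegNo, C_Code, C_Title, C_Grade).
old_course = [
    {"RegNo": "R-01", "C_Code": "IT-304", "C_Title": "Database Systems", "C_Grade": "A"},
    {"RegNo": "R-02", "C_Code": "IT-304", "C_Title": "Database Systems", "C_Grade": "B"},
    {"RegNo": "R-01", "C_Code": "IT-305", "C_Title": "Networks", "C_Grade": "B"},
]

course = project(old_course, ["C_Code", "C_Title"])            # key: C_Code
result = project(old_course, ["RegNo", "C_Code", "C_Grade"])   # key: RegNo, C_Code
rejoined = natural_join(result, course, ["C_Code"])
```

The title "Database Systems" is now stored once in course instead of once per enrolled student, so it can be updated in a single place.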
2NF Summarized
 In 1NF
 Includes no partial dependencies
 No attribute dependent on a portion of primary key
 Still possible to exhibit transitive dependency
 Attributes may be functionally dependent on nonkey
attributes
THIRD NORMAL FORM
Conversion to Third Normal Form

 Step 1: Make New Tables to Eliminate Transitive Dependencies
 For every transitive dependency, write its determinant as PK for new
table
 Determinant: any attribute whose value determines other values
within a row
Conversion to Third Normal Form (cont’d.)

 Step 2: Reassign Corresponding Dependent Attributes


 Identify attributes dependent on each determinant identified in Step 1
 Identify dependency
 Name table to reflect its contents and function
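The two steps can be sketched on the schema level. This is an illustration under assumptions: the relation name EMPLOYEE and the function name to_3nf are hypothetical, the attribute list follows the dependencies EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR and JOB_CLASS -> CHG_HOUR from the 1NF example, and each new table is simply named after its determinant until a better name is chosen.

```python
def to_3nf(name, attributes, transitive_fds):
    """Return {table_name: attribute list} after removing transitive dependencies.

    transitive_fds maps a determinant attribute to the attributes it determines.
    """
    tables = {}
    remaining = list(attributes)
    for determinant, dependents in transitive_fds.items():
        # Step 1: the determinant becomes the primary key of a new table.
        tables[determinant] = [determinant] + list(dependents)
        # Step 2: the dependents move to the new table; the determinant
        # stays in the original table as a foreign key.
        remaining = [a for a in remaining if a not in dependents]
    tables[name] = remaining
    return tables


schemas = to_3nf(
    "EMPLOYEE",
    ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"],
    {"JOB_CLASS": ["CHG_HOUR"]},
)
```

The result contains a table keyed by JOB_CLASS that holds CHG_HOUR, and an EMPLOYEE table that keeps JOB_CLASS only as a reference to it.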
3NF Summarized

 In 2NF
 Contains no transitive dependencies
Improving the Design
 Table structures should be cleaned up to eliminate initial partial and
transitive dependencies
 Normalization cannot, by itself, be relied on to make good designs
 It reduces data redundancy and builds controlled redundancy
 The higher the NF:
 the more entities one has
 the more flexible the database will be
 the more joins (and less efficiency) you have
Improving the Design (cont’d.)
 Issues to address, in order, to produce a good normalized set of
tables:
 Evaluate PK Assignments
 Evaluate Naming Conventions
 Refine Attribute Atomicity
 Identify New Attributes
 Identify New Relationships
 Refine Primary Keys as Required for Data Granularity
 Maintain Historical Accuracy
 Evaluate Using Derived Attributes
THE BOYCE-CODD NORMAL FORM
The Boyce-Codd Normal Form

 A table is in BCNF when every determinant in the table is a candidate key
 A candidate key has the same characteristics as a primary key but, for some
reason, was not chosen to be the primary key
 When table contains only one candidate key, the 3NF and the
BCNF are equivalent
 BCNF can be violated only when table contains more than
one candidate key
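The BCNF condition can be tested with attribute closures. The sketch below is not taken from the lecture: the relation R(A, B, C, D) with the dependencies A, B -> C, D and C -> B is a hypothetical example, the function names are assumptions, and the check uses the common formal wording "every determinant of a non-trivial dependency is a superkey".

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                 # skip trivial dependencies
        and closure(lhs, fds) != set(all_attrs)     # determinant is not a superkey
    ]


# Hypothetical relation R(A, B, C, D); each attribute is a single letter.
R = "ABCD"
FDS = [("AB", "CD"), ("C", "B")]

violations = bcnf_violations(R, FDS)
```

In this example both A, B and A, C determine all attributes, so the relation has two candidate keys, and C -> B is reported as the violation because C alone does not determine A or D.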
The Boyce-Codd Normal Form (cont’d.)

 Most designers consider the BCNF as a special case of 3NF
 Table is in 3NF when it is in 2NF and there are no transitive
dependencies
 Table can be in 3NF and fail to meet BCNF
 It contains no partial dependencies and no transitive dependencies
 Yet a nonkey attribute is the determinant of a key attribute
C (a nonkey attribute) is a determinant, but C alone is not a candidate key.

Example: BCNF conversion
Decomposition into BCNF
Normalization and Database Design

 Normalization should be part of the design process


 Make sure that proposed entities meet required normal form
before table structures are created
 Many real-world databases have been improperly designed or
burdened with anomalies
 You may be asked to redesign and modify existing databases
Normalization and Database Design (cont’d.)

 ER diagram
 Identify relevant entities, their attributes, and their relationships
 Identify additional entities and attributes
 Normalization procedures
 Focus on characteristics of specific entities
 Micro view of entities within ER diagram
Denormalization

 Creation of normalized relations is an important database design goal
 Processing requirements should also be a goal
 If tables are decomposed to conform to normalization
requirements:
 Number of database tables expands
Denormalization (cont’d.)

 Joining the larger number of tables reduces system speed


 Conflicts are often resolved through compromises that may
include denormalization
 Defects of unnormalized tables:
 Data updates are less efficient because tables are larger
 Indexing is more cumbersome
 No simple strategies for creating virtual tables known as views
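The join cost mentioned above can be seen by rebuilding one line of the project report from normalized tables. This sketch uses Python's built-in sqlite3 module with an in-memory database; the table and column names are assumptions chosen for illustration, and the single data row uses the values quoted earlier in this lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (proj_num INTEGER PRIMARY KEY, proj_name TEXT);
CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour REAL);
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_name  TEXT,
    job_class TEXT REFERENCES job(job_class)
);
CREATE TABLE assignment (
    proj_num INTEGER REFERENCES project(proj_num),
    emp_num  INTEGER REFERENCES employee(emp_num),
    hours    REAL,
    PRIMARY KEY (proj_num, emp_num)
);
INSERT INTO project VALUES (15, 'Evergreen');
INSERT INTO job VALUES ('Elect. Engineer', 84.50);
INSERT INTO employee VALUES (103, 'June E. Arbough', 'Elect. Engineer');
INSERT INTO assignment VALUES (15, 103, 23.8);
""")

# Three joins are needed to rebuild one report line from the normalized tables.
report = conn.execute("""
SELECT p.proj_num, p.proj_name, e.emp_name, j.job_class, j.chg_hour, a.hours
FROM assignment AS a
JOIN project  AS p ON p.proj_num  = a.proj_num
JOIN employee AS e ON e.emp_num   = a.emp_num
JOIN job      AS j ON j.job_class = e.job_class
""").fetchall()
```

A denormalized design would store these columns together and skip the joins, at the price of repeating the project name and the hourly charge in many rows.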
Summary

 Normalization minimizes data redundancies


 First three normal forms (1NF, 2NF, and 3NF) are the most
commonly encountered
 Table is in 1NF when:
 Atomicity of attributes is ensured
 A Primary key is defined
Summary (cont’d.)

 Table is in 2NF when it is in 1NF and contains no partial
dependencies
 Table is in 3NF when it is in 2NF and contains no transitive
dependencies
 Table that is not in 3NF may be split into new tables until all of
the tables meet 3NF requirements
 Normalization is important part—but only part—of the design
process
Summary (cont’d.)

 Tables are sometimes denormalized to yield less I/O, which
increases processing speed
Textbook
Carlos Coronel, Steve Morris, “Database Systems: Design, Implementation, and Management”, 12th Ed.,
Course Technology, 2016 (chapter 6)

Reference books
 Jeffrey Hoffer, “Modern Database Management”, 10th Ed.
 Thomas Connolly, “Database Systems: A Practical Approach to Design,
Implementation, and Management”, 6th Ed. (chapter 13)
 Elmasri, “Fundamentals of Database Systems”, 7th Ed.
