Normalization

The document discusses the importance of normalization in database design, focusing on eliminating data redundancy and related anomalies through various normal forms. It explains the goals of good database design, issues arising from data redundancy, and the processes of achieving First (1NF), Second (2NF), and Third Normal Forms (3NF). The document provides examples and definitions to illustrate how to identify and resolve design issues in relational databases.

Quality Content for Outcome based Learning

Normalization
Unit-5

Ver. No.: 1.1 Copyright © 2021, ABES Engineering College

Introduction

We know

• The ER model helps the database designer identify entity types, their
attributes, and the relationships between entity types
• This leads to a natural and logical grouping of attributes into relations
• Each relation schema consists of several attributes
• A relational database schema consists of several relation schemas

We need to know

• Some formal way of analysing why one grouping of attributes into a
relation schema may be better than another
• A measure of appropriateness or goodness to assess the quality of the
design, other than the designer's intuition

Good Quality Database Design Goals

The implicit goals of the design activity are information preservation
and redundancy minimization

• Information preservation: maintaining all concepts, including entity types, attribute
types, relationship types, and generalization/specialization relationships, which are
described using the ER model
• Redundancy minimization: minimizing redundant storage of the same data and reducing
the need for multiple updates to keep multiple copies of the same information
consistent when a real-world event requires an update

Data redundancy and associated issues

• Data redundancy occurs when the same piece of data is stored in two or
more separate places.
• Suppose we create a Relation to store sales records and, in the record for
each sale, we enter the customer address as one of the attributes. If we
have multiple sales to the same customer, the same address is entered
multiple times. The address that is repeatedly entered is redundant data.
• Data redundancy normally arises when we combine attributes from
multiple entity types and relationship types into a single Relation

Relation schema design issues

To understand design issues and the problems associated with them, let's take the example of a Relation
FACULTY_DETAIL that stores all the faculty attributes and the department they work for. The
department data is not stored separately.

In the Relation FACULTY_DETAIL, we have redundant data in the column dept_location. For
each faculty member, the dept_location information has to be repeated along with the department. This
Relation suffers from insertion, update, and deletion anomalies.

Relation schema design issues…

Insertion Anomaly: The college starts a new department (CSE-DS at Bhabha Block) that does not yet have any faculty.
We cannot store the data of this new department in the above Relation because faculty_Id, being the primary key,
cannot be NULL for a record/tuple.
Update Anomaly: Suppose the location of a department changes. The new location needs to be updated for
this particular department in all the rows/tuples where it appears. If we miss any row/tuple where this
department appears during the update, the department's data will be inconsistent in the Relation.
Deletion Anomaly: Suppose faculty_Id 2765 (Girish) leaves the college and his record is deleted from the database.
He is the only faculty member in the ME department, so the moment we delete the faculty_Id 2765 record/tuple
from the Relation, the ME department information is also lost.
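
The three anomalies can be reproduced on a tiny in-memory version of FACULTY_DETAIL. This is only a sketch: apart from faculty 2765 (Girish, ME) and the CSE-DS department at Bhabha Block, every id, name, and location below is made up for illustration, and insert_row is a hypothetical helper that mimics a primary-key check.

```python
# Hypothetical rows: only faculty 2765 (Girish, ME) comes from the text.
faculty_detail = [
    {"faculty_id": 1001, "name": "Asha", "dept": "CSE", "dept_location": "Block A"},
    {"faculty_id": 1002, "name": "Ravi", "dept": "CSE", "dept_location": "Block A"},
    {"faculty_id": 2765, "name": "Girish", "dept": "ME", "dept_location": "Block B"},
]

def insert_row(table, row):
    """Reject rows without a primary key, as a relational DBMS would."""
    if row.get("faculty_id") is None:
        raise ValueError("faculty_id is the primary key and cannot be NULL")
    table.append(row)

# Insertion anomaly: a department without faculty cannot be recorded.
try:
    insert_row(faculty_detail, {"faculty_id": None, "name": None,
                                "dept": "CSE-DS", "dept_location": "Bhabha Block"})
    insertion_rejected = False
except ValueError:
    insertion_rejected = True

# Update anomaly: changing the CSE location in one row only leaves two answers.
faculty_detail[0]["dept_location"] = "Block C"
cse_locations = {r["dept_location"] for r in faculty_detail if r["dept"] == "CSE"}

# Deletion anomaly: removing Girish also removes the only trace of ME.
faculty_detail = [r for r in faculty_detail if r["faculty_id"] != 2765]
departments_left = {r["dept"] for r in faculty_detail}
```

After running it, insertion_rejected is true, cse_locations holds two conflicting values, and departments_left no longer contains ME.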

Relation schema design issues…

If we decompose the above FACULTY_DETAIL Relation into two separate relations, say faculty and department, we
will eliminate the design issues and related anomalies discussed earlier.
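
A minimal sketch of this decomposition, using plain Python structures in place of tables. The row values (other than faculty 2765 and the CSE-DS department) and the helper faculty_detail_view are assumptions made for the example, not part of the original design.

```python
# department is keyed by department name; faculty rows refer to it by "dept".
department = {
    "CSE": {"dept_location": "Block A"},   # hypothetical location
    "ME": {"dept_location": "Block B"},    # hypothetical location
}
faculty = [
    {"faculty_id": 1001, "name": "Asha", "dept": "CSE"},   # hypothetical row
    {"faculty_id": 1002, "name": "Ravi", "dept": "CSE"},   # hypothetical row
    {"faculty_id": 2765, "name": "Girish", "dept": "ME"},
]

def faculty_detail_view(faculty_rows, department_rows):
    """Rebuild the wide FACULTY_DETAIL rows by joining on the dept column."""
    return [{**f, **department_rows[f["dept"]]} for f in faculty_rows]

# Insert: a department can exist before it has any faculty.
department["CSE-DS"] = {"dept_location": "Bhabha Block"}

# Update: the location is stored once, so one change is enough.
department["CSE"]["dept_location"] = "Block C"

# Delete: removing the only ME faculty member keeps the ME department.
faculty = [f for f in faculty if f["faculty_id"] != 2765]

view = faculty_detail_view(faculty, department)
```

The join in faculty_detail_view shows that the original wide rows can still be produced on demand, so no information is lost by the split.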

This process of eliminating the relation design issues and
related anomalies is called Normalization.

Normalization

• As discussed, the process of eliminating the relation design issues
(mainly data redundancy) and related anomalies is called Normalization
• When we convert the ER model into a relational model, in most cases,
substantial normalization is already achieved by virtue of the implicit and
explicit constraints discussed in earlier units. However, we will discuss all
the normal forms in detail to understand the normalization process.

First Normal Form (1NF)

The first normal form (1NF) imposes a fundamental requirement on
relations.
• We say that a relation schema R is in first normal form (1NF) if the domains of all
attributes of R are atomic.
• A domain of an attribute is atomic if elements of the domain are considered to be
indivisible units.
• This means that multivalued attributes, composite attributes, and their combinations
are not allowed in a Relation that is in first normal form

First Normal Form (1NF)…
• Multivalued attribute: A multivalued attribute may have more than one value for a particular entity.
Example – Phone Number. In our SMS case study, the phone number attribute in the STUDENT entity
type is a multivalued attribute. It means that a student can have multiple phone numbers. If you
remember, the restriction on such attributes also comes from the implicit constraints applied to relational databases.
• Composite attribute: Composite attributes are not atomic because they are assembled from
other atomic attributes. A typical example of a composite attribute is a person's address, composed of
atomic attributes such as House No., Street, City, State, Pincode.
• A composite attribute can still be stored in the database without violating any database
constraint; however, it is not a good database design. Storing a composite attribute in the database makes
data querying and analysis on its constituent atomic attributes very complex. It can also result in
redundancy of data.

First Normal Form (1NF)…

To handle a composite attribute, we create a separate column for each part of the
composite attribute. This works because the number of parts in a composite attribute is fixed in most cases.

First Normal Form (1NF)…
To handle a multivalued attribute, we have three options:
Option 1:
Expand the key of this Relation to include phone_no along with roll_no. The Relation will now
have a composite primary key consisting of roll_no & phone_no. This arrangement achieves
the first normal form (1NF); however, it is not a good design as it introduces data
redundancy into the Relation. For each additional phone number of a student, the data in
the other columns is repeated.

First Normal Form (1NF)…
Option 2:
If the maximum number of values for phone_no is known, that many columns can be
added to the existing Relation.
Let's assume a student can have a maximum of two phone numbers. We can create the below
relation design, with two separate columns to store the two possible student phone numbers, to
achieve the first normal form (1NF). This is not a good design as it limits the phone numbers
a student can have. If we want to allow more phone numbers, the relation design would
need to change, which is not a good design practice.

First Normal Form (1NF)…
Option 3:
Decompose this Relation into two relations – STUDENT & STUDENT_PHONE_NO. They
are linked to each other with the Primary Key (PK) - Foreign Key (FK) relationship. This is a
good design as it takes care of data redundancy and does not limit the number of phone
numbers a student can have.
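
Option 3 can be sketched with plain Python rows. The sample students, phone numbers, and the helper name decompose_phones are hypothetical; only the relation names STUDENT and STUDENT_PHONE_NO and the columns roll_no and phone_no come from the text.

```python
# Hypothetical unnormalized rows: phone_no holds a list, so the Relation is not in 1NF.
students_unnormalized = [
    {"roll_no": 1, "first_name": "Asha", "phone_no": ["9000000001", "9000000002"]},
    {"roll_no": 2, "first_name": "Ravi", "phone_no": ["9000000003"]},
]

def decompose_phones(rows):
    """Split rows into STUDENT (no phone_no) and STUDENT_PHONE_NO (roll_no, phone_no)."""
    student = [{k: v for k, v in row.items() if k != "phone_no"} for row in rows]
    student_phone_no = [
        {"roll_no": row["roll_no"], "phone_no": phone}  # roll_no acts as the FK
        for row in rows
        for phone in row["phone_no"]
    ]
    return student, student_phone_no

student, student_phone_no = decompose_phones(students_unnormalized)
```

Each student now appears once in student, and every phone number is one row in student_phone_no, so adding a third number for a student needs no schema change.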

Second Normal Form (2NF)

• The Second Normal Form (2NF) is based on the concept of full functional
dependency.
• The Second Normal Form is relevant for relations with composite keys, that
is, relations with a key composed of two or more attributes.
• A Relation in 1NF whose candidate keys are all single attributes is automatically
in at least 2NF. A Relation not in 2NF may suffer from inconsistency problems
arising during insert, delete and update operations.

Second Normal Form (2NF)…
Definition:
For a Relation to be in 2NF, it should fulfill both of the conditions below:
• The Relation should be in 1NF
• The Relation should have no partial dependency, i.e., no non-prime attribute (an attribute that is not part
of any candidate key) is dependent on a proper subset of any candidate key of the Relation.
How to check:
• 2NF is relevant for relations with composite candidate keys. A Relation whose candidate keys are all single
attributes is automatically in at least 2NF.
• No FD of the form (proper subset of a candidate key) → (non-prime attribute) should hold.
How to convert 1NF to 2NF:
The normalization of 1NF relations to 2NF involves the removal of partial functional dependencies. If a partial
dependency exists, we remove the partially dependent attribute(s) (along with their dependents, if any) from the
Relation by placing them in a new Relation along with a copy of their determinant. The remaining attributes,
together with the determinant, stay in the base Relation.

Second Normal Form (2NF)…
Example 1: Let's assume a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
The FD in the Relation, teacher_id → teacher_age, can be depicted as: Relation (ABC) with FD = A→C
Let's find the candidate key of the above Relation.
Candidate Key is (AB). Prime Attributes – A, B. Non-prime Attributes – C
We have a composite candidate key (AB), and its proper subset (A) can determine a non-prime
attribute (C), FD (A→C). So this is a case of partial dependency. Therefore the Relation is not in 2NF.

To convert this Relation into 2NF, we need to remove the partially dependent attribute(s) from the
Relation by placing them in a new Relation along with a copy of their determinant.

Second Normal Form (2NF)…
Example 2: In the previous section, we converted the Relation into 1NF using option 1, which made
(roll_no, phone_no) the composite primary key.

Now let's analyze this Relation from a functional dependency point of view and find out if it is in 2NF
or not. We can re-write the above as Relation R(ABCDEFGHIJKL) with FDs = A→BCDEFGHIJK, I→J
The candidate key is (AL). Prime Attributes – A, L. Non-prime Attributes – B, C, D, E, F, G, H, I, J, K
We have a composite candidate key (AL), and its proper subset (A) can determine non-prime attributes
(B, C, D, E, F, G, H, I, J, K), FD (A→BCDEFGHIJK). So this is a case of partial dependency. Therefore
the Relation is not in 2NF.

Second Normal Form (2NF)…
Example 2 (contd.): To convert this Relation into 2NF, we need to remove the partially dependent
attribute(s) from the Relation by placing them in a new relation along with a copy of their determinant.
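
The conversion rule can be sketched at the schema level. The column names below are taken from the FDs quoted for the student example elsewhere in these slides; the helper name split_partial_dependency is a hypothetical choice, and the sketch handles only one partial dependency at a time.

```python
# Attributes of the 1NF student Relation (A..L in the letter notation).
STUDENT = [
    "roll_no", "first_name", "middle_name", "last_name", "dob", "gender",
    "house_no", "street_name", "city", "state", "pincode", "phone_no",
]

def split_partial_dependency(attributes, determinant, dependents):
    """Move `dependents` into a new relation together with a copy of `determinant`.

    Returns (base_relation, new_relation) as attribute lists in original order.
    """
    moved = set(dependents)
    new_relation = [a for a in attributes if a in moved or a in determinant]
    base_relation = [a for a in attributes if a not in moved]
    return base_relation, new_relation

# roll_no determines every attribute except phone_no (A→BCDEFGHIJK).
dependents = [a for a in STUDENT if a not in ("roll_no", "phone_no")]
student_phone_no, student_detail = split_partial_dependency(
    STUDENT, determinant=["roll_no"], dependents=dependents
)
```

The base Relation keeps only roll_no and phone_no, while the new Relation carries roll_no plus everything it determines.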

Second Normal Form (2NF)…
Example 3:
Let's take Relation R(A,B,C,D,E) with FD set = (A→B, B→C, C→D, D→E). Let's find if this Relation
is in 2NF or not.
The candidate key of the above Relation is (A). As the candidate key is not composite, the case of
partial dependency does not arise. Therefore the Relation is in 2NF.

Example 4:
Let’s take Relation R(A,B,C,D) with FD set = (AB→CD, C→A, D→B). Let's find if this Relation is in
2NF or not.
The candidate keys of the above Relation are (AB), (BC), (CD), (AD).
Prime Attributes – A, B, C, D. Non-prime Attributes – none
In this case, we have composite candidate keys but no non-prime attribute, so the case of
partial dependency does not arise. Therefore the Relation is in 2NF.

Second Normal Form (2NF)…

Example 5:
Let’s take Relation R(A,B,C,D) with FD set = (A→B, B→D). Let's find if this Relation is in 2NF or not.
The candidate key of the above Relation is (AC).
Prime Attributes – A, C
Non-prime Attributes – B, D
In this case, we have a composite candidate key (AC), and its proper subset (A) can determine a non-
prime attribute (B), FD (A→B). So this is a case of partial dependency. Therefore the Relation is not in
2NF.
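
The 2NF check used in Examples 1, 4 and 5 can be automated. The sketch below assumes single-letter attribute names and brute-force enumeration of candidate keys, which is fine for classroom-sized schemas; the function names fd, closure, candidate_keys and partial_dependencies are illustrative choices.

```python
from itertools import combinations

def fd(text):
    """Parse 'AB->C' into a (lhs, rhs) pair of attribute sets."""
    lhs, rhs = text.split("->")
    return set(lhs), set(rhs)

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure covers the Relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

def partial_dependencies(attributes, fds):
    """List (proper subset of a key, non-prime attributes it determines)."""
    keys = candidate_keys(attributes, fds)
    non_prime = set(attributes) - set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                determined = closure(subset, fds) & non_prime
                if determined and (set(subset), determined) not in found:
                    found.append((set(subset), determined))
    return found

example_1 = partial_dependencies("ABC", [fd("A->C")])
example_4 = partial_dependencies("ABCD", [fd("AB->CD"), fd("C->A"), fd("D->B")])
example_5 = partial_dependencies("ABCD", [fd("A->B"), fd("B->D")])
```

An empty result means the Relation is in 2NF. For Example 5 the sketch reports that A determines both B and D, because it works with the closure of A rather than with the listed FD A→B alone.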

Third Normal Form (3NF)

• Although Second Normal Form (2NF) relations have less redundancy than
those in 1NF, they may still suffer from inconsistency problems arising during
insert, delete and update operations.
• These inconsistency problems are caused by transitive dependencies, which
introduce redundancy into the Relation. We need to remove
such dependencies by progressing to the Third Normal Form (3NF).

Third Normal Form (3NF)…

Definition:
For a Relation to be in 3NF, it should fulfill both of the conditions below:
• The Relation should be in 2NF
• No non-prime attribute should be transitively dependent on the primary key
or any candidate key
or, stated differently,
• No non-prime attribute should functionally depend on another non-prime attribute.

For example, take a Relation R(A,B,C,D) with FDs = A→BD, B→C. In this Relation, (A) is
the candidate key and we have a transitive dependency, A→B, B→C.
The non-prime attribute (C) is transitively dependent on the candidate key (A), therefore
this Relation is not in 3NF. Equivalently, the non-prime attribute (C) is dependent
on another non-prime attribute (B); hence the Relation violates the 3NF condition.

Third Normal Form (3NF)…
How to check:
A Relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute
How to convert 2NF to 3NF:
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
If a transitive dependency exists, we remove the transitively dependent attribute(s) from the
Relation by placing the attribute(s) in a new Relation along with a copy of the determinant.
The remaining attributes, together with the determinant, stay in the
base Relation.

Third Normal Form (3NF)…
Example 1: In the previous section, in example 2, we converted the STUDENT Relation from 1NF to
2NF by decomposing it into two separate relations STUDENT_DETAIL and STUDENT_PHONE_NO.
Now let's analyze the STUDENT_DETAIL Relation, which is already in 2NF.

FDs in the above Relation are:

roll_no → first_name, middle_name, last_name, dob, gender, house_no, street_name, city, state, pincode
city → state
The candidate key of the Relation is roll_no. In this Relation, we have a transitive dependency roll_no
→ city, city → state. This transitive dependency is causing data redundancy in the Relation. Therefore
this Relation is not in 3NF.

Third Normal Form (3NF)…
Example 1 (contd.): The normalization of this Relation to 3NF will involve the removal of transitive
dependencies. We need to remove the transitively dependent attribute(s) from the Relation by placing
the attribute(s) in a new Relation (CITY_STATE_MASTER) along with a copy of the determinant.
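
A row-level sketch of this step, assuming the FD city → state holds in the data. The sample rows and the helper name split_city_state are hypothetical, and most STUDENT_DETAIL columns are left out to keep the example short.

```python
# Hypothetical STUDENT_DETAIL rows (only a few columns shown).
student_detail = [
    {"roll_no": 1, "first_name": "Asha", "city": "Ghaziabad", "state": "Uttar Pradesh"},
    {"roll_no": 2, "first_name": "Ravi", "city": "Ghaziabad", "state": "Uttar Pradesh"},
    {"roll_no": 3, "first_name": "Meera", "city": "Pune", "state": "Maharashtra"},
]

def split_city_state(rows):
    """Return (rows without state, CITY_STATE_MASTER as a city -> state mapping)."""
    city_state_master = {}
    students = []
    for row in rows:
        known_state = city_state_master.setdefault(row["city"], row["state"])
        if known_state != row["state"]:
            raise ValueError("city -> state does not hold for " + row["city"])
        students.append({k: v for k, v in row.items() if k != "state"})
    return students, city_state_master

students, city_state_master = split_city_state(student_detail)

# Joining on city reproduces the original rows, so nothing is lost.
rejoined = [{**s, "state": city_state_master[s["city"]]} for s in students]
```

The state of each city is now stored once, and the ValueError guards against data in which the assumed FD is already violated.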

Third Normal Form (3NF)…
Example 2:
Let's take Relation R(A,B,C,D) with FD set = (A→B, B→C, C→D). Let's find if this Relation is in 3NF
or not.
The candidate key of the above Relation is (A).
Prime attributes – A. Non-prime attributes – B, C, D
Now let's analyze each FD for the 3NF condition:

A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency
X→Y:
• X is a super key
• Y is a prime attribute
A→B, A is a super key (we know all candidate keys are super keys) – 3NF condition met
B→C, B is not a super key, and C is not a prime attribute – 3NF condition failed
C→D, C is not a super key, and D is not a prime attribute – 3NF condition failed
Therefore we can conclude that the above Relation is not in 3NF.

Third Normal Form (3NF)…
Example 3:
Let’s take Relation R(A,B,C,D,E,F) with FD set = (AB→CDEF, BD→F). Let's find if this Relation is in
3NF or not.
The candidate key of the above Relation is (AB).
Prime attributes – A, B. Non-prime attributes – C, D, E, F
Now let's analyze each FD for the 3NF condition:

A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency
X→Y:
• X is a super key
• Y is a prime attribute

AB→CDEF, AB is a super key (we know all candidate keys are super keys) – 3NF condition met
BD→F, BD is not a super key, and F is not a prime attribute – 3NF condition failed
Therefore we can conclude that the above Relation is not in 3NF.

Third Normal Form (3NF)…
Example 4:
Let's take Relation R(A,B,C,D,E) with FD set = (A→B, B→C, C→D, D→A). Let's find if this Relation is
in 3NF or not.
The candidate keys of the above Relation are (AE), (DE), (CE), (BE).
Prime attributes – A, B, C, D, E. Non-prime attributes – none
Now let's analyze each FD for the 3NF condition:
A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute
A→B, A is not a super key, but B is a prime attribute – 3NF condition met.
B→C, B is not a super key, but C is a prime attribute – 3NF condition met.
C→D, C is not a super key, but D is a prime attribute – 3NF condition met.
D→A, D is not a super key, but A is a prime attribute – 3NF condition met.
Therefore we can conclude that the above Relation is in 3NF.
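
The FD-by-FD check from Examples 2 to 4 can be sketched as a small function. It assumes single-letter attribute names, inspects only the FDs as they are listed, and uses brute-force key enumeration; all function names are illustrative.

```python
from itertools import combinations

def fd(text):
    """Parse 'AB->C' into a (lhs, rhs) pair of attribute sets."""
    lhs, rhs = text.split("->")
    return set(lhs), set(rhs)

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure covers the Relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

def violations_3nf(attributes, fds):
    """Return the listed FDs X->Y where X is no super key and Y is not all prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    failed = []
    for lhs, rhs in fds:
        non_trivial_part = rhs - lhs
        if not non_trivial_part:
            continue  # trivial FD, nothing to check
        is_super_key = closure(lhs, fds) == attributes
        if not is_super_key and not non_trivial_part <= prime:
            failed.append((lhs, rhs))
    return failed

example_2 = violations_3nf("ABCD", [fd("A->B"), fd("B->C"), fd("C->D")])
example_3 = violations_3nf("ABCDEF", [fd("AB->CDEF"), fd("BD->F")])
example_4 = violations_3nf("ABCDE", [fd("A->B"), fd("B->C"), fd("C->D"), fd("D->A")])
```

An empty list means every listed FD meets the 3NF condition; a non-empty list names the FDs that fail it.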

Boyce Codd Normal Form (BCNF)
Boyce-Codd Normal Form or BCNF is a stricter version of 3NF and is also known as the 3.5
Normal Form. Some redundancies might still remain even after a Relation is in 3NF.
Definition:
For a Relation to be in BCNF, it should fulfill both of the conditions below:
• The Relation should be in 3NF
• For each non-trivial functional dependency X→Y, X should be a super key
or
The Relation has no non-trivial functional dependency, i.e. the Relation is an all-key Relation
(all attributes together form the only candidate key)
How to convert 3NF to BCNF:
The normalization of 3NF relations to BCNF involves creating a new Relation for every
dependency X→Y that violates the BCNF condition; the new Relation holds X and Y. The
attributes of Y are removed from the base Relation, while the remaining attributes, along
with the determinant X, remain part of the base Relation.

Boyce Codd Normal Form (BCNF)…
Example 1:
Relation R(A,B,C) with FD set = (A→B, B→C, C→A).
The candidate keys of the above Relation are (A), (B), (C).
Prime attributes – A, B, C
Non-prime attributes – none
This Relation is in 3NF (use the concepts learned in the previous section). Now let's analyze
each FD for BCNF condition:
A→B, A is a super key – BCNF condition met.
B→C, B is a super key – BCNF condition met.
C→A, C is a super key – BCNF condition met.
All FDs are meeting the BCNF condition; therefore, we can conclude that the above Relation is
in BCNF.

Boyce Codd Normal Form (BCNF)…
Example 2:
Relation R(A,B,C) with FD set = (AB→C, C→B).
The candidate keys of the above Relation are (AB), (AC).
Prime attributes – A, B, C
Non-prime attributes – none
This Relation is in 3NF (use the concepts learned in the previous section). Now let's analyze
each FD for BCNF condition:
AB→C, AB is a super key – BCNF condition met.
C→B, C is not a super key – BCNF condition not met.
Not all FDs meet the BCNF condition; therefore, we can conclude that the above
Relation is not in BCNF.
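
The BCNF checks in Examples 1 and 2 reduce to one closure test per FD. The sketch below assumes single-letter attribute names and looks only at the FDs as listed; fd, closure and bcnf_violations are illustrative names.

```python
def fd(text):
    """Parse 'AB->C' into a (lhs, rhs) pair of attribute sets."""
    lhs, rhs = text.split("->")
    return set(lhs), set(rhs)

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial listed FDs whose left side is not a super key."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]

example_1 = bcnf_violations("ABC", [fd("A->B"), fd("B->C"), fd("C->A")])
example_2 = bcnf_violations("ABC", [fd("AB->C"), fd("C->B")])
```

Unlike the 3NF check, no candidate keys are needed here: X is a super key exactly when its closure is the whole attribute set.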

Boyce Codd Normal Form (BCNF)…
Example 3: Below we have a STUDENT_SUBJECT_PROFESSOR Relation with columns student_id,
subject, and professor.

In the above Relation:

• One student can enroll in multiple subjects. For example, a student with student_id 101 has opted
for subjects - Java & C++
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as we have for Java.
• One professor teaches only one subject

Boyce Codd Normal Form (BCNF)…

Example 3 (contd.):
FDs for this Relation:
student_id, subject → professor
professor → subject
Candidate keys for the Relation – (student_id, subject) and (student_id, professor)
This Relation satisfies the 1st Normal Form because all the values are atomic, column names
are unique, and all the values stored in a particular column are of the same domain.
This Relation also satisfies the 2nd Normal Form as there is no Partial Dependency.
There is also no Transitive Dependency (every attribute is a prime attribute); hence the
Relation also satisfies the 3rd Normal Form.
But this Relation is not in Boyce-Codd Normal Form, as the FD professor → subject does not
meet the BCNF condition. Here the LHS (professor) is not a super key.

Boyce Codd Normal Form (BCNF)…

Example 3 (contd.):
To make this Relation satisfy BCNF, we will decompose this Relation into two relations
STUDENT_PROFESSOR and PROFESSOR_SUBJECT.
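
A schema-level sketch of this split, following the conversion rule for BCNF. The attribute names and FDs come from the example; the helper names bcnf_violations and split_on are hypothetical, and the sketch performs a single split rather than a full decomposition algorithm.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ATTRIBUTES = {"student_id", "subject", "professor"}
FDS = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

def bcnf_violations(attributes, fds):
    """Return the non-trivial listed FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]

def split_on(attributes, violating_fd):
    """Return (base relation, new relation) for one violating FD X->Y."""
    lhs, rhs = violating_fd
    new_relation = lhs | rhs                  # holds X and Y
    base_relation = attributes - (rhs - lhs)  # keeps X, drops Y
    return base_relation, new_relation

violation = bcnf_violations(ATTRIBUTES, FDS)[0]
student_professor, professor_subject = split_on(ATTRIBUTES, violation)
```

One known cost of this split is that the FD student_id, subject → professor can no longer be checked inside a single Relation, since its attributes are now spread over both.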

Finding the highest normal form of a relation
Steps to find the highest normal form of a Relation:
• Find all possible candidate keys of the Relation.
• Divide all attributes into two categories: prime attributes and non-prime attributes.
• Check for BCNF first, then 3NF, and so on. By definition (implicit constraints), a
Relation will always be in 1NF.
Summary of the definitions of the normal forms:
2NF: No non-prime attribute should be partially dependent on a candidate key (CK),
i.e. no FD of the form (proper subset of a CK) → (non-prime attribute) should hold.
3NF: First, it should be in 2NF, and at least one of the following conditions holds for every
non-trivial functional dependency X→Y:
• X is a super key
• Y is a prime attribute
BCNF: First, it should be in 3NF, and for every non-trivial functional dependency X→Y between two sets of
attributes X and Y, X is a super key
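
The steps above can be put together in one routine. This is a sketch under the same assumptions as the worked examples: single-letter attribute names, a Relation that is already in 1NF, checks applied to the FDs as listed, and brute-force key enumeration. The function names are illustrative.

```python
from itertools import combinations

def fd(text):
    """Parse 'AB->C' into a (lhs, rhs) pair of attribute sets."""
    lhs, rhs = text.split("->")
    return set(lhs), set(rhs)

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure covers the Relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

def highest_normal_form(attributes, fds):
    """Check BCNF first, then 3NF, then 2NF; fall back to 1NF."""
    attributes = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    non_prime = attributes - prime

    def is_super_key(attrs):
        return closure(attrs, fds) == attributes

    if all(rhs <= lhs or is_super_key(lhs) for lhs, rhs in fds):
        return "BCNF"
    if all(is_super_key(lhs) or (rhs - lhs) <= prime for lhs, rhs in fds):
        return "3NF"
    has_partial_dependency = any(
        closure(subset, fds) & non_prime
        for key in keys
        for size in range(1, len(key))
        for subset in combinations(sorted(key), size)
    )
    return "1NF" if has_partial_dependency else "2NF"

results = [
    highest_normal_form("ABC", [fd("A->B"), fd("B->C"), fd("C->A")]),
    highest_normal_form("ABC", [fd("AB->C"), fd("C->B")]),
    highest_normal_form("ABCD", [fd("A->B"), fd("B->C"), fd("C->D")]),
    highest_normal_form("ABCD", [fd("A->B"), fd("B->D")]),
]
```

The four calls reuse schemas from the BCNF, 3NF and 2NF examples, so the results can be compared with the conclusions reached by hand.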

Finding the highest normal form of a relation…
The below Venn diagram shows the relationship between various normal forms. If a
Relation is in BCNF, it is already in 3NF, 2NF & 1NF. That's why we start checking a
Relation for BCNF and then move to 3NF and so on.