Unit - 3
Database Design and Normalization
Definition
Normalizing a database means simplifying its tables so that a
large database is easier to implement and maintain.
Normalizing a logical database design involves organizing
the data into more than one table.
Normalization reduces redundancy in database tables, which
keeps the data consistent and can make inserts, updates, and
deletes cheaper.
The basic objective of normalization is to reduce
redundancy, which means that each piece of information is
stored only once in a relation.
The benefits of normalization are:
Saves storage space and makes it easier to insert, delete, and update
the data.
Faster sorting and index creation.
Simpler table structure.
A properly normalized database should have the
following characteristics:
Scalar values in each field.
Absence of redundancy.
Minimum use of null values.
Minimum loss of information.
Levels of Normalization
The levels of normalization are based on the amount of redundancy in the
database.
The various levels of normalization are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain Key Normal Form (DKNF)
[Figure: the normal forms from 1NF to DKNF shown against number of tables, redundancy, and complexity]
Most databases should be in 3NF or BCNF in order to avoid database inconsistency.
Levels of Normalization
[Figure: nested sets, 1NF ⊃ 2NF ⊃ 3NF ⊃ 4NF ⊃ 5NF ⊃ DKNF]
Each higher level is a subset of the lower level.
Types of Normalization
First Normal Form (1NF)
Each field contains the smallest meaningful (atomic) value.
The table does not contain repeating groups of
fields or repeating data within the same field.
Solutions
Create a separate field/table for each set of related data.
Identify each set of related data with a primary key.
Example (not in 1NF)
PART    WAREHOUSE                               QUANTITY
P0010   Warehouse A, Warehouse B, Warehouse C   400, 543, 329
P0020   Warehouse B, Warehouse C                200, 278
This is a really bad set-up: a single field holds a list of values.
Better, but it still has the disadvantage of null values:
PART (Primary Key)   WAREHOUSE A Qty   WAREHOUSE B Qty   WAREHOUSE C Qty
P0010                400               543               329
P0020                Null              200               278
1NF: Decomposition
Example (1NF)
PART    WAREHOUSE     QUANTITY
P0010   Warehouse A   400
P0010   Warehouse B   543
P0010   Warehouse C   329
P0020   Warehouse B   200
P0020   Warehouse C   278
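The decomposition into 1NF can be sketched in a few lines of Python. This is only an illustration: the function name to_1nf and the tuple layout are assumptions made here, and the input rows are copied from the "not in 1NF" example.

```python
# Rows from the "not in 1NF" example: a cell may hold a comma-separated list.
unnormalized = [
    ("P0010", "Warehouse A, Warehouse B, Warehouse C", "400, 543, 329"),
    ("P0020", "Warehouse B, Warehouse C", "200, 278"),
]


def to_1nf(rows):
    """Split list-valued cells so that every output row holds atomic values."""
    result = []
    for part, warehouses, quantities in rows:
        names = [w.strip() for w in warehouses.split(",")]
        qtys = [int(q) for q in quantities.split(",")]
        if len(names) != len(qtys):
            raise ValueError(f"mismatched lists for part {part}")
        for name, qty in zip(names, qtys):
            result.append((part, name, qty))
    return result


rows_1nf = to_1nf(unnormalized)
for row in rows_1nf:
    print(row)
```

Each output row is identified by the pair (PART, WAREHOUSE), which can serve as the composite primary key of the 1NF table.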
Data Dependency
Redundant (unnecessary) data often occur when multiple databases are
integrated.
The same attribute or object may have different names in different
databases.
Redundant attributes can often be detected by correlation analysis.
Careful integration of the data from multiple sources helps reduce or avoid
redundancies and inconsistencies and improves mining speed and quality.
Correlation Analysis
Correlation analysis involves various methods and techniques used
for studying and measuring the strength of the relationship between
two variables. Two variables are said to be correlated if a change
in one variable results in a corresponding change in the other
variable.
Correlation Analysis
Karl Pearson Coefficient of Correlation
Karl Pearson's measure, known as the Pearsonian correlation coefficient between two
variables (series) X and Y, usually denoted by r(X, Y), rxy, or simply r, is a numerical
measure of the linear relationship between them. It is defined as the ratio of the
covariance between X and Y, written Cov(X, Y), to the product of the standard
deviations of X and Y:
r = Cov(X, Y) / (σX · σY)
r > 0: X and Y are positively correlated.
r < 0: X and Y are negatively correlated.
r = 0: X and Y are not linearly correlated.
Correlation Analysis
Interpretation of r
The value of r always lies between -1 and +1. r = +1
indicates perfect positive correlation and r = -1 indicates perfect
negative correlation. When r is near zero, there is
little or no linear correlation between X and Y.
Case Study: The following data relate to the ages of 10 machine operators and the number of
days on which they reported sick in a month:
Age (X):        28  32  38  42  46  52  54  57  58  63
Sick Days (Y):   0   1   3   4   2   5   4   6   7   8
Calculate Karl Pearson's coefficient of correlation and interpret its value.
Mean(X) = 470/10 = 47; Mean(Y) = 40/10 = 4
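The rest of the calculation can be checked with a short Python sketch. It uses the deviation form of the formula, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), which is equivalent to Cov(X, Y) / (σX · σY); the function name pearson_r is an assumption made for illustration.

```python
import math

ages = [28, 32, 38, 42, 46, 52, 54, 57, 58, 63]   # X
sick_days = [0, 1, 3, 4, 2, 5, 4, 6, 7, 8]        # Y


def pearson_r(xs, ys):
    """Karl Pearson's coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(ages, sick_days)
print(round(r, 2))
```

For this data the three sums are 254, 1244, and 60, so r = 254 / √(1244 × 60), which is about 0.93: a strong positive correlation between age and the number of sick days.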
Second Normal Form (2NF)
Second normal form is based on the concept of full functional
dependency. Let us first consider functional dependency.
A functional dependency (FD) is a kind of integrity constraint
that generalizes the concept of a key.
Let X and Y be two attributes of a relation. If, given the value of
X, there is only one value of Y corresponding to it, then Y is
said to be functionally dependent on X.
This is indicated by the notation X → Y. The dependency is a full FD
when Y depends on the whole of X and not on any proper subset of X.
Functional Dependencies
If one set of attributes in a table determines another set of attributes in the table, then the
second set of attributes is said to be functionally dependent on the first set of attributes.
Example 1
Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}
                         {Title, Price} → {ISBN}
ISBN            Title          Price
0-321-32132-1   Balloon        $34.00
0-55-123456-9   Main Street    $22.95
0-123-45678-0   Ulysses        $35.00
1-22-233700-0   Visual Basic   $25.00
Functional Dependencies
Example 2
Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}
AuID AuName AuPhone
1 Sleepy 321-321-1111
2 Snoopy 232-234-1234
3 Grumpy 665-235-6532
4 Jones 123-333-3333
5 Smith 654-223-3455
6 Joyce 666-666-6666
7 Roman 444-444-4444
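Whether a given table instance is consistent with a functional dependency can be tested mechanically. The sketch below is a minimal illustration, with the helper name fd_holds assumed here; note that a table instance can only refute an FD, since the FD itself is a rule about all legal data. The last row is hypothetical and is not part of the table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


authors = [
    {"AuID": 1, "AuName": "Sleepy", "AuPhone": "321-321-1111"},
    {"AuID": 2, "AuName": "Snoopy", "AuPhone": "232-234-1234"},
    {"AuID": 3, "AuName": "Grumpy", "AuPhone": "665-235-6532"},
    {"AuID": 4, "AuName": "Jones", "AuPhone": "123-333-3333"},
]

print(fd_holds(authors, ["AuID"], ["AuName"]))             # True
print(fd_holds(authors, ["AuName", "AuPhone"], ["AuID"]))  # True

# Hypothetical extra row: a second author who is also called Jones.
with_second_jones = authors + [
    {"AuID": 8, "AuName": "Jones", "AuPhone": "555-000-0000"}
]
print(fd_holds(with_second_jones, ["AuName"], ["AuID"]))   # False
```

The last check fails because AuName alone does not determine AuID once two authors share a name, which is why the example uses {AuName, AuPhone} as the determinant.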
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements:
The table is in first normal form.
Every non-key attribute in the table must be fully functionally
dependent on the whole key, not on just a part of it (this matters
when the primary key is composite).
Second Normal Form (2NF)
Example 1: Consider the table book_order, which is not in 2NF
Table: book_order
Order_No Title Qty Unit_Price
1 Computer Networks 1 250
1 Graphics 1 275
1 DBMS 2 295
2 Multimedia 1 300
2 Data Structure 1 190
3 DBMS 1 295
3 Multimedia 2 300
3 Computer Networks 5 250
The combination of Order_No and Title is the composite primary key, since no combination of Order_No and
Title repeats in the table.
This table is in 1NF but not in 2NF, because Unit_Price is not functionally dependent on the whole composite
primary key (Order_No, Title); it depends only on Title.
Qty, on the other hand, is functionally dependent on the whole composite primary key.
Second Normal Form (2NF) Contd.
To convert this relation to 2NF, the following two steps are performed:
Find and remove the attributes that are functionally dependent on only a part of the key and not on
the whole key, and place them in a different table.
Group the remaining attributes.
In the above example, Unit_Price is not functionally dependent on the whole key
(Order_No, Title), so we move Unit_Price, along with Title, into a table called Book_Master.
Projecting book_order gives the two tables:
Order_Master
Order_No   Title               Qty
1          Computer Networks   1
1          Graphics            1
1          DBMS                2
2          Multimedia          1
2          Data Structure      1
3          DBMS                1
3          Multimedia          2
3          Computer Networks   5
Book_Master
Title               Unit_Price
Computer Networks   250
Graphics            275
DBMS                295
Multimedia          300
Data Structure      190
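As a rough check that nothing is lost by this split, the two projections can be joined back on Title and compared with the original table. The sketch below assumes plain Python tuples in the column order (Order_No, Title, Qty, Unit_Price); the variable names are chosen here for illustration.

```python
book_order = [
    (1, "Computer Networks", 1, 250),
    (1, "Graphics", 1, 275),
    (1, "DBMS", 2, 295),
    (2, "Multimedia", 1, 300),
    (2, "Data Structure", 1, 190),
    (3, "DBMS", 1, 295),
    (3, "Multimedia", 2, 300),
    (3, "Computer Networks", 5, 250),
]

# Projections; building sets removes the duplicate (Title, Unit_Price) pairs.
order_master = {(o, t, q) for o, t, q, _ in book_order}
book_master = {(t, p) for _, t, _, p in book_order}

# Title -> Unit_Price must hold, otherwise one title would map to two prices.
price_of = dict(book_master)
assert len(price_of) == len(book_master)

# Natural join of the two projections on Title.
rejoined = {(o, t, q, price_of[t]) for o, t, q in order_master}

print(len(book_order), len(order_master), len(book_master))  # 8 8 5
print(rejoined == set(book_order))                           # True
```

Each price is now stored once per title (5 rows) instead of once per order line (8 rows), and the join reproduces the original table exactly.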
Second Normal Form (2NF)
AKTU 2016-2017
Ex. 1: Consider the universal relation schema R(A, B, C, D, E, F, G, H, I, J) and the following set of FDs:
F = {AB → C, A → DE, B → F, F → GH, D → IJ}. Determine the keys for R and decompose R into 2NF.
Sol.: (AB)+ = ABC          because AB → C
            = ABCDE        because A → DE
            = ABCDEF       because B → F
            = ABCDEFGH     because F → GH
            = ABCDEFGHIJ   because D → IJ
Hence AB is the key of R. The given relation R has the composite primary key {A, B}, and the
non-prime attributes are {C, D, E, F, G, H, I, J}.
In this case, the FDs A → DE and B → F have determinants that are only part of the primary key;
only AB → C depends on the whole key.
Therefore this relation does not satisfy 2NF. To bring this relation to 2NF, we break the
table into three relations: R1(A, B, C), R2(A, D, E, I, J), and R3(B, F, G, H).
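The closure computation used in the solution is easy to automate. The following is a minimal sketch of the standard attribute-closure algorithm; the function name closure and the encoding of each FD as a pair of strings are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever the whole left side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F = {AB -> C, A -> DE, B -> F, F -> GH, D -> IJ}
fds = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print("".join(sorted(closure("AB", fds))))  # ABCDEFGHIJ
print("".join(sorted(closure("A", fds))))   # ADEIJ
print("".join(sorted(closure("B", fds))))   # BFGH
```

The closures of A and B alone give exactly the attribute sets of R2 and R3, which is why those attributes are moved out of the relation keyed on AB.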
Third Normal Form (3NF)
For a table to be in 3NF, there are two requirements:
The table should be in second normal form.
No non-key attribute is transitively dependent on the primary key.
A functional dependency X → Y in a relation schema R is a transitive
dependency if there is a set of attributes Z that is neither a candidate key nor a
subset of any key of R, and both X → Z and Z → Y hold. (Transitive FD)
Example 1: Consider the non-3NF Table
Table : Course_Room
Course_Name Head_Dept Room_No Room_Capacity
X1 X2 X3 X4
[Link] (CS) Prof. Gupta 102 60
[Link] (IT) Prof. Smith 107 50
[Link] (EC) Dr. Sharma 105 60
[Link] (AI) Mr. Sharma 103 100
MCA Mr. Jindal 111 40
In the above relation, Room_Capacity is functionally dependent on Room_No, and Room_No is functionally
dependent on Course_Name.
So a transitive functional dependency exists here, i.e., X1 → X3 and X3 → X4.
Room_Capacity (X4) depends on Course_Name (X1) only transitively, through Room_No (X3), and not directly.
Hence the table Course_Room is not in 3NF.
Example 1 (contd.): Converting the table to 3NF
To convert the above table into 3NF, we must remove the column Room_Capacity, since it is not
directly functionally dependent on the primary key Course_Name, and place it in another table called Room
along with the attribute Room_No on which it is functionally dependent.
Course Table
Course_Name (Primary Key)   Head_Dept     Room_No
X1                          X2            X3
[Link] (CS)                 Prof. Gupta   102
[Link] (IT)                 Prof. Smith   107
[Link] (EC)                 Dr. Sharma    105
[Link] (AI)                 Mr. Sharma    103
MCA                         Mr. Jindal    111
Room Table
Room_No   Room_Capacity
X3        X4
102       60
107       50
105       60
103       100
111       40
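The 3NF condition can also be checked from the functional dependencies alone. The sketch below is an illustration under two assumptions made here: the relation has a single candidate key (Course_Name), and Course_Name → Head_Dept holds because Course_Name is the primary key. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, key):
    """List (determinant, attribute) pairs that break 3NF for a single key."""
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # the determinant is a superkey, so this FD is fine
        for attr in sorted(rhs - lhs):
            if attr not in key:  # a non-prime attribute
                violations.append((sorted(lhs), attr))
    return violations


course_room = {"Course_Name", "Head_Dept", "Room_No", "Room_Capacity"}
fds = [
    ({"Course_Name"}, {"Head_Dept", "Room_No"}),
    ({"Room_No"}, {"Room_Capacity"}),
]
print(third_nf_violations(course_room, fds, key={"Course_Name"}))
# [(['Room_No'], 'Room_Capacity')]

# After the split, neither table has a violating dependency.
course = {"Course_Name", "Head_Dept", "Room_No"}
room = {"Room_No", "Room_Capacity"}
print(third_nf_violations(course, [fds[0]], key={"Course_Name"}))  # []
print(third_nf_violations(room, [fds[1]], key={"Room_No"}))        # []
```

The only dependency reported is Room_No → Room_Capacity, the one that the Room table removes.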
Example 2: Consider the non-3NF table
EMPLOYEE_DEPARTMENT TABLE
EMPNO FIRSTNAME LASTNAME DEPT ID DEPTNAME
000290 John Parker E11 Operations
000320 Ramlal Mehta E21 Software Support
000310 Maude Setright E11 Operations
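Example 2 (contd.): DEPTNAME depends on DEPT ID, and DEPT ID depends on the primary key EMPNO, so
DEPTNAME is transitively dependent on EMPNO. Splitting the table gives the following 3NF tables: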
EMPLOYEE TABLE
EMPNO (Primary Key) FIRSTNAME LASTNAME DEPT ID
000290 John Parker E11
000320 Ramlal Mehta E21
000310 Maude Setright E11
DEPARTMENT TABLE
DEPT ID (Primary Key) DEPTNAME
E11 Operations
E21 Software Support
Third Normal Form (3NF)
AKTU 2013-2014
Ex. 1: Consider the universal relation schema R(A, B, C, D, E, F, G, H, I, J) and the
following set of FDs: F = {AB → C, A → D, B → F, F → GH, D → IJ}. Determine the keys
for R and decompose R into 2NF and 3NF.
Sol.: (ABE)+ = ABEC         because AB → C
             = ABECD        because A → D
             = ABECDF       because B → F
             = ABECDFGH     because F → GH
             = ABECDFGHIJ   because D → IJ
Hence ABE is the key of R. The given relation R has the composite primary key {A, B, E},
and the non-prime attributes are {C, D, F, G, H, I, J}.
In this case, the FDs AB → C, A → D, and B → F have determinants that are only part of the primary key.
Therefore this relation does not satisfy 2NF. To bring this relation to 2NF, we break the
table into four relations: R1(A, B, C), R2(A, D, I, J), R3(B, F, G, H), and R4(A, B, E), where R4 keeps the key of R.
Now in R3, B → F and F → GH, so B → GH holds transitively; likewise in R2, A → D and D → IJ give
A → IJ. Therefore R2 and R3 are not in 3NF. To bring them into 3NF, we break
R2(A, D, I, J) into R21(A, D) and R22(D, I, J), and R3(B, F, G, H) into R31(B, F) and R32(F, G, H).
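The key ABE can be confirmed by a brute-force search over attribute subsets. This is a sketch for small schemas only, since the number of subsets grows exponentially; the function names and the string encoding of the FDs are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    everything = set(attributes)
    keys = []
    # Smaller subsets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == everything:
                keys.append(candidate)
    return keys


# F = {AB -> C, A -> D, B -> F, F -> GH, D -> IJ}
fds = [("AB", "C"), ("A", "D"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
keys = candidate_keys("ABCDEFGHIJ", fds)
print(["".join(sorted(k)) for k in keys])  # ['ABE']
```

A, B, and E never appear on the right-hand side of any FD, so they must belong to every key, and together they already determine all ten attributes.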
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter form of 3NF.
If a relation is in BCNF, then it is also in 3NF.
In BCNF, we extend the 3NF concept to all the candidate keys of the relation. This matters when
two or more composite candidate keys share a common attribute.
In BCNF, every determinant in a table must be a candidate key.
3NF and BCNF can differ only when all of the following conditions are true:
The table has two or more candidate keys.
At least two of the candidate keys are composed of more than one attribute.
The keys are not disjoint, i.e., the composite candidate keys share some attributes.
Example 1: Consider the non-BCNF Table
Table : Student
Stud_ID S_Name Subject Grade
1908020109005 Vikas Kr. Mishra DBMS A
1908020109005 Vikas Kr. Mishra DAA B
1908020109005 Vikas Kr. Mishra CD B
1880210019 Rishabh Dube DBMS A
1880210019 Rishabh Dube DAA A
1880210019 Rishabh Dube CD B
In this relation the following FDs exist:
(S_Name, Subject) → Grade, (Stud_ID, Subject) → Grade, S_Name → Stud_ID, and Stud_ID → S_Name.
This relation has two candidate keys, (S_Name, Subject) and (Stud_ID, Subject), which are composite
keys and share the common attribute Subject.
This relation is in 3NF; however, there is a lot of data repetition in the fields S_Name and Stud_ID.
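The rule that every determinant must be a candidate key can be tested with the attribute closure: an FD violates BCNF when the closure of its left-hand side is not the whole relation. The sketch below applies this to the Student relation; the function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


student = {"Stud_ID", "S_Name", "Subject", "Grade"}
fds = [
    ({"S_Name", "Subject"}, {"Grade"}),
    ({"Stud_ID", "Subject"}, {"Grade"}),
    ({"S_Name"}, {"Stud_ID"}),
    ({"Stud_ID"}, {"S_Name"}),
]
print(bcnf_violations(student, fds))
# [(['S_Name'], ['Stud_ID']), (['Stud_ID'], ['S_Name'])]
```

S_Name and Stud_ID determine each other but neither determines Subject or Grade, so both dependencies break BCNF even though the relation is in 3NF.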
Example 1 (contd.): Converting the table to BCNF
To convert this relation to BCNF, the following two steps are performed:
• Find and remove the overlapping candidate keys. Place the part of the candidate key, together with the
attribute that is functionally dependent on it, in a different table.
• Group the remaining items into a table.
Grade Table
Stud_ID         Subject   Grade
1908020109005   DBMS      A
1908020109005   DAA       B
1908020109005   CD        B
1880210019      DBMS      A
1880210019      DAA       A
1880210019      CD        B
Student_ID Table
Stud_ID         S_Name
1908020109005   Vikas Kr. Mishra
1880210019      Rishabh Dube
Multi-Valued Dependency
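1. A multi-valued dependency (MVD) occurs when two or more independent multi-valued facts about the same
attribute occur within the same relation. It is generally denoted by X →→ Y, i.e.,
there is a multi-valued dependency of Y on X.
2. Let R be a relation schema and let X and Y be subsets of the attributes of R. X →→ Y holds in R
if, whenever two tuples agree on X, the tuples formed by swapping their Y values are also in R.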
Ex. Relation with MVD
Faculty Subject Committee
Dr. Sharma DBMS Placement
Dr. Sharma OS Placement
Dr. Sharma Data Mining Placement
Dr. Sharma DBMS Discipline
Dr. Sharma OS Discipline
Dr. Sharma Data Mining Discipline
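In the table above, Faculty →→ Subject and Faculty →→ Committee. The swap-based definition can be checked directly on a table instance, as in the following sketch; the function name mvd_holds and the use of column positions to name attributes are assumptions made for illustration.

```python
def mvd_holds(rows, x_idx, y_idx):
    """Check X ->> Y on a set of tuples, with X and Y given as column positions."""
    rows = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in x_idx):
                # Take the Y values from t1 and every other value from t2.
                swapped = list(t2)
                for i in y_idx:
                    swapped[i] = t1[i]
                if tuple(swapped) not in rows:
                    return False
    return True


faculty = {
    ("Dr. Sharma", "DBMS", "Placement"),
    ("Dr. Sharma", "OS", "Placement"),
    ("Dr. Sharma", "Data Mining", "Placement"),
    ("Dr. Sharma", "DBMS", "Discipline"),
    ("Dr. Sharma", "OS", "Discipline"),
    ("Dr. Sharma", "Data Mining", "Discipline"),
}

print(mvd_holds(faculty, [0], [1]))  # True: Faculty ->> Subject
print(mvd_holds(faculty, [0], [2]))  # True: Faculty ->> Committee

# Dropping one combination breaks the dependency.
incomplete = faculty - {("Dr. Sharma", "OS", "Discipline")}
print(mvd_holds(incomplete, [0], [1]))  # False
```

The MVD forces every subject to be paired with every committee, which is the source of the redundancy: 3 subjects and 2 committees need 6 rows.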
Fourth Normal Form (4NF)
Fourth normal form eliminates independent multi-valued relationships between columns of the same table.
To be in fourth normal form:
A relation must first be in BCNF.
A relation may not contain two or more independent multi-valued facts about the same entity.
To convert the relation above to 4NF, the following two steps are performed:
1. Move the two multi-valued relationships to separate tables.
2. Identify a primary key for each new entity.
Faculty_Course
Faculty      Subject
Dr. Sharma   DBMS
Dr. Sharma   OS
Dr. Sharma   Data Mining
Faculty_Committee
Faculty      Committee
Dr. Sharma   Placement
Dr. Sharma   Discipline
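A quick way to see that this split loses nothing is to project the original relation onto the two new tables and join them back on Faculty. The sketch below uses Python sets of tuples; the variable names are chosen here for illustration.

```python
faculty = {
    ("Dr. Sharma", "DBMS", "Placement"),
    ("Dr. Sharma", "OS", "Placement"),
    ("Dr. Sharma", "Data Mining", "Placement"),
    ("Dr. Sharma", "DBMS", "Discipline"),
    ("Dr. Sharma", "OS", "Discipline"),
    ("Dr. Sharma", "Data Mining", "Discipline"),
}

# The two 4NF tables are projections of the original relation.
faculty_course = {(f, s) for f, s, _ in faculty}
faculty_committee = {(f, c) for f, _, c in faculty}

# Natural join on Faculty.
rejoined = {
    (f, s, c)
    for f, s in faculty_course
    for f2, c in faculty_committee
    if f == f2
}

print(len(faculty), len(faculty_course), len(faculty_committee))  # 6 3 2
print(rejoined == faculty)                                        # True
```

The six original rows are replaced by 3 + 2 rows, and adding a new subject now needs one new row instead of one row per committee.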
Fifth Normal Form (5NF)
Fifth normal form is satisfied when tables have been decomposed as far
as possible in order to remove redundancy. Once a table is in fifth normal form, it
cannot be broken into smaller relations without changing the facts or the
meaning.
5NF uses the concept of join dependency, which is a
generalized form of multi-valued dependency.
A join dependency (JD), denoted by JD(R1, R2, ..., Rn) and specified on a relation
schema R, specifies a constraint on the states r of R. The constraint states
that every legal state r of R should have a lossless join decomposition into
R1, R2, ..., Rn.
An MVD is a special case of a JD where n = 2, i.e., a JD denoted as JD(R1, R2).
Example 1: Consider the following table, which is not in 5NF
Company    Product   Supplier
Godrej     Soap      Mr. X
Godrej     Shampoo   Mr. X
Godrej     Shampoo   Mr. Y
Godrej     Shampoo   Mr. Z
H. Lever   Soap      Mr. X
H. Lever   Soap      Mr. Y
H. Lever   Shampoo   Mr. Y
In the above table, Mr. X appears twice as a supplier for Godrej, and Mr. Y also appears
twice for H. Lever. But if we decompose the table into two tables, we will lose
information, which can be shown as follows:
Company_Product (R1)
Company    Product
Godrej     Soap
Godrej     Shampoo
H. Lever   Soap
H. Lever   Shampoo
Company_Suppliers (R2)
Company    Supplier
Godrej     Mr. X
Godrej     Mr. Y
Godrej     Mr. Z
H. Lever   Mr. X
H. Lever   Mr. Y
Example 1 (contd.)
If we want to display the products and their suppliers, we have to join the two tables
on the Company attribute.
The result will contain some spurious records. For Mr. Z, it will display both products, soap
and shampoo, because the company that Mr. Z supplies (Godrej) produces both soap and
shampoo. This is not correct: in the original table Mr. Z supplies only shampoo.
Now suppose that the original table is decomposed into three parts, as shown below.
Company_Product (R1)
Company    Product
Godrej     Soap
Godrej     Shampoo
H. Lever   Soap
H. Lever   Shampoo
Company_Suppliers (R2)
Company    Supplier
Godrej     Mr. X
Godrej     Mr. Y
Godrej     Mr. Z
H. Lever   Mr. X
H. Lever   Mr. Y
Product_Suppliers (R3)
Product    Supplier
Soap       Mr. X
Soap       Mr. Y
Shampoo    Mr. X
Shampoo    Mr. Y
Shampoo    Mr. Z
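Whether a decomposition is lossless for a particular table can be tested by joining the projections and comparing the result with the original rows. The sketch below does this for both the two-table and the three-table decomposition; the variable names are chosen here for illustration, and the company name is written as "H. Lever" following the text of the example.

```python
original = {
    ("Godrej", "Soap", "Mr. X"),
    ("Godrej", "Shampoo", "Mr. X"),
    ("Godrej", "Shampoo", "Mr. Y"),
    ("Godrej", "Shampoo", "Mr. Z"),
    ("H. Lever", "Soap", "Mr. X"),
    ("H. Lever", "Soap", "Mr. Y"),
    ("H. Lever", "Shampoo", "Mr. Y"),
}

# The three projections R1, R2, and R3.
company_product = {(c, p) for c, p, _ in original}
company_supplier = {(c, s) for c, _, s in original}
product_supplier = {(p, s) for _, p, s in original}

# R1 joined with R2 on Company.
two_way = {
    (c, p, s)
    for c, p in company_product
    for c2, s in company_supplier
    if c == c2
}
# Joining R3 as well keeps only rows whose (Product, Supplier) pair exists.
three_way = {row for row in two_way if (row[1], row[2]) in product_supplier}

print(len(original), len(two_way), len(three_way))  # 7 10 9
print(sorted(two_way - original))    # spurious rows of the two-way join
print(sorted(three_way - original))  # rows that are still spurious
```

The two-way join produces three spurious rows, including (Godrej, Soap, Mr. Z). For the rows exactly as listed, the third table removes the Mr. Z row but two combinations, (Godrej, Soap, Mr. Y) and (H. Lever, Shampoo, Mr. X), still appear in the join without being in the original table. The join dependency JD(R1, R2, R3) therefore holds for a table only when this check returns no extra rows.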
Domain Key Normal Form (DKNF)
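A relation is in DKNF when there can be no insertion or
deletion anomalies in the database, that is, when every constraint on the
relation is a logical consequence of its domain constraints and key constraints.
A key uniquely identifies each row in a table.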
Decomposition – Loss of Information
1. If a decomposition does not cause any loss of information, it is called a
lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost, it is
called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a
collection of smaller schemas that are in BCNF. However,
dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into third normal form
while also preserving the dependencies.
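For a decomposition into two tables, losslessness can be tested from the functional dependencies: the split of R into R1 and R2 is lossless if the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies this standard test to two decompositions from this unit; the function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    reachable = closure(r1 & r2, fds)
    return r1 <= reachable or r2 <= reachable


# book_order split into Order_Master and Book_Master.
order_fds = [
    ({"Order_No", "Title"}, {"Qty"}),
    ({"Title"}, {"Unit_Price"}),
]
print(is_lossless({"Order_No", "Title", "Qty"}, {"Title", "Unit_Price"}, order_fds))  # True

# Company/Product/Supplier split on Company; no FDs are given for this table.
print(is_lossless({"Company", "Product"}, {"Company", "Supplier"}, []))  # False
```

The first split is lossless because Title determines Unit_Price. The second is not guaranteed to be lossless, which matches the spurious rows seen in the 5NF example.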
3NF may be better than BCNF in some cases.
Use your own judgment when decomposing schemas.