Lesson 10: Normalization
Normalization is the process of examining the existing tables of a database to find
dependencies among their columns. It can also be described as a formal technique for turning
preliminary data structures into data structures that are efficient and easy to maintain.
During normalization, whenever an undesirable dependency is detected in a table, the table is
restructured into multiple tables (typically two) so that the dependency is eliminated. If a
dependency is still exhibited, the process is repeated until all such dependencies are eliminated.
The process of eliminating data redundancy is based on the theory of functional dependency.
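To make the idea of a functional dependency concrete, the following is a minimal sketch that tests whether one set of columns determines another in a given set of rows. The function name holds and the sample student rows are hypothetical and serve only as an illustration. A check on sample data can disprove a dependency, but it cannot prove that the dependency holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The same left-hand value must always map to the same right-hand value.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample data, not taken from the lesson.
students = [
    {"Roll": 1, "Name": "Asha", "City": "Pune"},
    {"Roll": 2, "Name": "Ravi", "City": "Pune"},
    {"Roll": 3, "Name": "Asha", "City": "Delhi"},
]

print(holds(students, ["Roll"], ["City"]))  # True
print(holds(students, ["Name"], ["City"]))  # False
```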
Importance of normalization
It highlights constraints and dependencies in the data and hence aids understanding of the nature of the data.
Normalization controls data redundancy, which reduces storage requirements and simplifies routine maintenance.
Normalization provides unique identification for records in a database.
Each stage of the normalization process eliminates a particular type of undesirable dependency.
Normalization permits simple data retrieval in response to reports and queries.
The third normal form produces a well-designed database that provides a higher degree of data independence.
Normalization helps define efficient data structures.
Normalized data structures are used for file and database design.
Normalization eliminates unnecessary dependency relationships within a database file.
Storing redundant data in an un-normalized database causes the following problems:
1. Extra storage space: storing the same data in many places takes a large amount of disk space.
2. Entering the same data more than once during insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification etc. are not done
properly. This creates inconsistency and unreliability in the database.
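The modification anomaly in the list above can be sketched in a few lines. The rows below are hypothetical. They borrow the Emp-Id, Bank-Id and Bank-Name attribute names used later in this lesson, and the helper inconsistent_banks is an illustrative name, not part of any library.

```python
# Hypothetical un-normalized rows: the bank name is repeated for every employee.
rows = [
    {"Emp-Id": "E1", "Bank-Id": "B1", "Bank-Name": "First Bank"},
    {"Emp-Id": "E2", "Bank-Id": "B1", "Bank-Name": "First Bank"},
    {"Emp-Id": "E3", "Bank-Id": "B2", "Bank-Name": "City Bank"},
]

def inconsistent_banks(rows):
    """Return the Bank-Ids that appear with more than one Bank-Name."""
    names = {}
    for r in rows:
        names.setdefault(r["Bank-Id"], set()).add(r["Bank-Name"])
    return sorted(b for b, n in names.items() if len(n) > 1)

before = inconsistent_banks(rows)
# A modification applied to only one of the two stored copies.
rows[0]["Bank-Name"] = "First National Bank"
after = inconsistent_banks(rows)

print(before)  # []
print(after)   # ['B1']
```

Because the name of bank B1 is stored twice, changing only one copy leaves the database contradicting itself.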
To solve these problems, the "raw" database needs to be normalized. Normalization is a
step-by-step process that removes a different kind of redundancy and anomaly at each step. At each
step a specific rule is applied to remove a specific kind of impurity, leaving the database lean
and clean.
As you can see now, each row contains a unique combination of values. Unlike the un-normalized
form (UNF), this relation contains only atomic values, i.e. the values cannot be decomposed
further, so the relation is now in 1NF.
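The move from UNF to 1NF can be sketched as follows. The records are hypothetical and only reuse the attribute names Emp-Id, Emp-Name, Month and Sales from this lesson. The sketch assumes that the repeating group is the list of monthly sales figures kept inside each employee record.

```python
# Hypothetical UNF records: each employee carries a repeating group of sales.
unf = [
    {"Emp-Id": "E1", "Emp-Name": "Anil", "Sales": [("Jan", 1000), ("Feb", 1200)]},
    {"Emp-Id": "E2", "Emp-Name": "Mala", "Sales": [("Jan", 800)]},
]

def to_1nf(records):
    """Emit one row per (employee, month) so that every value is atomic."""
    return [
        {"Emp-Id": r["Emp-Id"], "Emp-Name": r["Emp-Name"], "Month": m, "Sales": s}
        for r in records
        for m, s in r["Sales"]
    ]

flat = to_1nf(unf)
for row in flat:
    print(row)
```

Each output row now holds a single month and a single sales figure, at the cost of repeating the employee name on every row.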
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank-Name all depend upon Emp-Id. But the attribute Bank-Name also depends on Bank-Id, which
is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion is
moved into another related relation, the table comes to 2NF.
After moving this portion into another relation, we store less data in the two relations
without any loss of information. There is also a significant reduction in redundancy.
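The claim that the split loses no information can be checked by joining the two relations back together. This is a minimal sketch on hypothetical rows whose columns are, in order, Emp-Id, Emp-Name, Month, Sales, Bank-Id and Bank-Name. The values are invented for illustration.

```python
# Hypothetical 1NF rows: (Emp-Id, Emp-Name, Month, Sales, Bank-Id, Bank-Name).
employee_sales = [
    ("E1", "Anil", "Jan", 1000, "B1", "First Bank"),
    ("E1", "Anil", "Feb", 1200, "B1", "First Bank"),
    ("E2", "Mala", "Jan", 800, "B1", "First Bank"),
    ("E3", "Ravi", "Jan", 950, "B2", "City Bank"),
]

# First relation: everything except Bank-Name.
sales = sorted({(e, n, m, s, b) for e, n, m, s, b, _ in employee_sales})
# Second relation: Bank-Id -> Bank-Name, stored once per bank.
banks = {b: bn for *_, b, bn in employee_sales}

# Joining on Bank-Id rebuilds the original rows.
rejoined = sorted((e, n, m, s, b, banks[b]) for e, n, m, s, b in sales)

print(rejoined == sorted(employee_sales))  # True
print(len(employee_sales), "copies of a bank name before,", len(banks), "after")
```

The saving grows with the number of rows that share a bank, since each bank name is stored exactly once in the second relation.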
Such derived dependencies hold in most situations. For example, if we have
Roll → Marks
And
Marks → Grade
Then we may safely derive
Roll → Grade.
This third dependency was not originally specified but we have derived it.
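The derivation above can be automated by computing the closure of a set of attributes, that is, everything those attributes determine under the given dependencies. The sketch below uses a standard fixed-point loop. The function name closure and the way dependencies are written as pairs of tuples are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [(("Roll",), ("Marks",)), (("Marks",), ("Grade",))]

print(sorted(closure({"Roll"}, fds)))   # ['Grade', 'Marks', 'Roll']
print(sorted(closure({"Marks"}, fds)))  # ['Grade', 'Marks']
```

Grade appears in the closure of Roll, which is exactly the derived dependency Roll → Grade.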
A derived dependency is called a transitive dependency when the derived relationship is not
meaningful in itself. For example, suppose we have been given
Roll → City
And
City → STDCode
If we try to derive Roll → STDCode, it becomes a transitive dependency, because obviously the
STDCode of a city cannot depend on the roll number issued by a school or college; it depends
only on the city. In such a case the relation should be broken into two, each containing one of
these two dependencies:
Roll → City
And
City → STDCode
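The split described above amounts to creating one relation per determinant. The following is a simplified sketch of that idea. It assumes every dependency has a single attribute on each side and that the list of dependencies contains no redundant entries. The function name relations_by_determinant is hypothetical.

```python
def relations_by_determinant(fds):
    """Group single-attribute dependencies into one relation per determinant."""
    relations = {}
    for lhs, rhs in fds:
        # The determinant becomes the key of its own relation.
        relations.setdefault(lhs, [lhs]).append(rhs)
    return relations

fds = [("Roll", "City"), ("City", "STDCode")]

for key, attributes in relations_by_determinant(fds).items():
    print(key, "->", attributes)
```

For the two dependencies above this yields a relation (Roll, City) keyed on Roll and a relation (City, STDCode) keyed on City.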
The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the
information that Rao is the Head of Department of Chemistry.
The relation is normalized by creating a new relation for Dept. and Head of Dept. and deleting
Head of Dept. from the given relation. The normalized relations are shown in the following.
Department     Head of Dept.
Physics        Ghosh
Mathematics    Krishnan
Chemistry      Rao
See the dependency diagrams for these new relations.
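The deletion anomaly and its removal can be sketched as below. The department heads are those listed in the table above, but the professor rows are a hypothetical reconstruction, since the original relation is not reproduced in this text. Only the fact that Professor P2 occupies rows 3 and 4 is taken from the lesson.

```python
# Hypothetical rows: (Professor, Department, Head of Dept.).
rows = [
    ("P1", "Physics", "Ghosh"),
    ("P1", "Mathematics", "Krishnan"),
    ("P2", "Chemistry", "Rao"),
    ("P2", "Physics", "Ghosh"),
    ("P3", "Mathematics", "Krishnan"),
]

def heads(rows):
    """Map each department to its head, as recorded in the given rows."""
    return {dept: head for _, dept, head in rows}

# Single relation: deleting P2's rows also deletes the only Chemistry row.
remaining = [r for r in rows if r[0] != "P2"]
print("Chemistry" in heads(remaining))  # False

# Decomposed relations: department heads are stored separately.
dept_head = heads(rows)
prof_dept = [(p, d) for p, d, _ in rows if p != "P2"]
print(dept_head["Chemistry"])  # Rao
```

After the decomposition, removing a professor touches only the professor relation, so the head of Chemistry is still on record.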
A multivalued dependency is a kind of dependency in which every attribute within a relation
depends upon the others, yet none of them alone is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The following are the assumptions:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank
for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problems mentioned
above. The problems are reduced by expressing this relation as two relations in the Fourth Normal
Form (4NF). A relation is in 4NF if it has no more than one independent multivalued
dependency, or one independent multivalued dependency together with a functional dependency.
The table can be expressed as the two 4NF relations given below. The facts that vendors are
capable of supplying certain items and that they are assigned to supply some projects are
specified independently in the 4NF relations.
Vendor-Supply                Vendor-Project
Vendor Code   Item Code      Vendor Code   Project No.
V1            I1             V1            P1
V1            I2             V1            P3
V2            I2             V2            P1
V2            I3             V3            P2
V3            I1
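The two 4NF relations above can be joined on the vendor code to see what they jointly express. This is a minimal sketch using the rows from the tables above. The function name natural_join is chosen for this illustration.

```python
vendor_supply = [("V1", "I1"), ("V1", "I2"), ("V2", "I2"), ("V2", "I3"), ("V3", "I1")]
vendor_project = [("V1", "P1"), ("V1", "P3"), ("V2", "P1"), ("V3", "P2")]

def natural_join(supply, project):
    """Join the two relations on the shared vendor code."""
    return sorted(
        (v, item, proj)
        for v, item in supply
        for v2, proj in project
        if v == v2
    )

combined = natural_join(vendor_supply, vendor_project)
print(len(combined))  # 7

# Problem 1 above disappears: a vendor can be assigned to a project
# without inventing a row that has a blank item code.
vendor_project.append(("V1", "P2"))
```

Projecting the joined rows back onto (vendor, item) and (vendor, project) returns the two original relations, and item I1 for vendor V3 is stored only once.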
Fifth Normal Form (5NF)
These relations still have a problem. While defining 4NF we mentioned that all
the attributes depend upon each other. While creating the two tables in 4NF,
although we have preserved the dependencies between Vendor Code and Item Code
in the first table and between Vendor Code and Project No. in the second table, we
have lost the relationship between Item Code and Project No. If there were a primary
key, this loss of dependency would not have occurred. To restore this relationship we
must add a new table like the following. Please note that, in the entire process of
normalization, this is the only step where a new table is created by joining two
attributes, rather than by splitting a table into separate tables.
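The effect of the third table can be sketched as follows. The set facts and the resulting Item-Project pairs are hypothetical, because the original three-column relation is not reproduced in this text. They were chosen so that the first two projections match the Vendor-Supply and Vendor-Project tables of the 4NF step. The sketch also assumes that the data satisfies the join dependency, which is what allows the three tables to rebuild it exactly.

```python
# Hypothetical supply facts: (Vendor Code, Item Code, Project No.).
facts = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"), ("V3", "I1", "P2"),
}

# The three two-column tables.
vendor_item = {(v, i) for v, i, p in facts}
vendor_project = {(v, p) for v, i, p in facts}
item_project = {(i, p) for v, i, p in facts}

# Joining only the two 4NF tables produces a row that was never a fact.
two_way = {(v, i, p) for v, i in vendor_item for v2, p in vendor_project if v == v2}
print(sorted(two_way - facts))  # [('V1', 'I1', 'P3')]

# The Item-Project table filters the spurious row out again.
three_way = {(v, i, p) for v, i, p in two_way if (i, p) in item_project}
print(three_way == facts)  # True
```

With only the two 4NF tables we cannot tell that vendor V1 does not supply item I1 to project P3. The Item-Project table restores that information.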