Normalization
MODULE IV
Syllabus
Normalization
Different anomalies in designing a database, The idea of normalization,
Functional dependency, Armstrong’s Axioms (proofs not required),
Closures and their computation, Equivalence of Functional
Dependencies (FD), Minimal Cover (proofs not required). First Normal
Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce
Codd Normal Form (BCNF), Lossless join and dependency preserving
decomposition, Algorithms for checking Lossless Join (LJ) and
Dependency Preserving (DP) properties
Introduction
● Database Normalization is a technique of organizing the data in the
database.
● Normalization is a systematic approach of decomposing tables to eliminate
data redundancy (repetition) and undesirable characteristics such as insertion,
update, and deletion anomalies.
● It is a multi-step process that puts data into tabular form, removing
duplicated data from the relation tables.
● Normalization is used mainly for two purposes:
○ Eliminating redundant (useless) data.
○ Ensuring data dependencies make sense, i.e., data is logically
stored.
Normalization presents a set of rules that tables and databases must
follow to be well structured.
Normalization rules are divided into the following normal forms:
● First Normal Form
● Second Normal Form
● Third Normal Form
● BCNF
● Fourth Normal Form
● Fifth Normal Form
Anomalies in Designing a DB
Redundant Information and Anomalies
● Information is stored redundantly
○ Wastes storage
○ Causes problems with update anomalies
Anomalies (flaw)
■ Insertion anomalies
■ Deletion anomalies
■ Modification anomalies
Insert Anomaly
○ Cannot insert a project unless an employee is assigned to it.
○ Cannot insert an employee unless he/she is assigned to a project.
Delete Anomaly
○ When a project is deleted, it will result in deleting all the employees who work on
that project.
○ Alternately, if an employee is the sole employee on a project, deleting that
employee would result in deleting the corresponding project.
Update/ Modification Anomaly
Changing the name of project number P1 from “Billing” to “Customer-Accounting”
may cause this update to be made for all 100 employees working on project P1.
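The anomalies above can be made concrete with a small sketch. The rows, employee IDs, and the second project ("Payroll") are hypothetical illustration data, not taken from the slides; only project P1 and its rename come from the example above.

```python
# Hypothetical flat EMP_PROJ rows: (employee, project_number, project_name).
emp_proj = [
    ("E1", "P1", "Billing"),
    ("E2", "P1", "Billing"),
    ("E3", "P2", "Payroll"),
]

def rename_project_flat(rows, pnumber, new_name):
    """Rename a project in the flat table; returns (new_rows, rows_touched)."""
    touched = 0
    out = []
    for emp, pno, pname in rows:
        if pno == pnumber:
            out.append((emp, pno, new_name))
            touched += 1
        else:
            out.append((emp, pno, pname))
    return out, touched

# Update anomaly: every row for P1 has to be changed.
flat_after, touched = rename_project_flat(emp_proj, "P1", "Customer-Accounting")
print("flat rows updated:", touched)

# Decomposed design: the project name is stored once, keyed by project number.
projects = {"P1": "Billing", "P2": "Payroll"}
works_on = [("E1", "P1"), ("E2", "P1"), ("E3", "P2")]
projects["P1"] = "Customer-Accounting"  # a single update

# Delete anomaly: removing the sole employee of P2 from the flat table
# also removes every trace of project P2.
remaining = [row for row in emp_proj if row[0] != "E3"]
print("P2 still known:", "P2" in {pno for _, pno, _ in remaining})
```

In the decomposed design, deleting ("E3", "P2") from works_on leaves the P2 entry in projects untouched, which is the behaviour the guidelines that follow aim for.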
Informal Guidelines for Designing DB (4)
● GUIDELINE 1: Informally, each tuple in a relation
should represent one entity or relationship
instance.
○ Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in the
same relation
○ Only foreign keys should be used to refer to other entities
○ Entity and relationship attributes should be kept apart as
much as possible.
● GUIDELINE 2: Design a schema that does not
suffer from the insertion, deletion and
update anomalies.
● GUIDELINE 3:
○ Relations should be designed such that their tuples will
have as few NULL values as possible
○ Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
● GUIDELINE 4:
○ No spurious (fake/false) tuples should be
generated by doing a natural-join of any relations.
Spurious tuples refer to undesired or unintended tuples that
are generated as a result of a join operation. These tuples
do not represent valid combinations of data and can distort
the accuracy of the query results.
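A minimal sketch of how spurious tuples arise, assuming a hypothetical relation (employee, project, location) that is split on the non-key attribute location. The names and values are illustration data only.

```python
# Hypothetical original relation: (employee, project, location).
original = {
    ("Smith", "P1", "Houston"),
    ("Wong", "P2", "Houston"),
}

# Poor decomposition: both projections share only the non-key attribute location.
emp_loc = {(emp, loc) for emp, _, loc in original}
proj_loc = {(proj, loc) for _, proj, loc in original}

# Natural join of the two projections on location.
joined = {
    (emp, proj, loc1)
    for emp, loc1 in emp_loc
    for proj, loc2 in proj_loc
    if loc1 == loc2
}

# Tuples that appear after the join but were never in the original relation.
spurious = joined - original
print(sorted(spurious))
```

The join returns four tuples although the original relation had two; the two extra ones pair each employee with a project they never worked on.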
Functional Dependencies
● Are used to specify formal measures of the "goodness" of relational
designs
● FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
● It represents the relationship between the values of these
attributes: if we know the value of one set of attributes, we can
determine the value of another set of attributes.
● Formally, let R be a relation with attributes A and B. A
functional dependency, denoted as A → B, indicates that for any two
tuples t1 and t2 in R, if t1[A] = t2[A], then t1[B] = t2[B]. In simple
terms, the value of attribute B is determined by the value of
attribute A.
● For example, consider a relation Students with attributes
{StudentID, Name, Age, Major}. If we have a functional dependency
StudentID → Name, it means that given a StudentID, we can
uniquely determine the Name of the student.
Defining Functional Dependencies
A functional dependency X → Y holds if whenever two tuples
have the same value for X, they must have the same
value for Y
○ For any two tuples t1 and t2 in any relation instance r(R):
If t1[X] = t2[X], then t1[Y] = t2[Y]
● X → Y in R specifies a constraint on all relation instances
r(R)
● FDs are derived from the real-world constraints on the
attributes
Examples of FD constraints
● Social security number determines employee name
○ SSN → ENAME
● Project number determines project name and location
○ PNUMBER → {PNAME, PLOCATION}
● Employee SSN and project number determine the hours per week
that the employee works on the project
○ {SSN, PNUMBER} → HOURS
Summary
● An FD is a property of the attributes in the schema R; the
constraint must hold on every relation instance r(R)
● If K is a key of R, then K functionally determines all attributes
in R
○ (since we never have two distinct tuples with t1[K]=t2[K])
What FDs may exist?
● A relation R(A, B, C, D) with its extension.
● Which FDs may exist in this relation?
B → C
Diagrammatic Representation of FD
uid         name         dept_name   dept_building
U2200CS42   Joel John    CS          Main
U2200IT43   Rony Joy     IT          KE
U2200EC44   Nivethitha   EC          Main
U2200ME23   Rony Joy     ME          KE
Valid FDs
uid → {name, dept_name, dept_building}
◦ Here, uid determines the values of name, dept_name, and
dept_building, hence this is a valid functional dependency.
uid → dept_name
◦ Since uid determines the whole set {name, dept_name,
dept_building}, it also determines the subset dept_name.
dept_name → dept_building
◦ dept_name determines dept_building, since each department is
housed in exactly one building: tuples with the same dept_name
always have the same dept_building.
Invalid FDs
name → dept_name
◦ Students with the same name can have different dept_name,
hence this is not a valid functional dependency.
dept_building → dept_name
◦ There can be multiple departments in the same building
◦ hence dept_building → dept_name is an invalid functional
dependency.
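The valid and invalid FDs above can be checked mechanically against the sample table. The sketch below assumes rows are stored as dictionaries; the helper name fd_holds is hypothetical. Note that a check on one instance can only refute an FD: an FD is a constraint on every legal instance, so passing the check on sample data does not prove it.

```python
# The sample table from the slides, one dict per tuple.
students = [
    {"uid": "U2200CS42", "name": "Joel John", "dept_name": "CS", "dept_building": "Main"},
    {"uid": "U2200IT43", "name": "Rony Joy", "dept_name": "IT", "dept_building": "KE"},
    {"uid": "U2200EC44", "name": "Nivethitha", "dept_name": "EC", "dept_building": "Main"},
    {"uid": "U2200ME23", "name": "Rony Joy", "dept_name": "ME", "dept_building": "KE"},
]

def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on all lhs attributes but differ on some rhs attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True

print(fd_holds(students, ["uid"], ["name", "dept_name", "dept_building"]))
print(fd_holds(students, ["dept_name"], ["dept_building"]))
print(fd_holds(students, ["name"], ["dept_name"]))           # two students named Rony Joy
print(fd_holds(students, ["dept_building"], ["dept_name"]))  # Main houses CS and EC
```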
Armstrong’s Axioms
Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
For example, {uid, name} → name is valid.
{X,Y} → Y is valid.
Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
For example, if {uid, name} → dept_building is valid,
then {uid, name, dept_name} → {dept_building, dept_name} is also valid.
Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid by
the Transitivity rule.
For example, uid → dept_name & dept_name → dept_building, then
uid → dept_building is also valid.
Attribute Closure
● Closure of an Attribute: Closure of an Attribute can be defined as
a set of attributes that can be functionally determined from it.
● Closure of an attribute X is denoted as X+
● It represents all the attributes that can be inferred or derived from a
given set of attributes based on the defined functional
dependencies.
● Formally, given a set of attributes X and a set of functional
dependencies F, the closure of X, denoted as X+, is the set of all
attributes that can be functionally determined from X using the
functional dependencies in F
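The closure can be computed by repeatedly applying every FD whose left side is already inside the result until nothing changes. The sketch below is one common way to write this; it assumes single-letter attribute names, so that an FD can be written as a pair of strings such as ("QR", "ST"). The function name closure and this encoding are illustrative choices, not part of the slides.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs: iterable of single-letter attribute names, e.g. "QR".
    fds:   list of (lhs, rhs) pairs, each side a string of attribute letters.
    """
    result = set(attrs)          # X+ always contains X itself
    changed = True
    while changed:               # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs only when the complete left side is in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FD set used in the first worked example of these slides.
fds = [("P", "Q"), ("QR", "ST"), ("PTV", "V")]
print("".join(sorted(closure("QR", fds))))
print("".join(sorted(closure("PR", fds))))
```

The loop stops after at most one pass per attribute added, so it terminates for any finite FD set.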
Example 1
● Given a relational schema R(P, Q, R, S, T, U, V) and the set of functional
dependencies FD = { P->Q, QR->ST, PTV->V }, determine the closures (QR)+
and (PR)+.
a) Start with QR+ = QR (every attribute set determines itself).
Look for an FD whose complete left side is contained in the current closure.
QR->ST qualifies, since its left side is exactly QR.
Hence QR+ = QRST
Check the remaining two FDs { P->Q, PTV->V }: neither left side (P or PTV) is
contained in QRST, so nothing more can be added.
Therefore QR+ = QRST
b) Start with PR+ = PR (every attribute set determines itself).
FD = { P->Q, QR->ST, PTV->V }
The left side of P->Q is P, which is contained in PR. Hence
PR+ = PRQ
Now the complete left side of QR->ST is contained in PRQ.
Hence PR+ = PRQST
The remaining FD is PTV->V. Its left side PTV is not contained in
PRQST (V is missing), so we ignore it.
Therefore PR+ = PRQST
Example 2
Given relational schema R( P Q R S T) having following attributes P Q R
S and T, also there is a set of functional dependency denoted by FD =
{ P->QR, RS->T, Q->S, T-> P }. Determine Closure of ( T ) +
Answer
Start with T+ = T. T->P adds P; P->QR adds Q and R; Q->S adds S.
T+ = TPQRS
Questions
● Find the closures of A, B, C, and D, given R(A,B,C,D) and FD: {A->B, B->D, C->B}
A+ = ABD
B+ = BD
C+ = CBD
D+ = D
● Find the closure of A, given A → BC, BC → DE, D → F, CF → G
A+ = ABCDEFG
Equivalence of Functional Dependencies
● A set of functional dependencies F is said to cover another set of
functional dependencies E if every FD in E is also in F+; that is, if every
dependency in E can be inferred from F. Alternatively, we can say that
E is covered by F.
● Definition. Two sets of functional dependencies E and F are
equivalent if E+ = F+. Therefore, equivalence means that every FD in E
can be inferred or derived from F, and every FD in F can be inferred from E;
that is, E is equivalent to F if both conditions hold: E covers F and F
covers E.
● We can determine whether F covers E by calculating X+ with
respect to F for each FD X → Y in E, and then checking whether this
X+ includes the attributes in Y. If this is the case for every FD in E,
then F covers E. We determine whether E and F are equivalent by
checking that E covers F and F covers E.
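The cover test described above translates directly into code: for each FD X → Y in E, compute X+ under F and check that it contains Y. The sketch below assumes single-letter attribute names and FDs written as (lhs, rhs) string pairs; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; fds is a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, e):
    """True if F covers E: for every X -> Y in E, Y is contained in X+ under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """True if E covers F and F covers E."""
    return covers(e, f) and covers(f, e)

# FD sets from the first two worked examples that follow in these slides.
fd1 = [("A", "B"), ("B", "C"), ("AB", "D")]
fd2 = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
print(equivalent(fd1, fd2))

fd3 = [("A", "B"), ("B", "C"), ("A", "C")]
fd4 = [("A", "B"), ("B", "C"), ("A", "D")]
print(covers(fd4, fd3), covers(fd3, fd4))
```

Only the closures of left-hand sides are needed; the full closures E+ and F+ (which can be exponentially large) are never materialised.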
Example 1
A relation R(A,B,C,D) has two FD sets, FD1 = {A->B, B->C, AB->D} and FD2 = {A->B, B->C, A->C, A->D}. Check
whether FD1 and FD2 are equivalent.
Step 1: Check whether every FD of FD1 holds in FD2
A->B and B->C in FD1 are present in FD2.
AB->D is in FD1 but not directly in FD2, so check whether it can be derived by finding the
closure of AB under FD2.
(AB)+ = ABCD, so AB functionally determines A, B, C, and D. Hence AB->D also holds in FD2.
As all FDs in FD1 also hold in FD2, FD2 covers FD1 (written FD2 ⊃ FD1).
Step 2: Check whether every FD of FD2 holds in FD1
A->B and B->C in FD2 are present in FD1.
A->C is in FD2 but not directly in FD1, so check whether it can be derived by finding the closure of
A under FD1.
(A)+ = ABCD, so A functionally determines A, B, C, and D. Hence A->C also holds in FD1.
Similarly, A->D is in FD2 but not directly in FD1. Since A+ = ABCD, A->D also holds in FD1.
As all FDs in FD2 also hold in FD1, FD1 covers FD2 (FD1 ⊃ FD2).
Step 3: As FD2 ⊃ FD1 and FD1 ⊃ FD2 are both true, FD1 and FD2 are equivalent.
Example 2
A relation R2(A,B,C,D) has two FD sets, FD1 = {A->B, B->C, A->C} and FD2 = {A->B, B->C, A->D}.
Check the equivalence of FD1 and FD2.
Step 1: Check whether every FD of FD1 holds in FD2
A->B and B->C in FD1 are present in FD2.
A->C is in FD1 but not directly in FD2, so check whether it can be derived. Under
FD2, (A)+ = {A, B, C, D}, so A functionally determines A, B, C, and D. Hence A->C also holds
in FD2.
As all FDs in FD1 also hold in FD2, FD2 covers FD1 (written FD2 ⊃ FD1).
Step 2: Check whether every FD of FD2 holds in FD1
A->B and B->C in FD2 are present in FD1.
A->D is in FD2 but not directly in FD1, so check whether it can be derived. Under
FD1, (A)+ = {A, B, C}, so A cannot functionally determine D.
Hence A->D does not hold in FD1.
Since not every FD in FD2 holds in FD1, FD1 does not cover FD2 (FD2 ⊄ FD1).
Step 3: FD2 ⊃ FD1 holds but FD2 ⊄ FD1, so these two FD sets are not equivalent.
Example 3
Show that the following two sets of FDs are equivalent:
F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}
Step 1: Check whether every FD of F holds in G
No FD of F is directly present in G, so find the closure of each left-hand side of F under G.
A+ = ACD, so A → C can be derived
AC+ = ACD, so AC → D can be derived
E+ = EAHCD, so E → AD and E → H can be derived
So F is covered by G
Step 2: Check whether every FD of G holds in F (closures under F)
A+ = ACD, so A → CD can be derived
E+ = EADHC, so E → AH can be derived
So G is covered by F
Hence F and G are equivalent
Example 4
Consider the schema R(X, Y, Z, W, V). Check whether P and Q are equivalent.
P: {X → Y, XY → Z, W → XZ, W → V}, Q: {X → YZ, W → XV}
Step 1: Check whether every FD of P holds in Q
No FD of P is directly present in Q, so find the closures under Q:
X+ = XYZ, so X → Y can be derived
XY+ = XYZ, so XY → Z can be derived
W+ = WXVYZ, so W → XZ and W → V can be derived
So P is covered by Q
Step 2: Check whether every FD of Q holds in P (closures under P)
X+ = XYZ, so X → YZ can be derived
W+ = WXZVY, so W → XV can be derived
So Q is covered by P
Hence P and Q are equivalent
Example 5
Consider the schema R(A,B,C,D). Check whether P and Q are equivalent.
P: {A → B, B → C, C → D}, Q: {A → BC, C → D}
Step 1: Check whether every FD of P holds in Q
C → D is directly present.
Under Q, A+ = ABCD, so A → B can be derived.
Under Q, B+ = B.
B → C cannot be derived from this closure, so P is not covered by Q.
Step 2: Check whether every FD of Q holds in P
Under P, A+ = ABCD, so A → BC can be derived.
C → D is directly present.
Hence Q is covered by P.
Since P is not covered by Q, P and Q are not equivalent.
Homework
● For schema R(A,B,C,D,E,H), check whether P: {A → C, AC → D, E → AD, E → H} and
Q: {A → CD, E → AH} are equivalent
Ans: Equivalent
● For schema R(X,Y,Z), check whether P: {X → Y, Y → Z, Z → X} and
Q: {X → YZ, Y → X, Z → X} are equivalent
Ans: Equivalent
● Check for schema R(A,B,C,D,E,F),P : { ABC,BCDE, AEF} and Q:
{ ABCF,BDE,EAB}are equivalent
Ans :Not Equivalent
Minimal Cover
● Definition. A minimal cover of a set of functional dependencies E is a
minimal set of dependencies (in the standard canonical form and without
redundancy) that is equivalent to E.
● Minimal cover of a set of functional dependencies E is a set of functional
dependencies F that satisfies the property that every dependency in E is
in the closure F+ of F.
● This property is lost if any dependency from the set F is removed; F must
have no redundancies in it, and the dependencies in F are in a standard
form
● An attribute in a functional dependency is considered an extraneous
attribute if we can remove it without changing the closure of the set of
dependencies.
● We can formally define a set of functional dependencies F to be
minimal if it satisfies the following conditions:
1. Every dependency in F has a single attribute for its right-hand side.
2. We cannot replace any dependency X → A in F with a dependency Y → A,
where Y is a proper subset of X, and still have a set of dependencies
that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of
dependencies that is equivalent to F.
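The three conditions suggest a three-step procedure: split right-hand sides, drop extraneous left-hand attributes, then drop redundant FDs. The sketch below is one straightforward way to code it, assuming single-letter attribute names and FDs given as (lhs, rhs) string pairs; the function names are hypothetical. A set of FDs can have several minimal covers, so the result depends on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is an (lhs, rhs) pair of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute per right-hand side, duplicates dropped (order kept).
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in rhs))

    # Step 2: attribute b is extraneous in X -> a if a is in (X - {b})+ under g.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))

    # Step 3: an FD is redundant if it follows from the remaining FDs.
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest          # drop it and re-examine the FD now at index i
        else:
            i += 1
    return g

# E from the first worked example that follows: {B -> A, D -> A, AB -> D}.
for lhs, a in minimal_cover([("B", "A"), ("D", "A"), ("AB", "D")]):
    print("".join(sorted(lhs)), "->", a)
```

The output keeps single-attribute right-hand sides; FDs with the same left side can be merged afterwards with the union rule if a more compact form is wanted.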
Example 1:
Let the given set of FDs be E: {B → A, D → A, AB → D}. Find the minimal cover of E.
■ Step 1 (single attribute on RHS): All of the above dependencies are in canonical form (that is,
they have only one attribute on the right-hand side), so step 1 of the
algorithm is complete.
■ Step 2: Determine whether AB → D has any redundant (extraneous) attribute on the
left-hand side; that is, can it be replaced by B → D or A → D?
A+ = A, so A alone does not determine D.
B+ = BAD (B → A, then AB → D), so B alone determines D. Hence A is redundant and AB → D may be replaced
by B → D.
■ We now have a set equivalent to the original E, say E′: {B → A, D → A, B → D}.
No further reduction is possible in step 2 since all FDs have a single attribute
on the left-hand side.
■ Step 3: Look for a redundant FD in E′. By using the transitive rule on
B → D and D → A, we derive B → A. Hence B → A is redundant in E′ and can be eliminated.
■ Therefore, the minimal cover of E is F: {B → D, D → A}.
Example 2:
Let the given set of FDs be G: {A → BCDE, CD → E}. Find the canonical cover
(minimal cover).
Step 1: Here, the given FDs are NOT in canonical form, so we first convert
them into:
E: {A → B, A → C, A → D, A → E, CD → E}.
Step 2: For CD → E, neither C nor D is extraneous on the left-hand
side, since we cannot derive C → E or D → E from the given FDs. Hence
we cannot replace it with either.
Step 3: Check whether any FD is redundant. A → C and A → D can be
combined as A → CD.
By the transitive rule, from A → CD and CD → E we get A → E. Thus A → E is
redundant and can be removed.
■ We are left with the set F, equivalent to the original set G: {A → B, A → C,
A → D, CD → E}. We can combine the first three FDs using the union rule and
express the minimal cover of G as F: {A → BCD, CD → E}.
Example 3
Find the minimal cover of F: {A → BC, B → C, A → B, AB → C}
Step 1: Convert to canonical form: A → B, A → C, B → C, A → B, AB → C
Step 2: Check for a redundant attribute in AB → C
A+ = ABC. Since it contains B, B is a redundant attribute and can be removed, so AB → C becomes A → C
Hence, after dropping duplicates, F’: {A → B, A → C, B → C}
Step 3: Check for redundant FDs
Without A → B: A+ = AC, no B, so A → B cannot be removed
Without A → C: A+ = ABC (A → B, then B → C), which contains C, so A → C is redundant and is removed
Without B → C: B+ = B, no C, so B → C cannot be removed
Minimal cover F’: {A → B, B → C}
Example 4
Find the minimal cover of F: {A → C, AC → D, E → H, E → AD}
Step 1: Make each RHS a single attribute. F’: {A → C, AC → D, E → H, E → A, E → D}
Step 2: Check for a redundant attribute in AC → D
A+ = AC, so C is redundant and can be removed
So AC → D becomes A → D
F’: {A → C, A → D, E → H, E → A, E → D}
Step 3: Check for redundant FDs
Without A → C: A+ = AD. Since C is not in A+, A → C cannot be removed
Without A → D: A+ = AC. Since D is not in A+, A → D cannot be removed
Without E → H: E+ = EADC. Since H is not in the closure, E → H cannot be removed
Without E → A: E+ = EDH. No A, so E → A cannot be removed
Without E → D: E+ = EACDH. D is in the closure, so E → D can be removed
F’: {A → C, A → D, E → H, E → A}
After combining FDs with the same left side, we get the minimal cover F’: {A → CD, E → AH}
Finding Candidate Keys of a Relation
● A candidate key is a super key none of whose proper subsets is a super
key; i.e., if ABC is a candidate key, then none of A, B, C, AB, BC, or AC
can be a super key.
● Equivalently, a candidate key is a minimal set of attributes of a relational
schema R that can be used to identify a tuple uniquely.
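The definition can be turned into a brute-force search: try attribute sets from smallest to largest, keep those whose closure is all of R, and skip any set that already contains a key found earlier. The sketch below assumes single-letter attribute names and FDs as (lhs, rhs) string pairs; candidate_keys is a hypothetical helper name. It examines every subset, so it is only practical for the small schemas used in exercises.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; fds is a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all candidate keys of a schema as a list of sets."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):       # smallest sets first
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue                        # contains a smaller key: not minimal
            if closure(cand, fds) == set(attrs):
                keys.append(cand)               # super key with no smaller key inside
    return keys

# R(X, Y, Z, W) with {XYZ -> W, XY -> ZW, X -> YZW}
fds_a = [("XYZ", "W"), ("XY", "ZW"), ("X", "YZW")]
print(["".join(sorted(k)) for k in candidate_keys("XYZW", fds_a)])

# R(A, B, C, D, E, F) with {C -> F, E -> A, EC -> D, A -> B}
fds_b = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDEF", fds_b)])
```

Because sets are tried in order of increasing size, any super key that survives the subset check is guaranteed to be minimal.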
Example 1
Given R(X, Y, Z, W) and FD = { XYZ → W, XY → ZW, X → YZW }, find the super keys and candidate keys.
Step 1: Calculate the closure XYZ+ = XYZW
Since the closure of XYZ contains all the attributes of the table, XYZ is a super key
Step 2: Calculate the closure XY+ = XYZW
Since the closure of XY contains all the attributes of the table, XY is a super key
Step 3: Calculate the closure X+ = XYZW
Since the closure of X contains all the attributes of the table, X is a super key
A candidate key is a super key none of whose proper subsets is a super key
● By this definition, XYZ is not a candidate key, as Steps 2 and 3 show that its subsets XY and X are
also super keys
● XY is not a candidate key, as Step 3 shows that its subset X is also a super key
● X is a candidate key: a single attribute has no non-empty proper subset, so no smaller super key exists
● Hence XYZ, XY, and X are all super keys, while only X is a candidate key
Example 2
Given R(X, Y, Z, W) and FD = { XY → Z, Z → YW, W → X }
Step 1: XY+ = XYZW, hence XY is a super key
Step 2: Z+ = ZYWX, hence Z is a super key
Step 3: W+ = WX, hence W is not a super key; since it is not a super key, it can never
be a candidate key
A candidate key is a super key none of whose proper subsets is a
super key
By this definition XY is a candidate key: neither of its proper subsets
is a super key (X+ = X and Y+ = Y).
Z is a candidate key: a single attribute has no non-empty proper
subset.
Hence XY and Z are super keys, and both are also candidate keys.
Although W alone is not a super key, WY+ = WYXZ with W+ = WX and
Y+ = Y, so WY is a third candidate key.
Example 3
● Given R(X, Y, Z, W) and FD = { Y → XZW, XZW → Y }
Closure Y+ = XYZW
Closure XZW+ = XZWY
Y is a candidate key: a single attribute has no non-empty proper
subset.
XZW is a candidate key: no proper subset of XZW is a super key.
Hence Y and XZW are super keys, and both are also
candidate keys.
Finding Candidate Keys of a Relation
● Let R = (A, B, C, D, E, F) be a relation scheme with the following
dependencies-
C → F, E → A, EC → D, A → B. What is the candidate key of the
relation?
{ CE }+
={C,E}
= { C , E , F } ( Using C → F )
= { A , C , E , F } ( Using E → A )
= { A , C , D , E , F } ( Using EC → D )
= { A , B , C , D , E , F } ( Using A → B )
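We conclude that CE can determine all the attributes of the given
relation. Neither C alone (C+ = CF) nor E alone (E+ = EAB) can, so CE
is a candidate key. Since C and E appear on no right-hand side, every key
must contain both, so CE is the only candidate key.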
Finding Candidate Keys of a Relation
● Let R = (A, B, C, D, E) be a relation schema with the following
dependencies- AB → C, C → D, B → E. Determine the total number
of candidate keys