Module 4 Dbms Student

Uploaded by

Gangadhar Bhuvan

Functional Dependencies and Normalization for Relational Databases

Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe


Chapter Outline
• 1 Informal Design Guidelines for Relational Databases
• 1.1 Semantics of the Relation Attributes
• 1.2 Redundant Information in Tuples and Update Anomalies
• 1.3 Null Values in Tuples
• 1.4 Spurious Tuples

• 2 Functional Dependencies (FDs)


• 2.1 Definition of FD
• 2.2 Inference Rules for FDs
• 2.3 Equivalence of Sets of FDs
• 2.4 Minimal Sets of FDs

Slide 10- 2
Chapter Outline
• 3 Normal Forms Based on Primary Keys
• 3.1 Normalization of Relations
• 3.2 Practical Use of Normal Forms
• 3.3 Definitions of Keys and Attributes Participating in Keys
• 3.4 First Normal Form
• 3.5 Second Normal Form
• 3.6 Third Normal Form

• 4 General Normal Form Definitions (For Multiple Keys)

• 5 BCNF (Boyce-Codd Normal Form)

Slide 10- 3
1 Informal Design Guidelines for Relational Databases (1)
• What is relational database design?
• The grouping of attributes to form "good" relation schemas
• Two levels of relation schemas
• The logical "user view" level
• The storage "base relation" level
• Design is concerned mainly with base relations
• What are the criteria for "good" base relations?

Slide 10- 4
Informal Design Guidelines for Relational Databases (2)
• We first discuss informal guidelines for good relational design
• Then we discuss formal concepts of functional dependencies and normal forms
• 1NF (First Normal Form)
• 2NF (Second Normal Form)
• 3NF (Third Normal Form)
• BCNF (Boyce-Codd Normal Form)
• Additional types of dependencies, further normal forms, and relational design algorithms by synthesis are discussed in Chapter 11

Slide 10- 5
1.1 Semantics of the Relation Attributes
• GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities
• Entity and relationship attributes should be kept apart as much as possible.
• Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.

Slide 10- 6
Figure 10.1 A simplified COMPANY relational database schema

Slide 10- 7
1.2 Redundant Information in Tuples and Update Anomalies
• Information is stored redundantly
• Wastes storage
• Causes problems with update anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

Slide 10- 8
EXAMPLE OF AN UPDATE ANOMALY
• Consider the relation:
• EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Update Anomaly:
• Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.

Slide 10- 9
EXAMPLE OF AN INSERT ANOMALY
• Consider the relation:
• EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Insert Anomaly:
• Cannot insert a project unless an employee is assigned to it.
• Conversely
• Cannot insert an employee unless he/she is assigned to a project.

Slide 10- 10
EXAMPLE OF A DELETE ANOMALY
• Consider the relation:
• EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Delete Anomaly:
• When a project is deleted, it will result in deleting all the employees who work on that project.
• Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
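
The update and delete anomalies can be made concrete with a few tuples. This is only a sketch: the rows below are invented for illustration and are not the state shown in Figure 10.4.

```python
# Hypothetical EMP_PROJ tuples: (Emp#, Proj#, Ename, Pname, No_hours)
emp_proj = [
    ("E1", "P1", "Smith", "Billing", 10),
    ("E2", "P1", "Wong", "Billing", 20),
    ("E3", "P2", "Zelaya", "Payroll", 15),
]

# Update anomaly: renaming P1 must touch every tuple that mentions P1.
renamed = [
    (e, p, en, "Customer-Accounting" if p == "P1" else pn, h)
    for (e, p, en, pn, h) in emp_proj
]
touched = sum(1 for old, new in zip(emp_proj, renamed) if old != new)

# Delete anomaly: removing E3, the only employee on P2, also erases P2.
after_delete = [t for t in emp_proj if t[0] != "E3"]
projects_left = {t[1] for t in after_delete}

print(touched, sorted(projects_left))  # 2 ['P1']
```

With 100 employees on P1, the same rename would have to modify 100 tuples, and missing any one of them would leave the project with two different names.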

Slide 10- 11
Figure 10.3 Two relation schemas suffering from update anomalies

Slide 10- 12
Figure 10.4 Example States for EMP_DEPT and EMP_PROJ

Slide 10- 13
Guideline to Redundant Information in Tuples and Update Anomalies
• GUIDELINE 2:
• Design a schema that does not suffer from the insertion, deletion and update anomalies.
• If there are any anomalies present, then note them so that applications can be made to take them into account.

Slide 10- 14
1.3 Null Values in Tuples
• GUIDELINE 3:
• Relations should be designed such that their tuples will have as few NULL values as possible
• Attributes that are NULL frequently could be placed in separate relations (with the primary key)
• Reasons for nulls:
• Attribute not applicable or invalid
• Attribute value unknown (may exist)
• Value known to exist, but unavailable

Slide 10- 15
1.4 Spurious Tuples
• Bad designs for a relational database may result in erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee meaningful results for join operations

• GUIDELINE 4:
• The relations should be designed to satisfy the lossless join condition.
• No spurious tuples should be generated by doing a natural-join of any relations.
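
A small sketch of how spurious tuples arise. The relation, its attribute names and its two rows are hypothetical, chosen only so that the two projections share a non-key attribute (PLOCATION).

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attributes."""
    out = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                out.append({**t1, **t2})
    return out

def as_set(rows):
    return {frozenset(t.items()) for t in rows}

# Hypothetical base relation (invented data).
original = [
    {"ENAME": "Smith", "PNUMBER": 1, "PLOCATION": "Bellaire"},
    {"ENAME": "Wong", "PNUMBER": 2, "PLOCATION": "Bellaire"},
]
emp_locs = project(original, ["ENAME", "PLOCATION"])
proj_locs = project(original, ["PNUMBER", "PLOCATION"])

joined = natural_join(emp_locs, proj_locs)
spurious = as_set(joined) - as_set(original)
print(len(joined), len(spurious))  # 4 2
```

The join returns every original tuple plus two that were never stored (Smith on project 2, Wong on project 1), which is why the join is called additive or lossy.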

Slide 10- 16
Spurious Tuples (2)
• There are two important properties of decompositions:
a) Non-additivity (losslessness) of the corresponding join
b) Preservation of the functional dependencies.

• Note that:
• Property (a) is extremely important and cannot be sacrificed.
• Property (b) is less stringent and may be sacrificed. (See Chapter 11).

Slide 10- 17
2.1 Functional Dependencies (1)
• Functional dependencies (FDs)
• Are used to specify formal measures of the "goodness" of relational designs
• Together with keys, are used to define normal forms for relations
• Are constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

Slide 10- 18
Functional Dependencies (2)
• X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y
• For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
• Written as X -> Y; can be displayed graphically on a relation schema as in the figures (denoted by an arrow).
• FDs are derived from the real-world constraints on the attributes

Slide 10- 19
Examples of FD constraints (1)
• Social security number determines employee name
• SSN -> ENAME
• Project number determines project name and location
• PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project
• {SSN, PNUMBER} -> HOURS
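
The tuple-level definition (t1[X]=t2[X] implies t1[Y]=t2[Y]) translates directly into a check over one relation instance. The sketch below uses invented tuples and a hypothetical helper name `fd_holds`. Note that a single instance can only refute an FD; it cannot prove that the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this relation instance.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

# Hypothetical tuples, invented for illustration only.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10.0, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 5.0, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "Wong"},
]
print(fd_holds(emp_proj, ["SSN"], ["ENAME"]))             # True
print(fd_holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(emp_proj, ["SSN"], ["HOURS"]))             # False
```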

Slide 10- 20
Examples of FD constraints (2)
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R
• (since we never have two distinct tuples with t1[K]=t2[K])

Slide 10- 21
2.2 Inference Rules for FDs (1)
• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
• Armstrong's inference rules:
• IR1. (Reflexive) If Y subset-of X, then X -> Y
• IR2. (Augmentation) If X -> Y, then XZ -> YZ
• (Notation: XZ stands for X U Z)
• IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

• IR1, IR2, IR3 form a sound and complete set of inference rules
• These rules hold, and all other rules that hold can be deduced from these

Slide 10- 22
Inference Rules for FDs (2)
• Some additional inference rules that are useful:
• Decomposition: If X -> YZ, then X -> Y and X -> Z
• Union: If X -> Y and X -> Z, then X -> YZ
• Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z

• The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
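
As a worked illustration of that claim, one possible derivation of each additional rule from IR1, IR2 and IR3 is sketched below (other derivations exist; the step labels are ours).

```latex
\begin{align*}
\textbf{Decomposition:}\quad
  & X \to YZ \ \text{(given)}, \quad YZ \to Y \ \text{(IR1)}
    \;\Longrightarrow\; X \to Y \ \text{(IR3)} \\[4pt]
\textbf{Union:}\quad
  & X \to Y \ \text{(given)} \;\Longrightarrow\; X \to XY
    \ \text{(IR2, augmenting with } X\text{; note } XX = X\text{)} \\
  & X \to Z \ \text{(given)} \;\Longrightarrow\; XY \to YZ
    \ \text{(IR2, augmenting with } Y\text{)} \\
  & X \to XY,\ XY \to YZ \;\Longrightarrow\; X \to YZ \ \text{(IR3)} \\[4pt]
\textbf{Pseudotransitivity:}\quad
  & X \to Y \ \text{(given)} \;\Longrightarrow\; WX \to WY
    \ \text{(IR2, augmenting with } W\text{)} \\
  & WX \to WY,\ WY \to Z \ \text{(given)} \;\Longrightarrow\; WX \to Z \ \text{(IR3)}
\end{align*}
```

The same argument for X -> Z in the decomposition rule follows by using YZ -> Z in place of YZ -> Y.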

Slide 10- 23
Inference Rules for FDs (3)
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F

• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X

• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
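
In practice X+ is usually computed with a simple fixpoint loop rather than by applying the rules one at a time. The following is a minimal sketch of that standard procedure; the function name `closure` and the representation of an FD as a pair of Python sets are our own choices, and F reuses the example FDs given earlier.

```python
def closure(attrs, fds):
    """Compute X+: all attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(sorted(closure({"SSN"}, F)))
# ['ENAME', 'SSN']
print(sorted(closure({"SSN", "PNUMBER"}, F)))
# ['ENAME', 'HOURS', 'PLOCATION', 'PNAME', 'PNUMBER', 'SSN']
```

An FD X -> Y is in F+ exactly when Y is a subset of X+, so this one routine is enough to test whether any given FD can be inferred from F.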

Slide 10- 24
2.3 Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
• Every FD in F can be inferred from G, and
• Every FD in G can be inferred from F
• Hence, F and G are equivalent if F+ =G+
• Definition (Covers):
• F covers G if every FD in G can be inferred from F
• (i.e., if G+ subset-of F+)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs
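
One way to carry out this check is with attribute closures: F covers G if, for every X -> Y in G, Y is contained in X+ computed under F. The sketch below assumes that approach; the two FD sets are illustrative and do not come from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

# Illustrative FD sets over attributes A, C, D, E, H.
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
print(equivalent(F, G))  # True
```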

Slide 10- 25
2.4 Minimal Sets of FDs (1)
• A set of FDs F is minimal if it satisfies the following conditions:
1. Every dependency in F has a single attribute for its RHS.
2. We cannot remove any dependency from F and have a set of dependencies that is equivalent to F.
3. We cannot replace any dependency X -> A in F with a dependency Y -> A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
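
The three conditions can be tested mechanically with attribute closures. The sketch below is a checker only (it does not construct a minimal set); the name `is_minimal` and the sample FD sets are assumptions made for illustration. For condition 3 it is enough to try dropping one left-hand attribute at a time, because if some proper subset Y works, so does every larger subset of X containing Y.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_minimal(fds):
    # Condition 1: single attribute on every right-hand side.
    if any(len(rhs) != 1 for _, rhs in fds):
        return False
    for i, (lhs, rhs) in enumerate(fds):
        # Condition 2: the FD must not be derivable from the remaining ones.
        rest = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, rest):
            return False
        # Condition 3: no attribute of the left side is extraneous.
        for a in lhs:
            if rhs <= closure(lhs - {a}, fds):
                return False
    return True

# Illustrative sets: in the first, A is extraneous in AB -> D because B -> A.
not_minimal = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
minimal = [({"B"}, {"D"}), ({"D"}, {"A"})]
print(is_minimal(not_minimal), is_minimal(minimal))  # False True
```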

Slide 10- 26
Minimal Sets of FDs (2)
• Every set of FDs has an equivalent minimal set
• There can be several equivalent minimal sets
• There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F of FDs
• To synthesize a set of relations, we assume that we start with a set of dependencies that is a minimal set
• E.g., see algorithms 11.2 and 11.4

Slide 10- 27
3 Normal Forms Based on Primary Keys
• 3.1 Normalization of Relations
• 3.2 Practical Use of Normal Forms
• 3.3 Definitions of Keys and Attributes Participating in Keys
• 3.4 First Normal Form
• 3.5 Second Normal Form
• 3.6 Third Normal Form

Slide 10- 28
3.1 Normalization of Relations (1)
• Normalization:
• The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

• Normal form:
• Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Slide 10- 29
Normalization of Relations (2)
• 2NF, 3NF, BCNF
• based on keys and FDs of a relation schema
• 4NF
• based on keys and multivalued dependencies (MVDs)
• 5NF
• based on keys and join dependencies (JDs); see Chapter 11
• Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Slide 10- 30
3.2 Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
• The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
• The database designers need not normalize to the highest possible normal form
• (usually up to 3NF, BCNF or 4NF)
• Denormalization:
• The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form

Slide 10- 31
3.3 Definitions of Keys and Attributes Participating in Keys (1)
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Slide 10- 32
Definitions of Keys and Attributes Participating in Keys (2)
• If a relation schema has more than one key, each is called a candidate key.
• One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
• A Prime attribute must be a member of some candidate key
• A Nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
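
These definitions can be explored with a brute-force search: a set S is a superkey when S+ contains every attribute of R, and a key when no proper subset is also a superkey. The sketch below assumes a relation EMP_PROJ with the attributes and FDs used in the earlier FD examples; the function names are hypothetical, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Enumerate attribute subsets by increasing size and keep minimal superkeys."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            # Supersets of an already found key are superkeys, not keys.
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) == attrs:
                keys.append(s)
    return keys

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
keys = candidate_keys(R, F)
prime = set().union(*keys)
nonprime = R - prime
print([sorted(k) for k in keys])  # [['PNUMBER', 'SSN']]
print(sorted(nonprime))           # ['ENAME', 'HOURS', 'PLOCATION', 'PNAME']
```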

Slide 10- 33
3.4 First Normal Form
• Disallows
• composite attributes
• multivalued attributes
• nested relations; attributes whose values for an individual tuple are non-atomic

• Considered to be part of the definition of relation

Slide 10- 34
Figure 10.8 Normalization into 1NF

Slide 10- 35
Figure 10.9 Normalizing nested relations into 1NF

Slide 10- 36
3.5 Second Normal Form (1)
• Uses the concepts of FDs, primary key
• Definitions
• Prime attribute: An attribute that is a member of the primary key K
• Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
• Examples:
• {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds

Slide 10- 37
Second Normal Form (2)
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

• R can be decomposed into 2NF relations via the process of 2NF normalization
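
Under this primary-key-based definition, a 2NF violation is a non-prime attribute determined by a proper subset of the primary key. The sketch below lists such partial dependencies; the helper name `partial_dependencies` is hypothetical, and the schema and FDs are the EMP_PROJ examples used earlier.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(attrs, primary_key, fds):
    """Return (key_part, dependent_nonprime_attrs) pairs that violate 2NF."""
    nonprime = set(attrs) - set(primary_key)
    key = sorted(primary_key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for part in combinations(key, size):
            dependent = closure(part, fds) & nonprime
            if dependent:
                found.append((set(part), dependent))
    return found

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
violations = partial_dependencies(R, {"SSN", "PNUMBER"}, F)
for part, dependent in violations:
    print(sorted(part), "->", sorted(dependent))
# ['PNUMBER'] -> ['PLOCATION', 'PNAME']
# ['SSN'] -> ['ENAME']
```

Each reported pair suggests one relation of a 2NF decomposition (the key part together with the attributes it determines), with HOURS remaining with the full key.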

Slide 10- 38
Figure 10.10 Normalizing into 2NF and 3NF

Slide 10- 39
Figure 10.11 Normalization into 2NF and 3NF

Slide 10- 40
3.6 Third Normal Form (1)
• Definition:
• Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
• Examples:
• SSN -> DMGRSSN is a transitive FD
• Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
• SSN -> ENAME is non-transitive
• Since there is no set of attributes X where SSN -> X and X -> ENAME

Slide 10- 41
Third Normal Form (2)
• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
• R can be decomposed into 3NF relations via the process of 3NF normalization
• NOTE:
• In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
• When Y is a candidate key, there is no problem with the transitive dependency.
• E.g., Consider EMP (SSN, Emp#, Salary).
• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

Slide 10- 42
Normal Forms Defined Informally
• 1st normal form
• All attributes depend on the key
• 2nd normal form
• All attributes depend on the whole key
• 3rd normal form
• All attributes depend on nothing but the key

Slide 10- 43
4 General Normal Form Definitions (For Multiple Keys) (1)
• The above definitions consider the primary key only
• The following more general definitions take into account relations with multiple candidate keys
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R

Slide 10- 44
General Normal Form Definitions (2)
• Definition:
• Superkey of relation schema R - a set of attributes S of R that contains a key of R
• A relation schema R is in third normal form (3NF) if whenever an FD X -> A holds in R, then either:
• (a) X is a superkey of R, or
• (b) A is a prime attribute of R
• NOTE: Boyce-Codd normal form disallows condition (b) above

Slide 10- 45
5 BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R
• Each normal form is strictly stronger than the previous one
• Every 2NF relation is in 1NF
• Every 3NF relation is in 2NF
• Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)

Slide 10- 46
Figure 10.12 Boyce-Codd normal form

Slide 10- 47
Figure 10.13 A relation TEACH that is in 3NF but not in BCNF

Slide 10- 48
Achieving the BCNF by Decomposition (1)
• Two FDs exist in the relation TEACH:
• fd1: {student, course} -> instructor
• fd2: instructor -> course
• {student, course} is a candidate key for this relation, and the dependencies shown follow the pattern in Figure 10.12 (b).
• So this relation is in 3NF but not in BCNF
• A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
• (See Algorithm 11.3)
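
The claim that TEACH is in 3NF but not in BCNF can be checked against the general definitions. The sketch below is not Algorithm 11.3; it only tests the FDs listed in F (fd1 and fd2), and the function names are our own.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal superkeys, found by brute force over subsets of increasing size."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == attrs:
                keys.append(s)
    return keys

def violations(attrs, fds, allow_prime_rhs):
    """FDs X -> A that break BCNF (allow_prime_rhs=False) or 3NF (True)."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attrs:
            continue  # condition (a): X is a superkey
        for a in rhs - lhs:  # ignore the trivial part of the FD
            if not (allow_prime_rhs and a in prime):
                bad.append((lhs, a))
    return bad

TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(violations(TEACH, F, allow_prime_rhs=True))   # []
print(violations(TEACH, F, allow_prime_rhs=False))  # [({'instructor'}, 'course')]
```

fd2 passes the 3NF test only through condition (b): course is prime because {student, course} is a candidate key, while instructor alone is not a superkey.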

Slide 10- 49
Achieving the BCNF by Decomposition (2)
• Three possible decompositions for relation TEACH
• {student, instructor} and {student, course}
• {course, instructor} and {course, student}
• {instructor, course} and {instructor, student}
• All three decompositions will lose fd1.
• We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition.
• Out of the above three, only the 3rd decomposition will not generate spurious tuples after join (and hence has the non-additivity property).
• A test to determine whether a binary decomposition (decomposition into two relations) is non-additive (lossless) is discussed in section 11.1.4 under Property LJ1. Verify that the third decomposition above meets the property.
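
The verification can be sketched with the usual binary test: a decomposition of R into R1 and R2 is non-additive if the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. We assume this is the test the slide refers to as Property LJ1; the function name `is_lossless` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary non-additive join test on the attribute sets r1 and r2."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
results = [is_lossless(r1, r2, F) for r1, r2 in decompositions]
print(results)  # [False, False, True]
```

Only the third decomposition passes: its common attribute is instructor, and fd2 (instructor -> course) shows that instructor determines everything in {instructor, course}.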

Slide 10- 50
Chapter Summary
• Informal Design Guidelines for Relational Databases
• Functional Dependencies (FDs)
• Definition, Inference Rules, Equivalence of Sets of FDs, Minimal Sets of FDs
• Normal Forms Based on Primary Keys
• General Normal Form Definitions (For Multiple Keys)
• BCNF (Boyce-Codd Normal Form)

Slide 10- 51
