ER Diagrams (Concluded), Schema Refinement, and Normalization
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
October 6, 2005
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Examples of ER Diagrams
Please interpret these ER diagrams:
[Figure: ER diagram in which STUDENTS and COURSES are connected by the relationship set Takes]
Converting ER Relationship Sets to Tables: 1:n Relationships
[Figure: PROFESSORS and COURSES connected by the relationship set Teaches]
• “1” entity = key of relationship set:
    CREATE TABLE Teaches(
      fid INTEGER,
      serno CHAR(15),
      semester CHAR(4),
      PRIMARY KEY (serno),
      FOREIGN KEY (fid) REFERENCES PROFESSORS,
      FOREIGN KEY (serno) REFERENCES COURSES)
• Or embed relationship in “many” entity set:
    CREATE TABLE Teaches_Course(
      serno INTEGER,
      subj VARCHAR(30),
      cid CHAR(15),
      fid CHAR(15),
      semester CHAR(4),
      PRIMARY KEY (serno),
      FOREIGN KEY (fid) REFERENCES PROFESSORS)
1:1 Relationships
If you borrow money or have credit, you might get:
ISA Relationships: Subclassing (Structurally)
[Figure: ISA hierarchy; recoverable labels: Employees, salary]
But How Does this Translate into the Relational Model?
Weak Entities
A weak entity can only be identified uniquely using the primary
key of another (owner) entity.
Owner and weak entity sets in a one-to-many relationship
set, 1 owner : many weak entities
Weak entity set must have total participation
Translating Weak Entity Sets
Weak entity set and identifying relationship set are translated
into a single table; when the owner entity is deleted, all
owned weak entities must also be deleted
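The deletion rule above can be sketched with SQLite from Python. The table and column names here (Advisor, IndepStudy, fid, topic) and the sample rows are assumptions chosen for illustration, not taken from the slides; the point is only that the weak entity's key includes the owner's key and that the foreign key cascades on delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Advisor(
    fid  INTEGER PRIMARY KEY,
    name TEXT
);
-- Weak entity set and identifying relationship folded into one table:
-- the key is the owner's key plus a partial key (topic).
CREATE TABLE IndepStudy(
    fid   INTEGER NOT NULL,
    topic TEXT NOT NULL,
    PRIMARY KEY (fid, topic),
    FOREIGN KEY (fid) REFERENCES Advisor(fid) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO Advisor VALUES (1, 'Advisor A')")
conn.executemany(
    "INSERT INTO IndepStudy VALUES (?, ?)",
    [(1, "XML"), (1, "Datalog")],
)

# Deleting the owner removes every weak entity it owns.
conn.execute("DELETE FROM Advisor WHERE fid = 1")
remaining = conn.execute("SELECT COUNT(*) FROM IndepStudy").fetchone()[0]
print(remaining)
```

Total participation is reflected in the NOT NULL foreign key: an IndepStudy row cannot exist without an owning Advisor row.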
[Figure: weak entity example; recoverable labels: Indep Study, Advisor]
Summary of ER Diagrams
One of the primary ways of designing logical
schemas
CASE tools built around ER exist
(e.g., ERWin, PowerBuilder)
Translate the design automatically into DDL, XML, UML,
etc.
Use a slightly different notation that is better suited to
graphical displays
Some tools support constraints beyond what ER diagrams
can capture
Can you get different ER diagrams from the same data?
Schema Refinement & Design Theory
Not All Designs are Equally Good
Why is this a poor schema design?
Student(sid, name)
Course(serno, cid)
Subject(cid, subj)
Takes(sid, serno, exp-grade)
Focus on the Bad Design
sid  name   serno   subj  cid  exp-grade
1    Sam    570103  AI    520  B
23   Nitin  550103  DB    550  A
45   Jill   505103  OS    505  A
1    Sam    505103  OS    505  C
Functional Dependencies
Describe “Key-Like” Relationships
A key is a set of attributes where:
If keys match, then the tuples match
A functional dependency (FD) is a generalization:
If an attribute set X determines another set Y, written X → Y,
then any two tuples that agree on attribute set X must also
agree on Y:
sid → name
Formal Definition of FD’s
Def. Given a relation schema R and subsets X, Y of R:
An instance r of R satisfies FD X → Y if,
for any two tuples t1, t2 ∈ r,
t1[X] = t2[X] implies t1[Y] = t2[Y]
For an FD to hold for schema R, it must hold for
every possible instance r of R
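The definition can be checked mechanically on a single instance. The sketch below uses the rows of the “bad design” table shown earlier; the function name satisfies is an assumption made for this example. Note that passing the check on one instance only shows the FD is not violated there; it does not show the FD holds for the schema.

```python
from itertools import combinations

COLUMNS = ("sid", "name", "serno", "subj", "cid", "exp-grade")
ROWS = [
    (1, "Sam", 570103, "AI", 520, "B"),
    (23, "Nitin", 550103, "DB", 550, "A"),
    (45, "Jill", 505103, "OS", 505, "A"),
    (1, "Sam", 505103, "OS", 505, "C"),
]

def satisfies(rows, columns, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    def project(row, attrs):
        return tuple(row[columns.index(a)] for a in attrs)
    return all(
        project(t1, rhs) == project(t2, rhs)
        for t1, t2 in combinations(rows, 2)
        if project(t1, lhs) == project(t2, lhs)
    )

sid_name = satisfies(ROWS, COLUMNS, ["sid"], ["name"])        # both sid=1 rows say Sam
serno_subj = satisfies(ROWS, COLUMNS, ["serno"], ["subj"])    # both 505103 rows say OS
sid_grade = satisfies(ROWS, COLUMNS, ["sid"], ["exp-grade"])  # sid=1 has both B and C
print(sid_name, serno_subj, sid_grade)
```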
General Thoughts on Good Schemas
Armstrong’s Axioms: Inferring FDs
Some FDs exist due to others; can compute using
Armstrong’s axioms:
Reflexivity: If Y ⊆ X then X → Y (trivial dependencies)
name, sid → name
Augmentation: If X → Y then XW → YW
serno → subj so serno, exp-grade → subj, exp-grade
Transitivity: If X → Y and Y → Z then X → Z
serno → cid and cid → subj
so serno → subj
Armstrong’s Axioms Lead to…
Union: If X → Y and X → Z
then X → YZ
Pseudotransitivity: If X → Y and WY → Z
then XW → Z
Decomposition: If X → Y and Z ⊆ Y
then X → Z
Closure of a Set of FD’s
Defn. Let F be a set of FD’s.
Its closure, F+, is the set of all FD’s:
{X → Y | X → Y is derivable from F by Armstrong’s
Axioms}
Which of the following are in the closure of our Student-Course
FD’s?
name → name
cid → subj
serno → subj
cid, sid → subj
cid → sid
Attribute Closures: Is Something Dependent on X?
Defn. The closure of an attribute set X, X+, is:
X+ = {Y | X → Y ∈ F+}
This answers the question “is Y determined
(transitively) by X?”; compute X+ by:
closure := X;
repeat until no change {
if there is an FD U → V in F
such that U ⊆ closure
then add V to closure}
Does sid, serno → subj, exp-grade?
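The fixpoint loop above translates almost line for line into Python. This is a minimal sketch: the FD list is an assumption for illustration (sid → name, serno → cid and cid → subj come from the examples in these slides; sid, serno → exp-grade is assumed here), and FDs are represented as (left-hand side, right-hand side) pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, computed to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire U -> V only when all of U is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),  # assumed for this example
]

sid_serno_plus = closure({"sid", "serno"}, FDS)
answer = {"subj", "exp-grade"} <= sid_serno_plus  # does sid, serno -> subj, exp-grade?
cid_plus = closure({"cid"}, FDS)                  # sid is absent, so cid -> sid is not implied
print(sorted(sid_serno_plus), answer, sorted(cid_plus))
```

Testing whether X → Y is in F+ then reduces to checking Y ⊆ X+, which avoids enumerating F+ itself.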
Equivalence of FD sets
Defn. Two sets of FD’s, F and G, are equivalent if
their closures are equal, F+ = G+
e.g., these two sets are equivalent:
{XY → Z, X → Y} and
{X → Z, X → Y}
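Equivalence can be tested without materializing either closure: F and G are equivalent exactly when every FD of G follows from F and vice versa, and each such test is an attribute-closure check. A minimal sketch, with helper names (implies, covers, equivalent) assumed for this example and single-letter attributes written as strings:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of fds."""
    return set(rhs) <= closure(lhs, fds)

def covers(f, g):
    """True if every FD in g follows from f."""
    return all(implies(f, lhs, rhs) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

F = [("XY", "Z"), ("X", "Y")]
G = [("X", "Z"), ("X", "Y")]
same = equivalent(F, G)
different = equivalent([("X", "Y")], G)  # X -> Z cannot be derived from X -> Y alone
print(same, different)
```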
Minimal Cover
Defn. An FD set F is minimal if:
1. Every FD in F is of the form X → A,
where A is a single attribute
(we express each FD in simplest form)
2. For no X → A in F is:
F – {X → A} equivalent to F
3. For no X → A in F and Z ⊂ X is:
(F – {X → A}) ∪ {Z → A} equivalent to F
(in a sense, each FD is “essential” to the cover)
Defn. F is a minimal cover for G if F is minimal and is
equivalent to G.
e.g.,
{X → Z, X → Y} is a minimal cover for
{XY → Z, X → Z, X → Y}
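One common way to compute a minimal cover follows the three conditions in order: split right-hand sides, shrink left-hand sides, then drop redundant FDs. The sketch below is an illustration under that assumption, not an algorithm given in the slides; a set of FDs can have several minimal covers, and which one is returned depends on the order in which FDs and attributes are tried.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Condition 1: one attribute on each right-hand side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    as_sets = [(lhs, {a}) for lhs, a in cover]

    # Condition 3: remove left-hand-side attributes that are not needed.
    reduced = []
    for lhs, a in cover:
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and a in closure(smaller, as_sets):
                lhs = smaller
        reduced.append((lhs, a))
    cover = list(dict.fromkeys(reduced))  # drop duplicates, keep order

    # Condition 2: remove FDs that follow from the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], [(lhs, {a}) for lhs, a in rest]):
            cover = rest
    return cover

result = minimal_cover([("XY", "Z"), ("X", "Z"), ("X", "Y")])
print(result)
```

On the slide's example, XY → Z shrinks to X → Z (because Y is already determined by X), which then duplicates the existing X → Z and is dropped.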
More on Closures
If F is a set of FD’s and X → Y ∉ F+
then for some attribute A ∈ Y, X → A ∉ F+
Proof by contradiction.
Assume otherwise and let Y = {A1,..., An}
Since we assume X → A1, ..., X → An are in F+
then X → A1 ... An is in F+ by the union rule,
hence X → Y is in F+, which is a contradiction
Why Armstrong’s Axioms?
Why are Armstrong’s axioms (or an equivalent rule
set) appropriate for FD’s? They are:
Consistent (sound): any relation satisfying FD’s in F will satisfy
those in F+
Complete: if an FD X → Y cannot be derived by
Armstrong’s axioms from F, then there exists some
relational instance satisfying F but not
X → Y
Proving Consistency
We prove that the axioms’ definitions must be true
for any instance, e.g.:
For augmentation (if X → Y then XW → YW):
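One standard argument for the augmentation case, sketched here with the tuple notation of the formal FD definition: take any instance r that satisfies X → Y and any two tuples t1, t2 in r that agree on XW.

```latex
\begin{align*}
t_1[XW] = t_2[XW]
  &\implies t_1[X] = t_2[X] \text{ and } t_1[W] = t_2[W] \\
  &\implies t_1[Y] = t_2[Y] \text{ and } t_1[W] = t_2[W]
     && \text{(since } X \to Y \text{ holds in } r\text{)} \\
  &\implies t_1[YW] = t_2[YW]
\end{align*}
```

So r satisfies XW → YW. Reflexivity and transitivity can be argued in the same style.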
Proving Completeness
Suppose X → Y ∉ F+ and define a relational instance
r that satisfies F+ but not X → Y:
Then for some attribute A ∈ Y, X → A ∉ F+
Let some pair of tuples in r agree on X+ but disagree
everywhere else:
Columns of r: X | A | X+ – X | R – X+ – {A}
Proof of Completeness cont’d
Clearly this relation fails to satisfy X → A and X → Y.
We also have to check that it satisfies any FD in F+.
The tuples agree on only X+.
Thus the only FD’s that might be violated are of the form
X’ → Y’ where X’ ⊆ X+ and Y’ contains attributes in
R – X+ – {A}.
But if X’ → Y’ ∈ F+ and X’ ⊆ X+ then Y’ ⊆ X+ (reflexivity
and augmentation).
Therefore X’ → Y’ is satisfied.
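The two-tuple construction used in the proof can be built concretely. The sketch below assumes a small schema and FD set (sid → name, serno → cid, cid → subj) and picks serno → name as the FD that is not derivable; the function names are hypothetical. The values 0 and 1 simply stand for "same" and "different".

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def counterexample(schema, fds, x):
    """Two tuples that agree exactly on X+ and disagree everywhere else."""
    x_plus = closure(x, fds)
    t1 = {a: 0 for a in schema}
    t2 = {a: (0 if a in x_plus else 1) for a in schema}
    return t1, t2

def pair_satisfies(t1, t2, lhs, rhs):
    """FD check restricted to a two-tuple instance."""
    if all(t1[a] == t2[a] for a in lhs):
        return all(t1[a] == t2[a] for a in rhs)
    return True

SCHEMA = ["sid", "name", "serno", "cid", "subj"]
FDS = [({"sid"}, {"name"}), ({"serno"}, {"cid"}), ({"cid"}, {"subj"})]

t1, t2 = counterexample(SCHEMA, FDS, {"serno"})
holds_for_F = all(pair_satisfies(t1, t2, lhs, rhs) for lhs, rhs in FDS)
holds_for_target = pair_satisfies(t1, t2, {"serno"}, {"name"})
print(holds_for_F, holds_for_target)
```

The instance satisfies every FD in F but violates serno → name, which is exactly what completeness requires for an FD outside F+.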
Decomposition
Consider our original “bad” attribute set
Stuff(sid, name, serno, subj, cid, exp-grade)
Lossless Join Decomposition
What if we decompose on
(sid, name) and (serno, subj, cid, exp-grade)?
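A quick experiment answers the question: project the sample “bad design” rows onto the two attribute sets and join the projections back. The helpers project and natural_join are assumptions written for this sketch. Because the two schemas share no attribute, the natural join degenerates to a cross product and produces tuples that were never in the original table.

```python
COLUMNS = ("sid", "name", "serno", "subj", "cid", "exp-grade")
ROWS = {
    (1, "Sam", 570103, "AI", 520, "B"),
    (23, "Nitin", 550103, "DB", 550, "A"),
    (45, "Jill", 505103, "OS", 505, "A"),
    (1, "Sam", 505103, "OS", 505, "C"),
}

def project(rows, columns, attrs):
    """Projection with duplicate elimination."""
    idx = [columns.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, cols1, r2, cols2):
    """Natural join; result columns are cols1 followed by the new columns of cols2."""
    common = [a for a in cols1 if a in cols2]
    extra = [a for a in cols2 if a not in cols1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[cols1.index(a)] == t2[cols2.index(a)] for a in common):
                out.add(t1 + tuple(t2[cols2.index(a)] for a in extra))
    return out

A = ("sid", "name")
B = ("serno", "subj", "cid", "exp-grade")
joined = natural_join(project(ROWS, COLUMNS, A), A, project(ROWS, COLUMNS, B), B)
spurious = joined - ROWS
print(len(ROWS), len(joined), len(spurious))
```

The join contains every original row plus spurious ones, so the link between students and the courses they took has been lost.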
Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F
iff at least one of the following dependencies is in F+
(R1 ∩ R2) → R1 – R2
(R1 ∩ R2) → R2 – R1
So for the FD set:
sid → name
serno → cid, exp-grade
cid → subj
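The test reduces to one attribute closure: compute (R1 ∩ R2)+ and check whether it contains R1 – R2 or R2 – R1. A minimal sketch using the FD set listed above; the function name is_lossless and the second decomposition (which keeps sid in both relations) are assumptions added for contrast.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: does R1 ∩ R2 determine R1 - R2 or R2 - R1?"""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

# Decomposition from the earlier slide: no shared attribute.
first = is_lossless({"sid", "name"}, {"serno", "subj", "cid", "exp-grade"}, FDS)
# Hypothetical alternative that keeps sid on both sides.
second = is_lossless({"sid", "name"}, {"sid", "serno", "subj", "cid", "exp-grade"}, FDS)
print(first, second)
```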
Dependency Preservation
Ensures we can “easily” check whether an FD X → Y
is violated during an update to a database:
Another Example
Given scheme: R(sid, fid, subj)
and FD set: fid → subj
sid, subj → fid
Consider the decomposition
R1(sid, fid) and R2(fid, subj)
Is it lossless?
Is it dependency preserving?
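Both questions can be checked mechanically. The sketch below pairs the binary lossless-join test with a standard closure-based preservation test (grow the left-hand side using only what each decomposed relation can enforce locally); the function names are assumptions for this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

def preserves(fd, schemas, fds):
    """True if fd follows from the FDs that project onto the given schemas."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            gained = closure(z & set(ri), fds) & set(ri)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

FDS = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
R1, R2 = {"sid", "fid"}, {"fid", "subj"}

lossless = is_lossless(R1, R2, FDS)            # R1 ∩ R2 = {fid} and fid -> subj
kept = [preserves(fd, [R1, R2], FDS) for fd in FDS]
print(lossless, kept)
```

The decomposition is lossless, but sid, subj → fid spans both relations and can no longer be checked without a join.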
FD’s and Keys
Ideally, we want a design s.t. for each nontrivial
dependency X → Y, X is a superkey for some
relation schema in R
We just saw that this isn’t always possible
Hence we have two kinds of normal forms
Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation
scheme R and for every X → A that holds over R,
either A ∈ X (it is trivial), or
X is a superkey for R
Third Normal Form (3NF). For every relation scheme
R and for every X → A that holds over R,
either A ∈ X (it is trivial), or
X is a superkey for R, or
A is a member of some key for R
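The two definitions differ only in the last escape clause, which a small checker makes visible. This is a sketch under simplifying assumptions: candidate keys are found by brute force over attribute subsets (fine for slide-sized schemas only), and only the FDs given in F are examined rather than all of F+. The example is R(sid, fid, subj) with fid → subj and sid, subj → fid.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            c = set(combo)
            if any(k <= c for k in keys):
                continue  # a proper superset of a key is not minimal
            if closure(c, fds) >= set(schema):
                keys.append(c)
    return keys

def violations(schema, fds, allow_key_members):
    """FDs X -> A that break BCNF (or 3NF when allow_key_members is True)."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):      # skip trivial parts
            if closure(lhs, fds) >= set(schema):   # X is a superkey
                continue
            if allow_key_members and a in prime:   # 3NF escape clause
                continue
            bad.append((frozenset(lhs), a))
    return bad

R = {"sid", "fid", "subj"}
FDS = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]

keys = candidate_keys(R, FDS)
in_bcnf = not violations(R, FDS, allow_key_members=False)
in_3nf = not violations(R, FDS, allow_key_members=True)
print(keys, in_bcnf, in_3nf)
```

Here fid → subj violates BCNF because fid is not a superkey, but 3NF tolerates it because subj belongs to the key {sid, subj}.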
Normal Forms Compared
BCNF is preferable, but sometimes in conflict with
the goal of dependency preservation
It’s strictly stronger than 3NF
BCNF Decomposition Algorithm
(from Korth et al.; our book gives recursive version)
result := {R}
compute F+
while there is a schema Ri in result that is not in BCNF
{
let A → B be a nontrivial FD on Ri
s.t. A → Ri is not in F+
and A and B are disjoint
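A runnable sketch of BCNF decomposition in the same spirit follows. It is a closure-based variant written for illustration, not a transcription of the Korth et al. pseudocode: violations are found by brute force over attribute subsets, and each violating schema Ri is replaced by (A ∪ B) and (Ri – B), where B is everything A determines inside Ri. The FD set is assumed (sid → name, serno → cid, cid → subj, and sid, serno → exp-grade).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(schema, fds):
    """Return (A, B): A -> B nontrivial on schema, A not a superkey of it; else None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            a = set(combo)
            b = (closure(a, fds) & schema) - a
            if b and not schema <= a | b:
                return a, b
    return None

def bcnf_decompose(schema, fds):
    pending = [frozenset(schema)]
    done = []
    while pending:
        ri = pending.pop()
        violation = find_violation(set(ri), fds)
        if violation is None:
            done.append(ri)
        else:
            a, b = violation
            pending.append(frozenset(a | b))  # the FD gets its own relation
            pending.append(frozenset(ri - b)) # the rest keeps A as a link
    return done

STUFF = {"sid", "name", "serno", "subj", "cid", "exp-grade"}
FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),  # assumed for this example
]

tables = bcnf_decompose(STUFF, FDS)
print([sorted(t) for t in tables])
```

With these FDs the result matches the four-table design shown earlier: (sid, name), (serno, cid), (cid, subj) and (sid, serno, exp-grade). Different violation choices can give different decompositions in general.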
3NF Decomposition Algorithm
by Phil Bernstein, now at Microsoft Research