0% found this document useful (0 votes)
24 views

06a - Normalization

This document discusses normalization to third normal form (3NF). It begins with an example of a relation called SP that has anomalies like update, deletion, and insertion anomalies. To fix these anomalies, the relation is decomposed into two new relations called Supp and SP-Item to eliminate redundancy and anomalies. The document then discusses concepts like functional dependencies, candidate keys, and normal forms. It explains that the original SP relation is not in 2NF because attributes are not fully dependent on the primary key. By decomposing it into Supp and SP-Item relations, the new relations are in 2NF and eliminate anomalies.

Uploaded by

kohshi
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views

06a - Normalization

This document discusses normalization to third normal form (3NF). It begins with an example of a relation called SP that has anomalies like update, deletion, and insertion anomalies. To fix these anomalies, the relation is decomposed into two new relations called Supp and SP-Item to eliminate redundancy and anomalies. The document then discusses concepts like functional dependencies, candidate keys, and normal forms. It explains that the original SP relation is not in 2NF because attributes are not fully dependent on the primary key. By decomposing it into Supp and SP-Item relations, the new relations are in 2NF and eliminate anomalies.

Uploaded by

kohshi
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

06 - Normalization to 3NF

Design Theory for


Relational Databases
Normalization to 3NF

There are a lot of choices among


attribute sets of a relational schema
Some choices are better than others
We will study
Desirable properties
How to obtain these desirable properties

Department of Computer Science


Northern Illinois University
2000

Basic Concepts
Study following relation
10 Main
10 Main
20 State
20 State
30 Elm

Basic Concepts

Study following relation

SP ( Supp-Name, Supp-Addr, Item,


John
John
Jane
Jane
Frank

Do you see any


problems with
this relation?

Apple
Orange
Grape
Apple
Mango

Price )

SP ( Supp-Name, Supp-Addr, Item,

$2.00
2.50
1.25
2.25
6.00

John
John
Jane
Jane
Frank

10 Main
10 Main
20 State
20 State
30 Elm

Apple
Orange
Grape
Apple
Mango

Price )
$2.00
2.50
1.25
2.25
6.00

Basic Concepts
What happens if John moves?
SP ( Supp-Name, Supp-Addr, Item,
John
John
Jane
Jane
Frank

10 Main
10 Main
20 State
20 State
30 Elm

Apple
Orange
Grape
Apple
Mango

Every tuple of Johns


would have to be
changed.

Basic Concepts

What happens if John moves?


Price )

SP ( Supp-Name, Supp-Addr, Item,

$2.00
2.50
1.25
2.25
6.00

John
John
Jane
Jane
Frank
5

10 Main
10 Main
20 State
20 State
30 Elm

Apple
Orange
Grape
Apple
Mango

Price )
$2.00
2.50
1.25
2.25
6.00
6

06 - Normalization to 3NF

Basic Concepts

We would lose all the


information about
Frank and his address.

Basic Concepts

What happens to Franks Address if we


delete his tuple because he is
temporarily not supplying us?

What happens to Franks Address if we


delete his tuple because he is
temporarily not supplying us?

SP ( Supp-Name, Supp-Addr, Item,

SP ( Supp-Name, Supp-Addr, Item,

John
John
Jane
Jane
Frank

10 Main
10 Main
20 State
20 State
30 Elm

Apple
Orange
Grape
Apple
Mango

Price )
$2.00
2.50
1.25
2.25
6.00

Basic Concepts
What if we wish to keep track of a new
supplier and the address but do not
know what they will be supplying?
SP ( Supp-Name, Supp-Addr, Item,
John
John
Jane
Jane
Frank

10 Main
10 Main
20 State
20 State
30 Elm

Apple
Orange
Grape
Apple
Mango

10 Main
10 Main
20 State
20 State
30 Elm

We cannot insert a new tuple


with information about
a supplier until we know what
item they will be supplying!!
Remember the Entity Integrity
Constraint

Apple
Orange
Grape
Apple
Mango

John
John
Jane
Jane
Frank

$2.00
2.50
1.25
2.25
6.00

Basic Concepts

SP ( Supp-Name, Supp-Addr, Item,

Price )
$2.00
2.50
1.25
2.25
6.00

John
John
Jane
Jane
Frank

Price )

10 Main
10 Main
20 State
20 State
30 Elm

Apple
Orange
Grape
Apple
Mango

Price )
$2.00
2.50
1.25
2.25
6.00

10

Redundancy

Anomalies

When attribute values are repeated


unnecessarily
Notice that address is repeated for each
item that is supplied

11

Update Anomaly
caused by redundant information
must find all copies of information in order
to prevent inconsistencies when updating

12

06 - Normalization to 3NF

Anomalies
Deletion Anomaly

Anomalies
Insertion Anomaly

occurs when data is lost during a deletion


that we do not wish to be lost
occurs when there are attributes within a
tuple that are logically related to only part
of the primary key

occurs when we cannot insert some


information into a tuple because of a
violation of a relational constraint
occurs when a multiple attribute key cannot
be fully completed as necessary for
insertion

13

Fixing Anomalies
Decompose the relation
SP ( Supp-Name, Supp-Addr, Item, Price )
Supp( Supp-Name, Supp-Addr)
John
10 Main
Jane
20 State
Frank
30 Elm
SP-Item ( Supp-Name, Item, Price )
John
John
Jane
Jane
Frank

Apple $2.00
Orange 2.50
Grape 1.25
Apple 2.25
15
Mango 6.00

Decomposition of Relations

14

Fixing Anomalies:
Decomposing Relations

Notice there is not any


redundancy on supplier
and supplier address.
Update anomaly eliminated.

Supp( Supp-Name, Supp-Addr)


John
10 Main
Jane
20 State
Frank
30 Elm

We can insert new supplier


and its address and keep this
information without knowing
the item that will be
supplied.
Insertion anomaly eliminated.

We can delete the fact


SP-Item ( Supp-Name, Item, Price )
that Frank supplies
John
Apple $2.00
mangos without losing
John
Orange 2.50
Franks address.
Jane
Grape 1.25
Deletion anomaly eliminated
Jane
Apple
2.25
Frank

Mango

6.00
16

Relational Design
Methodology

Disadvantages
How do we tell whether one relation is
better than another?

it is more expensive to solve queries


example: Get the address of suppliers
supplying grapes.

Check for anomalies


Normalization (also called functional
dependency theory)

SP (Supp-Name, Supp-Addr, Item, Price )


Only need ONE relation with the original
Supp( Supp-Name, Supp-Addr)
SP-Item ( Supp-Name, Item, Price )
with decomposed schema - need to join two
relations

17

18

06 - Normalization to 3NF

Functional Dependency
Functional dependency

Functional Dependency
Functional dependency

constraints in the data that depend upon


NOT on the values within a given tuple
BUT on whether or not two tuples agree on
certain components.

Let R be a relation and let X and Y be subsets


of the attributes (one or more) of R we say
X
Y
X functionally determines Y
y is functionally dependent on X

if for all the tuples of R it is NOT possible that


two tuples agree on X but disagree on Y
19

20

Functional Dependency

Functional Dependency

Functional dependency

Person (SSN, Age, Gender)


FD = { SSN
Age
SSN
Gender
Age
Gender }

Given a unique value of X, we can ALWAYS


determine a value of Y

Since two people of the same age can be


of different genders
21

22

Functional Dependency

Functional Dependency

FDs are assertions about the real world


which cannot be proven
FDs are established by the database
designer by considering the meaning of the
attributes
FDs MUST hold for all possible data values
FDs can be enforced during insertion if
programmed and told to do so by the DBA
23

Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )


e1
e2
e1
e3

p1
p2
p3
p3

s1
s2
s1
s1

d1
d2
d1
d1

c1
c2
c3
c3

FD = { Emp-ID,Project Project, Supv, Dept, Case


Emp-ID
Supv

Supv, Dept
Dept }

24

06 - Normalization to 3NF

Insert Anomaly:
We cannot enter a new employee
say e4 who works for supervisor
s4 until e4 works on some project.

What do you see that is


wrong with this relation?

Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )


e1
e2
e1
e3
Update Anomaly:
If e1 has a new
supervisor, then
more than one tuple
has to be changed.

p1
p2
p3
p3

s1
s2
s1
s1

d1
d2
d1
d1

c1
c2
c3
c3

Before Normal Forms


Unnormalized
repeating groups
R (A, B, C, D (d1, d2, d3)*, E, F)
FD = { A B, C, E, F
A, d1, d2, d3 }

Deletion Anomaly:
If e2 does not work on project
p2 and we delete this tuple, we
will loose ALL information about
e2s supervisor, etc.

R1 (A, B, C, E, F)

D (A, d1, d2, d3)

FD = { A B, C, E, F }

FD2 = { A, d1 d2, d3 }

25

26

Normal Forms

First Normal Form


Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )

1NF (First Normal Form)


This
relation is in
1NF.
All values
are atomic!

all values are atomic

e1
e2
e1
e3

p1
p2
p3
p3

s1
s2
s1
s1

d1
d2
d1
d1

c1
c2
c3
c3

FD = { Emp-ID,Project Emp-ID, Project, Supv, Dept,Case


Emp-ID

Supv, Dept
Supv

Dept }
27

28

Definitions

Definitions

Remember definitions of key


Remember definitions of key

Super Key:

A candidate key of a relation functionally


determines ALL attributes of the relation.

an attribute or set of attributes that uniquely


identify a tuple (can be > 1 in a relation)

Candidate Key:
a minimum set of attributes that uniquely
identify a tuple (can be > 1 in a relation)
a minimal super key

Primary Key:
one and only one per relation.
a chosen candidate key

29

30

06 - Normalization to 3NF

Definitions

Definitions
Prime Attribute

Fully Dependent

if an attribute appears in a key of a relation,


then it is a prime attribute.

an attribute set Y is fully dependent on


attribute set X if
X Y
and Y cannot be determined by any
subset of X
In Emp-Proj,

In Emp-Proj, Emp-ID is prime

Non-Prime Attribute
an attribute not appearing in a key of a
relation.

Case is fully dependent on Emp-ID, Project


Supv and Dept are NOT fully dependent on
Emp-ID, Project

In Emp-Proj, Supv is non-prime


31

Normal Forms

32

NOT in 2NF because


Supv and Dept are NOT
fully dependent on
Emp-ID, Project

Second Normal Form

Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )

1NF (First Normal Form)


all values are atomic

2NF (Second Normal Form)


This FD
violates 2NF!

a relation is in 2NF if it is in 1NF and each


of its non-prime attributes are fully
dependent upon its entire primary key.

e1
e2
e1
e3

p1
p2
p3
p3

FD = { Emp-ID, Project
Emp-ID
33

Second Normal Form

Supv

s1
s2
s1
s1

d1
d2
d1
d1

c1
c2
c3
c3

Project, Supv, Dept, Case


Supv, Dept
34
Dept }

Second Normal Form

Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )


FD = { Emp-ID, Project
Emp-ID
Supv

Project, Supv, Dept, Case


Supv, Dept
Dept }

EP1 ( Emp-ID, Supv, Dept)


FD1 = { Emp-ID
Supv

Supv, Dept
Dept }

EP1 ( Emp-ID, Supv, Dept)


FD1 = { Emp-ID
Supv

Supv, Dept
Dept }

EP2 ( Emp-ID, Project, Case)


FD2 = { Emp-ID, Project

Case }

EP2 ( Emp-ID, Project, Case)


FD2 = { Emp-ID, Project

Case }

35

36

06 - Normalization to 3NF

Problems with
Second Normal Form
EP1 ( Emp-ID, Supv, Dept)
e1
s1
d1
e2
s2
d2
e3
s1
d1

EP2 ( Emp-ID, Project, Case)


e1

p1

c1

e2

p2

c2

e1

p3

c3

e3

p3

c3

Insertion Anomaly:
We cannot insert the information that
supervisor s6 works for dept d6 unless
at least one employee works for s6.
EP1 ( Emp-ID, Supv, Dept)
e1
s1
d1
e2
s2
d2
e3
s1
d1
Update Anomaly:
If supervisor s1 is
moved to d4, we have
to make changes in
more than one tuple.

EP2 ( Emp-ID, Project, Case)


e1

p1

c1

e2

p2

c2

e1

p3

c3

e3

p3

c3

Deletion Anomaly:
If e2 is fired,
information that s2
works for d2 is lost.

37

Definitions

38

Normal Forms
1NF (First Normal Form)

Transitively Dependent
A non-prime attribute is transitively
dependent upon the primary key of a
relation if there is also a non-prime (non
key) attribute that functionally determines
the attribute.
In EP1, Dept is transitively dependent upon
Emp-ID since
Emp-ID Supv
Supv Dept

all values are atomic

2NF (Second Normal Form)


a relation is in 2NF if it is in 1NF and each
of its non-prime attributes are fully
dependent upon its key.

3NF (Third Normal Form)


a relation is in 3NF if it is in 2NF and none
of its non-prime attributes are transitively
dependent on its key.

39

40

Third Normal Form Relations


Third Normal Form

Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )


FD = { Emp-ID,Project Project, Supv, Dept, Case
Emp-ID

EP1 ( Emp-ID, Supv, Dept)

Supv

FD1 = { Emp-ID

Supv, Dept

Dept }

Supv, Dept
EP2 ( Emp-ID, Project, Case)

Supv

Dept }

FD2 = { Emp-ID, Project

EP1-1( Supv, Dept)


FD1-1 = { Supv Dept }

EP1-2 (Emp-ID, Supv)

EP1-2 (Emp-ID, Supv)


FD1-2 = { Emp-ID

Case }

Final Result
EP1-1( Supv, Dept)
of
Normalization FD1-1 = { Supv Dept }
FD1-2 = { Emp-ID

Supv }
41

Supv }
42

06 - Normalization to 3NF

Third Normal Form Relations


Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
FD = { Emp-ID,Project Project, Supv, Dept, Case
Emp-ID

Supv

Supv, Dept

Dept }

EP1 ( Emp-ID, Supv, Dept)

EP2 ( Emp-ID, Project, Case)

FD1 = { Emp-ID
Supv

FD2 = { Emp-ID, Project

Supv, Dept
Dept }

EP1-1( Supv, Dept)


FD1-1 = { Supv Dept }

Case }

EP1-2 (Emp-ID, Supv)


FD1-2 = { Emp-ID

Supv }
43

You might also like