Unit-IV V1 (1)


UNIT-4

Schema Refinement (Normalization): Purpose of normalization or schema refinement,
concept of functional dependency, closure of functional dependency and attribute closure,
normal forms based on functional dependency (1NF, 2NF and 3NF), concept of surrogate
key, Boyce-Codd normal form (BCNF), lossless join and dependency preserving
decomposition, Fourth Normal Form (4NF), Fifth Normal Form (5NF).

Schema refinement means improving a schema by applying a systematic technique. The main
technique used for schema refinement is decomposition.

Redundancy refers to repetition of the same data, i.e. duplicate copies of the same data stored in
different locations.

Anomalies: Anomalies are the problems that arise in poorly planned, un-normalized
databases where all the data is stored in one table, which is sometimes called a flat file
database.

Problems Caused by Redundancy


Redundancy means having multiple copies of the same data in the database. This problem
arises when a database is not normalized. Suppose a table of student details has the
attributes student ID, student name, contact, college name, course opted, and college rank:

Student_ID  Name      Contact     College  Course  Rank
100         Himanshu  7300934851  GEU      B.Tech  1
101         Ankit     7900734858  GEU      B.Tech  1
102         Ayush     7300936759  GEU      B.Tech  1
103         Ravi      7300901556  GEU      B.Tech  1

Insertion Anomaly
If the details of a student whose course has not yet been decided have to be inserted, the
insertion is not possible until a course is decided for the student.

Student_ID  Name      Contact     College  Course         Rank
100         Himanshu  7300934851  GEU      (not decided)  1

This problem happens when a data record cannot be inserted without adding
some additional, unrelated data to the record.
Deletion Anomaly
If the details of all students in this table are deleted, the details of the college are
deleted as well, which should not happen. This anomaly occurs when deleting a data record
also loses some unrelated information that was stored as part of that record.

In other words, it is not possible to delete some information without losing some other
information in the table as well.

Updation Anomaly
Suppose the rank of the college changes. The change then has to be made in every row that
stores the rank, which is time-consuming and computationally costly.

Student_ID  Name      Contact     College  Course  Rank
100         Himanshu  7300934851  GEU      B.Tech  1
101         Ankit     7900734858  GEU      B.Tech  1
102         Ayush     7300936759  GEU      B.Tech  1
103         Ravi      7300901556  GEU      B.Tech  1

Every copy must be updated. If the update does not reach all of them, the database will
be left in an inconsistent state.

The following are some of the problems caused by redundancy in a database:

1. Data Inconsistency
2. Storage Requirements
3. Update Anomalies
4. Performance Issues: The database spends more time updating multiple copies of the same
data, which can slow down data retrieval and the overall performance of the database.

To prevent redundancy in a database, normalization techniques can be used. Normalization is
the process of organizing data in a database to eliminate redundancy and improve data
integrity. Normalization involves breaking down a larger table into smaller tables and
establishing relationships between them. This reduces redundancy and makes the database
more efficient and reliable.
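
As a rough illustration of "breaking down a larger table into smaller tables", the sketch below splits the student table from the anomaly examples into a student part and a college part. The split itself and the variable names are assumptions made for illustration; it presumes that the rank depends only on the college.

```python
# Flat table from the example: the college rank is repeated in every row.
students_flat = [
    (100, "Himanshu", "7300934851", "GEU", "B.Tech", 1),
    (101, "Ankit", "7900734858", "GEU", "B.Tech", 1),
    (102, "Ayush", "7300936759", "GEU", "B.Tech", 1),
    (103, "Ravi", "7300901556", "GEU", "B.Tech", 1),
]

# Assumed split: student facts in one table, college facts in another.
student = [(sid, name, contact, college, course)
           for sid, name, contact, college, course, _ in students_flat]
college = {college: rank for _, _, _, college, _, rank in students_flat}

# The rank is now stored once, so a rank change is a single update.
college["GEU"] = 2

# Joining the two tables rebuilds the flat view with the new rank in every row.
rejoined = [(sid, name, contact, col, course, college[col])
            for sid, name, contact, col, course in student]
print(rejoined[0])
```

With this layout the updation anomaly described earlier cannot occur for the rank, because there is only one copy of it to change.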

Concept of functional dependency


A functional dependency (FD) is a relationship between two attributes, or two sets of
attributes, of a relation. It typically exists between the primary key and the non-key
attributes of a table. It is written as
X → Y
and means that any two rows with the same value of X also have the same value of Y.

The left side of an FD is known as the determinant, and the right side is known as the
dependent.

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.


Emp_Id → Emp_Name (Emp_Name is functionally dependent on Emp_Id.)
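
The definition can be checked mechanically against sample data. The sketch below is a minimal illustration: the function name `fd_holds` and the employee rows are hypothetical, and only the attribute names come from the text. Note that sample rows can refute an FD but cannot prove that it holds for every future row.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical employee rows, invented for this example.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 3, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False
```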

Armstrong's axioms/properties of functional dependencies:

1. Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
For example, {roll_no, name} → name is valid.

2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.

3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is also
valid by the transitivity rule.


Types of Functional dependencies in DBMS:
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

1. Trivial Functional Dependency


If X → Y and Y is the subset of X, then it is called trivial functional dependency.

{roll_no, name} → name, roll_no → roll_no

2. Non-trivial Functional Dependency


If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.

roll_no → name, {roll_no, name} → age

3. Multivalued Functional Dependency

Some texts use this name for a dependency a → {b, c} in which there is no functional
dependency between b and c.
roll_no → {name, age} (i.e. neither name → age nor age → name holds)
Strictly speaking, this is an ordinary functional dependency with a composite right-hand
side. A true multi-valued dependency is written a →→ b and is the subject of 4NF.

4. Transitive Functional Dependency

If a → b and b → c, then according to the axiom of transitivity, a → c. This is a transitive
functional dependency.

Closure of functional dependency and attribute closure

Functional Dependency Set: The functional dependency set, or FD set, of a relation is the set of
all FDs present in the relation. For example, the FD set for relation STUDENT shown in table 1
is:
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY,
STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }

STUD_STATE->STUD_COUNTRY holds because if two records have the same
STUD_STATE, they will have the same STUD_COUNTRY as well.

Attribute Closure: The attribute closure of an attribute set is the set of attributes
that can be functionally determined from it.
To find the attribute closure of an attribute set:

● Add the elements of the attribute set to the result set.
● Repeatedly add to the result set the elements that can be functionally determined
from the elements already in the result set, until nothing more can be added.

Using the FD set of table 1, the attribute closures can be determined as:

(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}

In this example STUD_NO is the candidate key (C.K.). Every attribute that is part of a
candidate key is a prime attribute, and the remaining attributes are non-prime
attributes.
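
The two closure steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `closure` and the representation of each FD as a pair of sets are assumptions, while the FD set is the STUDENT one listed above.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)            # step 1: start with the attribute set itself
    changed = True
    while changed:                 # step 2: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


student_fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(closure({"STUD_STATE"}, student_fds)))
# ['STUD_COUNTRY', 'STUD_STATE']
```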

How to find Candidate Keys and Super Keys using Attribute Closure?
● If the attribute closure of an attribute set contains all attributes of the relation,
the attribute set is a super key of the relation.
● If, in addition, no proper subset of this attribute set can functionally determine all
attributes of the relation, the set is a candidate key as well.

(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE}

(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}

(STUD_NO, STUD_NAME) is a super key but not a candidate key, because the closure of
its subset (STUD_NO) already contains all attributes of the relation. So STUD_NO
is a candidate key.
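
The super key and candidate key tests can be combined into a brute-force search. The sketch below is only an illustration under stated assumptions: the names `closure` and `candidate_keys` are hypothetical, FDs are represented as pairs of sets, and the search tries every subset, so it is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    # Smaller sets are tried first, so supersets of a found key can be skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a super key, not a candidate key
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


student_attrs = {"STUD_NO", "STUD_NAME", "STUD_PHONE",
                 "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
student_fds = [
    ({"STUD_NO"}, {"STUD_NAME"}), ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}), ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}), ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]
print(candidate_keys(student_attrs, student_fds))  # [{'STUD_NO'}]
```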

Normalization is the process of structuring the data and the relationships between data so as to
minimize redundancy in relational tables and to avoid insertion, update and deletion
anomalies in the database. It divides large database tables
into smaller tables and establishes relationships between them.
OR
Normalization means "split the tables into smaller tables with fewer
attributes, in such a way that the table design is free of insertion, deletion and
update anomalies and avoids redundancy".

First Normal Form (1NF)


First Normal Form (1NF): A relation that starts in un-normalized form is said to be in
1NF once it satisfies the following conditions:
1. Each attribute name must be unique.
2. Each attribute value must be single or atomic, i.e. only single-valued
attributes are allowed.
3. Each row / record must be unique.
4. There are no repeating groups.

The sample Employee table shows employees working in multiple departments.
To bring this table to first normal form, we split the table into the following table, and
the resulting table is in 1NF.
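
Since the sample tables are not reproduced here, the sketch below uses invented rows to illustrate the idea of the split: a column that holds several departments is flattened so that each row carries exactly one atomic value. The column names and data are assumptions made for illustration only.

```python
# Hypothetical un-normalized rows: the departments column holds a list.
unnormalized = [
    {"emp_id": 1, "emp_name": "Asha", "departments": ["Sales", "Marketing"]},
    {"emp_id": 2, "emp_name": "Ravi", "departments": ["HR"]},
]

# One row per (employee, department) pair, so every value is atomic.
first_nf = [
    {"emp_id": row["emp_id"], "emp_name": row["emp_name"], "department": dept}
    for row in unnormalized
    for dept in row["departments"]
]

for row in first_nf:
    print(row)
```

In the flattened table no single column identifies a row any more; the pair (emp_id, department) does.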

Second Normal Form (2NF): A relation is in 2NF if it satisfies the following
conditions:
● The table or relation is in 1NF (First Normal Form).
● Every non-prime attribute is fully functionally dependent on the candidate key.
● The table does not contain any partial dependency.

Note:
Partial Functional Dependency: If a non-prime attribute of the relation is
determined by only a part of the candidate key, the dependency is known as a partial
dependency.

Example: Consider the following relation

This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is
only part of the primary key. Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break the table into two tables, and now we
have the following:

EXAMPLE - Given relation R(ABCD) and F: {AB → C, B → D}, decompose it into 2NF.
From the given FDs, determine the primary key. The attributes that must be included in the
key are A and B (because these attributes do not appear on the RHS of any FD).
Find the closure of AB:
AB+ = ABC    (∵ AB → C)
    = ABCD   (∵ B → D)
AB is the primary key.
From the FDs, B → D is a partial dependency on AB (D is a non-prime attribute determined
by a part of the key).
So decompose the table into R1(A, B, C) with AB → C and R2(B, D) with B → D.
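
The partial-dependency check in this example can be sketched as follows. The helper names `closure` and `partial_dependencies` are hypothetical, and the sketch assumes the relation has a single candidate key, so that the prime attributes are exactly the key attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attributes, key, fds):
    """Map each proper part of `key` to the non-prime attributes it determines."""
    key = set(key)
    non_prime = set(attributes) - key   # assumes `key` is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & non_prime
            if determined:
                found[part] = determined
    return found


fds = [({"A", "B"}, {"C"}), ({"B"}, {"D"})]
print(partial_dependencies("ABCD", "AB", fds))  # {('B',): {'D'}}
```

An empty result means no part of the key determines a non-prime attribute, which is the 2NF condition under the single-key assumption.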

Third Normal Form (3NF): A relation is in third normal form if it satisfies the
following conditions:

• It is in 2NF.

• There is no transitive functional dependency.

By transitive functional dependency, we mean the following relationships in
the table: B is functionally dependent on A (A → B), and C is functionally dependent on
B (B → C). In this case, C is transitively dependent on A via B. In other words, a
non-key attribute depends on another non-key attribute.

Example: Consider the following relation

In the table, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive
functional dependency, and this structure does not satisfy third normal form.
To bring this table to third normal form, we split the table into two as follows
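Q1. Given relation R(ABCDE) and F: {AB → C, B → D, D → E}, decompose it into 3NF.
From the given FDs, determine the primary key.
The attributes that must be included in the key are A and B (because these attributes do
not appear on the RHS of any FD).
Find the closure of AB:
AB+ = ABC     (∵ AB → C)
    = ABCD    (∵ B → D)
    = ABCDE   (∵ D → E)
AB is the primary key.
From the FDs, B → D is a partial dependency on AB (D is a non-prime attribute determined
by a part of the key). So decompose the table.
B+ = BDE
This gives R1(A, B, C) and R2(B, D, E). In R2, E depends on the key B only transitively
(B → D and D → E), so R2 is split again into R2(B, D) and R3(D, E). The 3NF
decomposition is R1(A, B, C), R2(B, D), R3(D, E).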
Rules of 3NF:
1. It should be in 2NF.
2. It should not contain a transitive dependency for non-prime attributes.
(OR)
A table is in 3NF if and only if, for each of its non-trivial functional dependencies
a->b (i.e. b is not a subset of a), at least one of the following conditions holds:
1. The L.H.S. is a superkey.
2. The R.H.S. is a prime attribute.

Eg1: R(A,B,C,D) FD(A->B, B->C, C->D). Find whether the given relation is in 3NF or not.

Solution: Superkey: (ABCD)+ = {A,B,C,D}
A alone is a superkey, obtained by discarding B, C and D using Armstrong's axioms: B is
discarded because of A->B; C is discarded because A->B and B->C give A->C by
transitivity; D is discarded because A->C and C->D give A->D by transitivity.
Candidate key = A
Prime attribute = A, Non-prime attributes = B, C, D

In the above FDs:
1. A->B (holds, as the complete candidate key determines a non-prime attribute.)
2. B->C, C->D (do not hold, as they exhibit a transitive dependency, i.e. B->D, in which a
non-prime attribute determines a non-prime attribute.)

As the above FDs contain a transitive dependency, the given relation is not in 3NF.
Eg2: R(A,B,C,D,E,F) FD(AB->CDEF, BD->F). Find whether the given relation is in 3NF or
not.
Solution: Superkey: (ABCDEF)+ = {A,B,C,D,E,F}
AB is a superkey, obtained by discarding C, D, E and F using Armstrong's axioms and the FD AB->CDEF.
Candidate key = AB
Prime attributes = A, B
Non-prime attributes = C, D, E, F
In the above FDs:
1. AB->CDEF (holds, as the complete candidate key determines non-prime attributes.)
2.BD->F(doesn't hold as it exhibits -------.)
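BD is not a superkey and F is a non-prime attribute, so BD->F satisfies neither 3NF
condition; F depends on the key AB only transitively, through BD. Hence the given relation
is not in 3NF.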
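
The two-condition rule (LHS is a superkey, or RHS is prime) can be checked mechanically. The sketch below is an illustration under stated assumptions: the names `closure` and `violations_3nf` are hypothetical, FDs are pairs of sets, and the candidate keys are passed in by hand, as derived in Eg1 and Eg2.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """Return the FDs whose LHS is not a superkey and whose RHS is not all prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs                      # ignore the trivial part of the FD
        is_superkey = closure(lhs, fds) == attributes
        if extra and not is_superkey and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


# Eg1: R(A,B,C,D) with A->B, B->C, C->D and candidate key A.
eg1 = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
print(violations_3nf("ABCD", eg1, [{"A"}]))

# Eg2: R(A,B,C,D,E,F) with AB->CDEF, BD->F and candidate key AB.
eg2 = [({"A", "B"}, {"C", "D", "E", "F"}), ({"B", "D"}, {"F"})]
print(violations_3nf("ABCDEF", eg2, [{"A", "B"}]))
```

A relation is in 3NF exactly when the returned list is empty; for Eg1 it contains B->C and C->D, and for Eg2 it contains BD->F.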

Surrogate Key:
A surrogate key is a special key that has no meaning or purpose other than to
uniquely identify each record.
A surrogate key is generated automatically by the database when a new record is
inserted into a table, and it can be declared as the primary key of that table.

In other words, if a table has no natural primary key, we need
to create one artificially in order to uniquely identify each row in the table. This key is
called the surrogate key or synthetic primary key of the table.

However, the surrogate key is not always the primary key. Suppose multiple
objects in a database are connected to the same surrogate key. Then there is a
many-to-one association between the primary keys and the surrogate key, and the surrogate key
cannot be used as the primary key.
Features of the surrogate key:
1. It is automatically generated by the system.
2. It holds an anonymous integer.
3. It contains a unique value for every record of the table.
4. Its value can never be modified by the user or the application.
5. The surrogate key is called a factless key, as it is added only to make it easier to identify
unique rows and contains no relevant fact (or information) that is useful for the
table.
Consider an example of Tracking_System, where we have the following attributes:
Key: An attribute holding the key for each tracking id.
Track_id: An attribute holding the tracking id of the item.
Track_item: An attribute holding the name of the item that is being tracked.
Track_loc: An attribute holding the location of the tracking item.
The below diagram represents the above described Tracking_system table:
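
A system-generated surrogate key can be sketched with Python's built-in sqlite3 module. The column name `key_id` (standing in for the Key attribute) and the inserted rows are assumptions made for illustration; the other column names follow the attributes listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tracking_system (
        key_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        track_id   TEXT NOT NULL,
        track_item TEXT NOT NULL,
        track_loc  TEXT NOT NULL
    )
    """
)

# Hypothetical rows; the application never supplies a key value.
rows = [("TRK-01", "Laptop", "Delhi"), ("TRK-02", "Phone", "Pune")]
conn.executemany(
    "INSERT INTO tracking_system (track_id, track_item, track_loc) VALUES (?, ?, ?)",
    rows,
)

keys = [r[0] for r in conn.execute(
    "SELECT key_id FROM tracking_system ORDER BY key_id")]
print(keys)  # [1, 2]
```

The generated integers carry no information about the tracked item, which is what makes the key "factless".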
BCNF
Boyce-Codd normal form (BCNF): A relation is in BCNF if and only if every
determinant is a candidate key (more precisely, a super key).
BCNF is an advanced version of 3NF. It is stricter than 3NF.
A table is in 3NF if, for every non-trivial functional dependency X → Y, either X is a super
key of the table or Y is a prime attribute. For BCNF, the table should be in 3NF and, for
every non-trivial FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:

In the above table the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key. To
convert the given table into BCNF, we decompose it into three tables.
Eg: Given relation R(A,B,C) and FD(A->B, B->C, C->A), find whether the given relation
is in BCNF or not.

Solution: Superkey: (ABC)+ = {A,B,C}
A alone is a superkey, obtained by discarding B and C using Armstrong's axioms: B is
discarded because of A->B, and C is discarded because A->B and B->C give A->C by
transitivity. By the same reasoning, B and C are each superkeys.
Candidate keys = A, B, C (these are also super keys)
Prime attributes = A, B, C
In the given FDs, every left-hand side is a super key. So the given relation is in BCNF.
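
The BCNF test "every LHS is a super key" reduces to one closure computation per FD. The sketch below is a minimal illustration; the names `closure` and `bcnf_violations` are hypothetical, and FDs are represented as pairs of sets.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attributes]


# R(A,B,C) with A->B, B->C, C->A: every LHS is a super key.
cyclic = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
print(bcnf_violations("ABC", cyclic))  # []

# EMPLOYEE example: neither EMP_ID nor EMP_DEPT alone is a key.
emp_attrs = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
emp_fds = [({"EMP_ID"}, {"EMP_COUNTRY"}),
           ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})]
print(len(bcnf_violations(emp_attrs, emp_fds)))  # 2
```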

Lossless join decomposition


There are two possibilities when a relation R is decomposed into R1 and R2:
● Lossy decomposition, i.e. R1 ⋈ R2 ⊃ R
● Lossless decomposition, i.e. R1 ⋈ R2 = R

For a decomposition to be lossless, the following conditions should hold:
● The union of the attributes of R1 and R2 must be equal to the attributes of R. Each
attribute of R must be in either R1 or R2, i.e. Att(R1) ⋃ Att(R2) = Att(R)
● The intersection of the attributes of R1 and R2 must not be empty, i.e.
Att(R1) ⋂ Att(R2) ≠ Ø
● The common attributes must be a key for at least one relation (R1 or R2), i.e.
Att(R1) ⋂ Att(R2) -> Att(R1) or Att(R1) ⋂ Att(R2) -> Att(R2)

Example
A relation R(A,B,C,D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD).
This is a lossless join decomposition because
● The first rule holds, as Att(R1) ⋃ Att(R2) = (ABC) ⋃ (AD) = (ABCD) = Att(R)
● The second rule holds, as Att(R1) ⋂ Att(R2) = (ABC) ⋂ (AD) = A ≠ Ø
● The third rule holds, as Att(R1) ⋂ Att(R2) = A is a key of R1(ABC),
because A->BC is given
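
The three rules translate directly into a check for a two-way decomposition. The sketch below is an illustration only: the names `closure` and `is_lossless` are hypothetical, attributes are single letters, and FDs are pairs of sets.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(relation, r1, r2, fds):
    """Apply the three lossless-join rules to a decomposition into r1 and r2."""
    relation, r1, r2 = set(relation), set(r1), set(r2)
    if r1 | r2 != relation:        # rule 1: no attribute may be lost
        return False
    common = r1 & r2
    if not common:                 # rule 2: the tables must share attributes
        return False
    determined = closure(common, fds)
    # rule 3: the common attributes must be a key of r1 or of r2
    return r1 <= determined or r2 <= determined


fds = [({"A"}, {"B", "C"})]
print(is_lossless("ABCD", "ABC", "AD", fds))  # True
print(is_lossless("ABCD", "AB", "CD", fds))   # False
```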

Dependency Preserving Decomposition


● If we decompose a relation R into relations R1 and R2, all dependencies of R
must either be part of R1 or R2, or be derivable from the combination of the
functional dependencies (FDs) of R1 and R2.
● Suppose a relation R(A,B,C,D) with FD set {A->BC} is decomposed into
R1(ABC) and R2(AD). This is dependency preserving because the FD A->BC is a
part of R1(ABC).

Example
Consider a schema R(A,B,C,D) with functional dependencies A->B and C->D, which is
decomposed into R1(AB) and R2(CD).

This decomposition is a dependency preserving decomposition because
● A->B can be ensured in R1(AB)
● C->D can be ensured in R2(CD)
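
Whether every FD can still be enforced after decomposition can be tested with closures restricted to each smaller table. The sketch below is one common way to do this, offered as an assumption rather than the method of these notes; the names `closure` and `preserves_dependencies` are hypothetical, and the second call uses an extra example that is not from the text.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """Check that each FD follows from the FDs enforceable inside single tables."""
    tables = [set(t) for t in decomposition]
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for table in tables:
                # What the attributes visible in this table determine within it.
                gained = closure(result & table, fds) & table
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


# Example from the text: R(A,B,C,D), A->B, C->D, split into R1(AB), R2(CD).
print(preserves_dependencies(["AB", "CD"], [({"A"}, {"B"}), ({"C"}, {"D"})]))  # True

# Hypothetical counter-example: R(A,B,C), A->B, B->C, split into R1(AB), R2(AC).
print(preserves_dependencies(["AB", "AC"], [({"A"}, {"B"}), ({"B"}, {"C"})]))  # False
```

In the counter-example B->C is lost because B and C never appear together in one table and nothing else implies it.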

Fourth Normal Form (4NF): A relation is said to be in 4NF if it is in Boyce-Codd normal
form and has no non-trivial multi-valued dependency.
1. For a dependency A →→ B, if multiple values of B exist for a single value of A, then the
relation has a multi-valued dependency.

Note: Multi-Valued Dependency: A table is said to have a multi-valued dependency if
the following conditions are true:
1. The table has at least 3 columns.
2. For a relation R(A, B, C), if there is a multi-valued dependency between A and
B, then B and C are independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued
dependency.

EXAMPLE

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities, so there is no relationship between COURSE and HOBBY. In the STUDENT
relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which
leads to unnecessary repetition of data.

To bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE and STUDENT_HOBBY.
Now these tables satisfy 4NF. In STUDENT_COURSE the key is (STU_ID, COURSE), and in
STUDENT_HOBBY the key is (STU_ID, HOBBY).
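
The effect of the 4NF split can be sketched with plain Python sets. The rows below are an assumption: they contain every course/hobby combination for the student with STU_ID 21 described above, which is what independent COURSE and HOBBY values imply; the variable names are hypothetical.

```python
# Assumed STUDENT rows for STU_ID 21: every course paired with every hobby.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
}

# Project onto the two independent facts.
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# The natural join on STU_ID rebuilds the original rows, so nothing is lost.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

print(len(student), len(student_course), len(student_hobby))  # 4 2 2
print(rejoined == student)  # True
```

Four rows in the single table shrink to two rows in each smaller table, and adding a third hobby would add one row instead of two.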
5NF
Properties of 5NF:
A relation R is in 5NF if and only if it satisfies the following conditions:
1. R is in 4NF (no multi-valued dependency exists).
2. R cannot be losslessly decomposed any further (it has no non-trivial join dependency).
5NF is also known as Project-Join Normal Form (PJ/NF).

Example - Consider an agent/company/product schema with the rule "if a company makes a
product, an agent is an agent for that company, and the agent sells that product, then the
agent always sells that product for the company". Under these circumstances, the ACP table
is shown as:

Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into 3 relations, R1, R2 and R3, shown below. The natural
join of all three relations gives back ACP.

Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR

Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut

Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The natural join of R1 and R3 over 'Company', followed by the natural join of the result
R13 with R2 over 'Agent' and 'Product', gives back table ACP. However, 5NF is rarely applied
in practical scenarios and remains mostly limited to theoretical concepts.
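
The join described above can be verified with plain Python sets. The sketch uses the ACP rows from the table; the variable names are assumptions made for illustration.

```python
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# The three projections R1(Agent, Company), R2(Agent, Product), R3(Company, Product).
r1 = {(a, c) for a, c, _ in acp}
r2 = {(a, p) for a, _, p in acp}
r3 = {(c, p) for _, c, p in acp}

# Join R1 and R3 over Company; this produces one spurious row (A2, PQR, Bolt).
r13 = {(a, c, p) for a, c in r1 for c2, p in r3 if c == c2}

# Join R13 with R2 over Agent and Product; the spurious row is filtered out.
joined = {(a, c, p) for a, c, p in r13 if (a, p) in r2}

print(len(r13))       # 6
print(joined == acp)  # True
```

No two of the three projections are enough on their own here: R1 joined with R3 contains an extra row, and only the third projection removes it.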
