Unit-IV V1 (1)
Schema refinement is the process of improving a relational schema so that it is free of redundancy and the problems redundancy causes. The main technique of schema refinement is decomposition.
Redundancy refers to the repetition of the same data, that is, duplicate copies of the same data stored in different locations.
Anomalies: Anomalies are the problems that arise in poorly planned, un-normalised databases where all the data is stored in one table, which is sometimes called a flat file database.
For example, consider a single table that stores student ID, student name, college name, college rank, and course opted:
Student_ID Name Contact College Course Rank
Insertion Anomaly
If the details of a student whose course has not yet been decided have to be inserted, the insertion will not be possible until a course is decided for the student.
This problem happens when a data record cannot be inserted without adding some additional, unrelated data to the record.
Deletion Anomaly
If the details of the students in this table are deleted, the details of the college are deleted as well, which should not happen. This anomaly occurs when deleting a data record also loses some unrelated information that was stored as part of that record.
In other words, it is not possible to delete some information without losing some other information in the table as well.
Update Anomaly
Suppose the rank of a college changes. The change then has to be made in every row that mentions that college, which is time-consuming and computationally costly.
All occurrences must be updated. If the update does not reach every one of them, the database is left in an inconsistent state.
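The update and deletion anomalies above can be reproduced with a tiny in-memory table. This is only a sketch: the two rows and all their values are hypothetical, invented for illustration, and plain Python dictionaries stand in for database tables.

```python
# Hypothetical rows of the flat table (all values are made up for illustration).
flat = [
    {"Student_ID": 1, "Name": "Asha", "Contact": "111", "College": "ABC", "Course": "CSE", "Rank": 5},
    {"Student_ID": 2, "Name": "Ravi", "Contact": "222", "College": "ABC", "Course": "ECE", "Rank": 5},
]

# Update anomaly: the rank is changed in one row only, so the same college
# now appears with two different ranks.
flat[0]["Rank"] = 4
ranks_for_abc = {row["Rank"] for row in flat if row["College"] == "ABC"}
inconsistent = len(ranks_for_abc) > 1

# Decomposed design: student facts and college facts live in separate tables,
# so the rank is stored exactly once per college.
students = [
    {key: row[key] for key in ("Student_ID", "Name", "Contact", "College", "Course")}
    for row in flat
]
colleges = {"ABC": {"Rank": 5}}
colleges["ABC"]["Rank"] = 4  # one update, nothing else to keep in sync

# Deletion: removing every student no longer removes the college.
students.clear()
college_survives = "ABC" in colleges

print(inconsistent, college_survives)
```

In the flat table the single partial update leaves ranks 4 and 5 side by side for the same college; after decomposition the rank lives in one place and the college row survives the deletion of its students.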
In a functional dependency (FD) X → Y, the left side X is known as the determinant and the right side Y is known as the dependent.
augmentation rule.
functional dependency.
Functional Dependency Set: The functional dependency set (FD set) of a relation is the set of all FDs present in the relation. For example, the FD set for relation STUDENT shown in table 1 is:
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY, STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }
How to find Candidate Keys and Super Keys using Attribute Closure?
● If the attribute closure of an attribute set contains all attributes of the relation, the attribute set is a super key of the relation.
● If, in addition, no proper subset of this attribute set can functionally determine all attributes of the relation, the set is a candidate key as well.
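The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `closure`, `is_super_key` and `is_candidate_key` are hypothetical, and the attribute list of STUDENT is assumed to be exactly the attributes that appear in the FD set given earlier.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    attrs = set(attrs)
    if not is_super_key(attrs, all_attrs, fds):
        return False
    # Minimality: dropping any single attribute must break the super key.
    # Checking one-attribute removals is enough, because every superset of
    # a super key is itself a super key.
    return not any(is_super_key(attrs - {a}, all_attrs, fds) for a in attrs)


# Assumed attribute list: the attributes named in the STUDENT FD set.
STUDENT = {"STUD_NO", "STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(is_candidate_key({"STUD_NO"}, STUDENT, FDS))               # True
print(is_super_key({"STUD_NO", "STUD_NAME"}, STUDENT, FDS))      # True
print(is_candidate_key({"STUD_NO", "STUD_NAME"}, STUDENT, FDS))  # False
```

Here {STUD_NO, STUD_NAME} is a super key but not a candidate key, because its proper subset {STUD_NO} already determines every attribute.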
Normalization is the process of structuring data and the relationships between data so as to minimize redundancy in relational tables and avoid insertion, update and deletion anomalies. It divides large database tables into smaller tables and links them through relationships.
OR
Normalization means “splitting tables into smaller tables with fewer attributes, in such a way that the table design has no insertion, deletion or update anomalies and no redundancy”.
Second Normal Form (2NF): A relation is in 2NF if it satisfies the following conditions:
● The table or relation should be in 1NF (First Normal Form).
● All the non-prime attributes should be fully functionally dependent on the candidate key.
● The table should not contain any partial dependency.
Note:
Partial Functional Dependency: If a non-prime attribute of the relation is determined by only a part of a candidate key, the dependency is known as a partial dependency.
This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is
only part of the primary key. Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break the table into two tables, and now we
have the following:
EXAMPLE - Given relation R(ABCD) and F: {AB→C, B→D}, decompose it into 2NF.
First determine the primary key from the given FDs. A and B must be included in the key, because these attributes do not appear on the RHS of any FD.
Find the closure of AB:
AB+ = ABC (∵ AB→C)
= ABCD (∵ B→D)
So AB is the primary key.
The FD B→D is a partial dependency on AB: D is a non-prime attribute determined by only a part of the key.
So decompose the table into R1(A, B, C) and R2(B, D).
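The same check can be sketched mechanically, reading the FDs of the example as AB→C and B→D. The helper `partial_dependencies` is a hypothetical name, and it assumes the key passed in is the only candidate key, so that every attribute outside it is non-prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """Pairs (part_of_key, attribute) where a proper subset of the key
    determines a non-prime attribute. Assumes `key` is the only candidate key."""
    key = set(key)
    non_prime = set(all_attrs) - key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append(("".join(subset), attr))
    return found


F = [(set("AB"), set("C")), (set("B"), set("D"))]

print(sorted(closure("AB", F)))              # ['A', 'B', 'C', 'D']
print(partial_dependencies("AB", "ABCD", F)) # [('B', 'D')]
```

The single reported pair says that B alone determines D, which is the partial dependency that forces the split into R1(A, B, C) and R2(B, D).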
Third Normal Form (3NF): A relation is in third normal form if it satisfies the following conditions:
• It is in 2NF.
• It has no transitive functional dependency for non-prime attributes.
In the table, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive
functional dependency, and this structure does not satisfy third normal form.
To bring this table to third normal form, we split the table into two as follows
Q1. Given relation R(ABCDE) and F: {AB→C, B→D, D→E}, decompose it into 3NF.
First determine the primary key from the given FDs.
A and B must be included in the key, because these attributes do not appear on the RHS of any FD.
Find the closure of AB:
AB+ = ABC (∵ AB→C)
= ABCD (∵ B→D)
= ABCDE (∵ D→E)
So AB is the primary key.
The FD B→D is a partial dependency on AB: D is a non-prime attribute determined by only a part of the key. So decompose the table.
B+ = BDE
Within B+, D→E is a transitive dependency (B→D, D→E), so it is split off as well, giving R1(A, B, C), R2(B, D) and R3(D, E).
Rules of 3NF:
1. It should be in 2NF.
2. It should not contain a transitive dependency for non-prime attributes.
(OR)
A table is in 3NF if and only if, for each of its non-trivial functional dependencies a->b (non-trivial meaning that b is not a subset of a), at least one of the following conditions holds:
1. The L.H.S. is a super key.
2. The R.H.S. is a prime attribute.
In the above FDs:
1. A->B (holds, as the complete candidate key determines a non-prime attribute.)
2. B->C, C->D (does not hold, as it exhibits a transitive dependency, i.e. B->D, in which a non-prime attribute determines a non-prime attribute.)
As the above FDs contain a transitive dependency, the given relation is not in 3NF.
Eg2: R(A,B,C,D,E,F) with FDs {AB->CDEF, BD->F}. Find whether the given relation is in 3NF or not.
Solution: The full attribute set is trivially a super key: (ABCDEF)+ = {A,B,C,D,E,F}.
Since AB->CDEF, the attributes C, D, E and F can be dropped from it (using Armstrong's axioms), so AB is also a super key. Neither A nor B alone determines all the attributes.
Candidate key = AB
Prime attributes = A, B
Non-prime attributes = C, D, E, F
In the above FDs:
1. AB->CDEF (holds, as the complete candidate key determines non-prime attributes.)
2. BD->F (doesn't hold as it exhibits -------.)
As the above FDs contain a transitive dependency, the given relation is not in 3NF.
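The two-condition test for 3NF (L.H.S. is a super key, or R.H.S. is a prime attribute) can be sketched for Eg2 as below. The function names are hypothetical, candidate keys are found by brute force over attribute subsets (fine for a six-attribute example, not for large schemas), and only the FDs as given are checked.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Brute-force search for all minimal super keys, smallest subsets first."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == all_attrs:
                keys.append(subset)
    return keys


def violations_of_3nf(all_attrs, fds):
    """(lhs, attribute) pairs where lhs is not a super key and attribute is non-prime."""
    all_attrs = set(all_attrs)
    prime = set().union(*candidate_keys(all_attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == all_attrs:
            continue  # condition 1: L.H.S. is a super key
        for attr in sorted(rhs - lhs):  # skip the trivial part of the FD
            if attr not in prime:       # condition 2 fails as well
                bad.append(("".join(sorted(lhs)), attr))
    return bad


F = [(set("AB"), set("CDEF")), (set("BD"), set("F"))]

print(candidate_keys("ABCDEF", F))     # one key: the set containing A and B
print(violations_of_3nf("ABCDEF", F))  # [('BD', 'F')]
```

The only reported violation is BD->F: BD is not a super key and F is not a prime attribute, so the relation is not in 3NF.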
Surrogate Key:
A surrogate key is a special key which has no meaning or purpose other than to uniquely identify each record.
A surrogate key is generated automatically by the database when a new record is inserted into a table, and it can be declared as the primary key of that table.
In other words, when a table has no natural primary key, we need to create one artificially in order to uniquely identify each row in the table. This key is called the surrogate key or synthetic primary key of the table.
However, a surrogate key is not always the primary key. Suppose multiple objects in a database are connected to the surrogate key. Then there is a many-to-one association between the primary keys and the surrogate key, and the surrogate key cannot be used as the primary key.
Features of the surrogate key:
1. It is automatically generated by the system.
2. It holds an anonymous integer.
3. It contains a unique value for every record of the table.
4. The value can never be modified by the user or application.
5. The surrogate key is called the factless key, as it is added only for ease of identifying unique records and contains no fact (or information) that is relevant to the table.
Consider an example of Tracking_System, where we have the following attributes:
Key: An attribute holding the key for each tracking id.
Track_id: An attribute holding the tracking id of the item.
Track_item: An attribute holding the name of the item that is being tracked.
Track_loc: An attribute holding the location of the tracking item.
The below diagram represents the above described Tracking_system table:
BCNF
Boyce-Codd normal form (BCNF): A relation is said to be in BCNF if and only if every determinant is a candidate key.
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in 3NF if, for every non-trivial functional dependency X → Y, X is a super key of the table or Y is a prime attribute. For BCNF, the table should be in 3NF and, for every non-trivial FD, the LHS must be a super key.
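The difference between the two forms can be sketched on a small schema. The relation R(A, B, C) with FDs AB→C and C→B used here is a hypothetical textbook-style example, not one taken from these notes: its candidate keys are AB and AC, so every attribute is prime and 3NF holds, yet C is a determinant that is not a super key. The function name `bcnf_violations` is likewise illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left side is not a super key."""
    all_attrs = set(all_attrs)
    return [
        ("".join(sorted(lhs)), "".join(sorted(rhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


# Hypothetical schema: R(A, B, C) with AB -> C and C -> B.
F = [(set("AB"), set("C")), (set("C"), set("B"))]

print(sorted(closure("AB", F)))  # ['A', 'B', 'C']
print(sorted(closure("AC", F)))  # ['A', 'B', 'C']
print(bcnf_violations("ABC", F)) # [('C', 'B')]
```

C→B passes the 3NF test because B is a prime attribute, but it fails the BCNF test because C alone is not a super key.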
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
● Att(R1) ∩ Att(R2) ≠ Ø
● The common attributes must be a key for at least one relation (R1 or R2), i.e., Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
Example
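A relation R(A,B,C,D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD).
This is a lossless join decomposition because:
● The first rule holds true, as Att(R1) ⋃ Att(R2) = (ABC) ⋃ (AD) = (ABCD) = Att(R).
● The intersection is not empty, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = (A).
● The common attribute A is a key of R1(ABC), because A->BC is given and is part of R1(ABC).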
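The test for a decomposition into two relations can be sketched as follows: the attributes must be covered, the two relations must share at least one attribute, and the shared attributes must determine all of R1 or all of R2. The function name `is_lossless_binary` is hypothetical, and the second decomposition tried below, R1(ABC) and R2(BD), is an invented counter-example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Lossless-join test for a decomposition of r into exactly two relations."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    if r1 | r2 != r:   # every attribute of R must be kept somewhere
        return False
    if not common:     # the two relations must share an attribute
        return False
    reach = closure(common, fds)
    # The shared attributes must be a key of R1 or of R2.
    return r1 <= reach or r2 <= reach


F = [(set("A"), set("BC"))]

print(is_lossless_binary("ABCD", "ABC", "AD", F))  # True
print(is_lossless_binary("ABCD", "ABC", "BD", F))  # False (hypothetical counter-example)
```

In the second call the shared attribute B determines nothing beyond itself, so it is a key of neither relation and the join could produce spurious tuples.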
Example
Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D which is
Fourth Normal Form (4NF): A relation is said to be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
A relation has a multi-valued dependency when the following conditions hold:
1. For a dependency A → B, multiple values of B exist for a single value of A.
2. For a relation R(A, B, C), if there is a multi-valued dependency between A and B, then B and C are independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.
EXAMPLE
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities, so there is no relationship between COURSE and HOBBY. In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
To bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE and STUDENT_HOBBY
Now these tables satisfy 4NF. In STUDENT_COURSE, (STU_ID, COURSE) is the key, and in STUDENT_HOBBY, (STU_ID, HOBBY) is the key.
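The effect of this decomposition can be sketched with Python sets. The rows below are an assumption: they pair each of the two courses of STU_ID 21 with each of the two hobbies, which is what the independence of COURSE and HOBBY requires, and they cover only that one student.

```python
# Assumed STUDENT rows for STU_ID 21: every course paired with every hobby.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
}

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = {(sid, course) for sid, course, _hobby in student}
student_hobby = {(sid, hobby) for sid, _course, hobby in student}

# Natural join of the two projections over STU_ID.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

print(len(student), len(student_course), len(student_hobby))  # 4 2 2
print(rejoined == student)                                    # True
```

Four rows in STUDENT shrink to two rows in each projection, and joining the projections gives back the original relation, so nothing is lost by the split.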
5NF
Properties of 5NF:
A relation R is in 5NF if and only if it satisfies the following conditions:
● R should be in 4NF (no multi-valued dependency exists).
● It cannot be decomposed any further without loss (it contains no join dependency).
5NF is also known as Project-Join Normal Form (PJ/NF).
Example: Consider the above schema together with the rule “if a company makes a product and an agent is an agent for that company, then he always sells that product for the company”. Under these circumstances, the ACP table is shown as:
Table ACP
Agent  Company  Product
A1     PQR      Nut
A1     PQR      Bolt
A1     XYZ      Nut
A1     XYZ      Bolt
A2     PQR      Nut
The relation ACP can again be decomposed into 3 relations, R1, R2 and R3, which are shown as:
Table R1
Agent  Company
A1     PQR
A1     XYZ
A2     PQR
Table R2
Agent  Product
A1     Nut
A1     Bolt
A2     Nut
Table R3
Company  Product
PQR      Nut
PQR      Bolt
XYZ      Nut
XYZ      Bolt
The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of the result R13 and R2 over ‘Agent’ and ‘Product’, is Table ACP. However, 5NF is rarely applied in practical scenarios and remains largely a theoretical concept.
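The join described above can be checked with Python sets, using the rows of Table ACP exactly as listed. The variable names (`acp`, `r1`, `r2`, `r3`, `r13`, `joined`) are illustrative.

```python
# Rows of Table ACP as (Agent, Company, Product).
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# The three projections R1(Agent, Company), R2(Agent, Product), R3(Company, Product).
r1 = {(agent, company) for agent, company, _product in acp}
r2 = {(agent, product) for agent, _company, product in acp}
r3 = {(company, product) for _agent, company, product in acp}

# Natural join of R1 and R3 over Company.
r13 = {
    (agent, company, product)
    for agent, company in r1
    for company2, product in r3
    if company == company2
}

# Natural join of R13 and R2 over Agent and Product.
joined = {(agent, company, product) for agent, company, product in r13 if (agent, product) in r2}

print(len(r13))                     # 6
print(("A2", "PQR", "Bolt") in r13) # True: a tuple that is not in ACP
print(joined == acp)                # True
```

The intermediate result R13 contains one extra tuple, (A2, PQR, Bolt), which the final join with R2 removes because agent A2 does not sell Bolt. Only the join of all three relations is checked against ACP here.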