18CSC303J DBMS Unit Iv
18CSC303J DBMS Unit Iv
UNIT-IV
INTRODUCTION
• Notation: p (r)
• p is called the selection predicate
• Example: select those tuples of the instructor relation where the instructor is in the “Physics” department.
• Query
dept_name=“Physics” (instructor)
SELECT
• A unary operation that returns its argument relation, with certain attributes left
out.
• Notation:
A1,A2,A3 ….Ak (r)
where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the
columns that are not listed
• Duplicate rows removed from result, since relations are sets
PROJECT
• The Cartesian-product operation (denoted by X) allows us to combine information from any two relations.
• Example: the Cartesian product of the relations instructor and teaches is written as:
instructor X teaches
• Since the instructor ID appears in both relations we distinguish between these attribute by attaching to the
attribute the name of the relation from which the attribute originally came.
• instructor.ID
• teaches.ID
CARTESIAN RESULT
CARTESIAN PRODUCT
• The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches.
• Most of the resulting rows have information about instructors who did NOT teach a particular course.
• To get only those tuples of “instructor X teaches “ that pertain to instructors and the courses that they taught,
we write:
• To find all courses taught in the Fall 2017 semester, or in the Spring 2018 semester, or in both
• The set-intersection operation allows us to find tuples that are in both the input relations.
• Notation: r s
• Assume:
• r, s have the same arity
• attributes of r and s are compatible
• Example: Find the set of all courses taught in both the Fall 2017 and the Spring 2018 semesters.
• The set-difference operation allows us to find tuples that are in one relation but are not in another.
• Notation r – s
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2017 semester, but not in the Spring 2018 semester
course_id ( semester=“Fall” Λ year=2017 (section)) −
course_id ( semester=“Spring” Λ year=2018 (section))
ASSIGNMENT
• The results of relational-algebra expressions do not have a name that we can use to refer to them. The rename
operator, , is provided for that purpose
• The expression:
x (E)
returns the result of expression E under the name x
• Another form of the rename operation:
x(A1,A2, .. An) (E)
INTRODUCTION
• Find the ID, name, dept_name, salary for instructors whose salary
is greater than $80,000
{t | t instructor t [salary ] 80000}
1.1 Redundancy:
Storing the same data in multiple places can lead to inconsistencies and
inefficiencies.
1.2 Inconsistency:
Data inconsistencies can occur when different parts of the database hold
different versions of the same data.
1.3 Inefficiency:
Poor database design can result in slower query performance and increased
storage requirements.
1.4 Complexity:
A complex database schema can be difficult to understand and maintain,
leading to errors and inefficiencies.
Decomposing a Bad Schema
• employees2(department, salary)
Example
In this new design, each table has fewer attributes and there is no
redundancy.
• A→B
• B→C
• C→D
To find the closure of the attribute set {A}, the axioms are used to derive all the other
attributes that are functionally dependent on {A}.
Using reflexivity, we know that A → A holds true, so we can augment {A} to get {A, B}.
Then, using transitivity, we know that {A, B} → C holds true, so we can augment {A, B} to
get {A, B, C}.
Finally, using transitivity again, we know that {A, B, C} → D holds true, so we can
augment {A, B, C} to get the closure of {A}, which is {A, B, C, D}.
In this example, we have identified all the functional dependencies that hold in the relation
and eliminated any redundancy or anomalies by normalizing the relation.
Closure of FD set
● Given a set F of functional dependencies on a schema, we can prove that certain other functional dependencies
also hold on the schema. We say that such functional dependencies are “logically implied” by F.
● Let F be a set of functional dependencies. The closure of F, denoted by F+, is the set of all functional
dependencies logically implied by F. Given F, we can compute F+ directly from the formal definition of
functional dependency.
45
● We can use the following three rules to find logically implied functional
dependencies.
● By applying these rules repeatedly, we can find all of F+, given F. This collection of
rules is called Armstrong’s axioms in honor of the person who first proposed it.
46
● Armstrong’s axioms are sound, because they do not generate any incorrect functional
dependencies. They are complete, because, for a given set F of functional
dependencies, they allow us to generate all F+.
47
ADDITIONAL RULES
48
PROCEDURE TO COMPUTE F+
F+ = F
apply the reflexivity rule /* Generates all trivial dependencies */
repeat
for each functional dependency f in F+
apply the augmentation rule on f
add the resulting functional dependencies to F+
for each pair of functional dependencies f1 and f2 in F+
if f1 and f2 can be combined using transitivity
add the resulting functional dependency to F+
until F+ does not change any further
49
EXAMPLE
For relation R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG
→ I , B → H}. Find out the closure of functional dependency.
● Another way of finding that AG → I holds is as follows: We use the augmentation rule on A → C to
infer AG → CG. Applying the transitivity rule to this dependency and CG → I, we infer AG → I .
50
Closure of attributes
51
PSEUDOCODE TO FIND THE CLOSURE OF ATTRIBUTE
52
EXAMPLE
● Given Relational schema R(A,B,C,D,E) with the given FDs find the closure
of an attribute ‘E’.
53
Irreducible a set of functional dependencies
(or)Canonical Cover
● A canonical cover or irreducible a set of functional dependencies
FD is a simplified set of FD that has a similar closure as the
original set FD.
Extraneous attributes
● An attribute of an FD is said to be extraneous if we can remove it
without changing the closure of the set of FD.
54
PROCEDURE TO COMPUTE CANONICAL COVER
Fc = F
repeat
Use the union rule to replace any dependencies in Fc of the form
α1 → β1 and α1 → β2 with α1 → β1 β2.
Find a functional dependency α → β in Fc with an extraneous
attribute either in α or in β.
/* Note: the test for extraneous attributes is done using Fc, not F */
If an extraneous attribute is found, delete it from α → β in Fc.
until (Fc does not change)
55
● A canonical cover Fc for F is a set of dependencies such that F logically implies all
dependencies in Fc, and Fc logically implies all dependencies in F. Furthermore, Fc
must have the following properties:
● No functional dependency in Fc contains an extraneous attribute.
● Each left side of a functional dependency in Fc is unique. That is, there are no two
dependencies α1 → β1 and α2 → β2 in Fc such that α1 = α2
56
Normalization
✔ Database Normalization is the technique of organizing data into
more than one table in the database.
✔ Updation Anomaly: To update address of student to occurs twice (or) more than
twice in a table we have to update address column in all the rows else the data will
come inconsistent.
✔ Insertion Anomaly: Suppose for a student new admission, we have s_id, s_name,
s_address of a student but the student has not opted for any subject then we have
to insert null for the subject which leads to insertion anomalies.
✔ Deletion Anomaly: If s_id = 401 has only one subject and temporarily drop it when
we delete that row entire student record will be deleted along with it.
NORMAL FORMS
✔ The term Normalization comes from the concept of normal form which
describe how to organize the data in the database.
✔ Normal Forms were developed around the concept of table based on relational
database.
Types
✔ 1NF (First Normal Form)
Course Content
Programming C, Java, Python
✔ If and only if all the attributes of relation are atomic (Single valued) in
nature.
✔ Must not contain any multi valued (or) composite attributes.
Course Content
Programming C, Java, Python
Web PHP
✔ This 1NF allows data redundancy and there will be many column with same
data in multiple rows but each row as a whole will be unique.
✔ It is used in most small to medium application.
SECOND NORMAL FORM (2NF)
✔ A relation is said to be in Second Normal Form,
✔ If and only if it is already in First Normal Form.
✔ There exist no Partial Dependency.
✔ Partial Dependency:
✔ It means that a non-prime attribute is functionally dependent on part of a prime
attribute.
✔ Functional Dependency:
✔ All the non key attributes are fully functional depends on key attribute.
68
To eliminate Partial Dependency
✔ After removing the position into another relation store the lesser
amount of data in two relations without the loss of data.
Example Table
Emp_id E_name Month Sales B_id B_name
E1 AA Jan 1000 B1 SBI
E1 AA Feb 1200 B1 SBI
E2 BB Jan 2200 B2 UBI
E2 BB Feb 2500 B2 UBI
E3 CC Jan 1700 B1 SBI
E3 CC Feb 1800 B1 SBI
Table 1 Table 2
Eg:
Rollno->marks
Marks->grade
Rollno->grade
To eliminate Transitive Dependency
✔ For each non-key attribute that is determinant in relation, create a
new relation that attribute becomes a primary key in new relation.
✔ Leave the attribute (which serve as primary key in new relation) in old
relation to serve as a foreign key.
S_id S_name City Pincode
1 Shiva Chennai 15
2 Peter Pondy 17
3 Meera Villupuram 20
4 Rohan Delhi 01
5 Shakthi Bangalore 18
Example Table
Table 1 Table 2
S_id S_name Pincode Pincode City
1 Shiva 15 15 Chennai
2 Peter 17 17 Pondy
3 Meera 20
20 Villupuram
4 Rohan 01
01 Delhi
5 Shakthi 18
18 Bangalore
Fourth Normal Form (4NF)
Functional dependencies can help us detect poor E-R design
multivalued dependency arises from one of the following sources:
• A many-to-many relationship set.
• A multivalued attribute of an entity set.
A relation schema (R) is in fourth normal form(4NF)with respect to a set D of functional
and multivalued dependencies if, for all multivalued dependencies in D of the form
A→→B, where A⊆R and B ⊆R, at least one of the following holds:
• A→→B is a trivial multivalued dependency.
• A is a super key for R.
4NF
ID→→dept name is a non-trivial multivalued dependency and ID is not a superkey for the schema.
4NF DECOMPOSITION ALGORITHM
Lossless-join Decomposition
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key.
BCNF A stronger definition of 3NF & on the left-hand side of the functional dependency there is a candidate key
4NF A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued dependency.
5NF A relation is in 5NF. If it is in 4NF and does not contain any join dependency, joining should be lossless.
THANK YOU