DBMS Module IV NOTES

Uploaded by

Anish Nayak
Copyright
© © All Rights Reserved

Module-4

Normalization: Database Design Theory – Introduction to Normalization using Functional and
Multivalued Dependencies: Informal design guidelines for relation schema, Functional Dependencies,
Normal Forms based on Primary Keys, Second and Third Normal Forms, Boyce-Codd Normal Form,
Multivalued Dependency and Fourth Normal Form, Join Dependencies and Fifth Normal Form.
Examples on normal forms.

Normalization Algorithms: Inference Rules, Equivalence, and Minimal Cover, Properties of Relational
Decompositions, Algorithms for Relational Database Schema Design, Nulls, Dangling tuples, and
alternate Relational Designs, Further discussion of Multivalued dependencies and 4NF, Other
dependencies and Normal Forms

Textbook 1: Ch 14.1 to 14.7, 15.1 to 15.6

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

Functional dependencies and normalization

Functional dependencies and normalization are important concepts in designing an effective
relational database.

A functional dependency occurs when the value of one attribute determines the value of another
attribute.

Normalization is the process of organizing a database in a way that reduces redundancy and
undesirable dependencies between attributes. It is a crucial step in designing an efficient and
effective database structure.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What are functional dependencies?

Functional dependencies in Database Management Systems (DBMS) are a set of constraints or rules
that define the relationships between attributes (columns) within a relational database table.
These dependencies specify how the values of one or more attributes uniquely determine the
values of other attributes.

For example, consider a database of employee records. The employee's name is functionally
dependent on their ID number because the ID number determines the name. The reverse need not hold:
two employees can share a name, so the name does not determine the ID number.

Functional dependencies can be used to design a database in a way that eliminates redundancy and
ensures data integrity. For example, consider a database that stores employee records and the
departments they work in. If we store the department name for each employee, we might end up with
several copies of the same department name.

This would be redundant and would take up unnecessary space in the database. Instead, we can
use functional dependencies to store the department name only once and use the employee's ID
number to determine which department they work in. This reduces redundancy and makes the
database more efficient.

Without applying functional dependencies (this layout occupies more space):

Emp_id   Emp_name   Emp_Department
101      Ram        Computer Science & Engineering

In the above relation the full department name is repeated for every employee of that department, so
Emp_Department is redundant. We replace it with a department ID and store each department name
only once, as follows:

Emp_id   Emp_name   Depart_id
101      Ram        1
….       ….         ….

Depart_id   Department_name
1           Computer Science & Engineering
2           Information Science & Engineering

Real life example of functional dependency, multivalued dependency, trivial functional dependency and
non-trivial functional dependency.

What is a functional dependency?


A functional dependency is a relationship in which one attribute of a relation uniquely determines
another attribute. If R is a relation with attributes A and B, a functional dependency between the
attributes is represented as A → B, which specifies that B is functionally dependent on A. Here A is
the determinant set and B is the dependent attribute: each value of A is associated with exactly one
value of B. A functional dependency acts as a constraint between two sets of attributes. Defining
functional dependencies is an important part of relational database design and forms the basis of
normalization.

A functional dependency exists when one attribute determines another attribute in a DBMS. Functional
dependencies play a vital role in distinguishing good database design from bad.
In the above example, if we know the Employee number, we can find the Employee Name, city, salary, etc.

With this fact, we can say that the city, Employee Name, and salary are functionally dependent on
the Employee number.

A functional dependency is symbolized by an arrow →

The functional dependency of B on A is represented by A → B

Multivalued dependency in DBMS

A multivalued dependency arises when a single table contains two or more independent multivalued
attributes. A multivalued dependency is a constraint between two sets of attributes in a
relation. It requires that certain tuples be present in the relation.

In the above example, color and maf_year are independent of each other but dependent on car_model.
These two columns (color and maf_year) are said to be multivalued dependent on car_model.

A trivial dependency is one in which the dependent attributes are already included in the determinant.

So, A → B is a trivial functional dependency if B is a subset of A.

Consider a table with two columns, Emp_id and Emp_name.

{Emp_id, Emp_name} → Emp_id is a trivial functional dependency, as Emp_id is a subset of
{Emp_id, Emp_name}.

Non trivial functional dependency in DBMS

A functional dependency A → B is non-trivial when B is not a subset of A. In other words, if the
dependent B contains at least one attribute that is not in A, the dependency is considered
non-trivial.

Example:

{Company} → {CEO} (if we know the Company, we know the name of the CEO)

Here CEO is not a subset of Company, and hence it is a non-trivial functional dependency.

Types of Functional dependencies in DBMS

In relational database management, a functional dependency is a concept that specifies the relationship
between two sets of attributes where one set determines the value of the other. It is
denoted as X → Y, where the attribute set on the left side of the arrow, X, is called the determinant,
and Y is called the dependent.

Functional dependencies are used to mathematically express relations among database entities. They
are essential for understanding advanced concepts in relational database systems and for solving
problems in competitive exams like GATE.

Example:

From the above table we can conclude some valid functional dependencies:

roll_no → {name, dept_name, dept_building}: roll_no can determine the values of the fields
name, dept_name and dept_building, hence this is a valid functional dependency.
roll_no → dept_name: since roll_no can determine the whole set {name, dept_name, dept_building},
it can determine its subset dept_name also.
dept_name → dept_building: dept_name can identify the dept_building accurately, since each
department is located in exactly one building.
More valid functional dependencies:
roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.

Here are some invalid functional dependencies:

name → dept_name Students with the same name can have different dept_name, hence this is not
a valid functional dependency.

dept_building → dept_name There can be multiple departments in the same building. For
example, in the above table departments ME and EC are in the same building B2,
hence dept_building → dept_name is an invalid functional dependency.

More invalid functional dependencies:


name → roll_no,
{name, dept_name} → roll_no,
dept_building → roll_no, etc.
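
Whether a set of rows violates a dependency can be checked mechanically. The sketch below is a minimal illustration in Python; the function name fd_holds and the sample rows are assumptions made for this example (the rows are not the table from these notes, only data consistent with the statements above). A sample can only disprove a dependency: one that happens to hold in a few rows may still be invalid in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and
        # report a violation if a later row disagrees with it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, chosen to match the claims in the text.
students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
    {"roll_no": 48, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
]

print(fd_holds(students, ["roll_no"], ["name"]))             # True
print(fd_holds(students, ["dept_name"], ["dept_building"]))  # True
print(fd_holds(students, ["name"], ["dept_name"]))           # False
print(fd_holds(students, ["dept_building"], ["dept_name"]))  # False
```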

Armstrong’s axioms/properties of functional dependencies:

Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule

Example, {roll_no, name} → name is valid.

Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.

Example, {roll_no, name} → dept_building is valid, hence {roll_no, name, dept_name} →
{dept_building, dept_name} is also valid.

Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid by the
Transitivity rule.

Example, roll_no → dept_name & dept_name → dept_building, then roll_no → dept_building is
also valid.
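
Applying the three axioms repeatedly gives the closure of an attribute set, i.e. everything that set determines. The following is a minimal sketch of the usual closure computation, assuming the two dependencies roll_no → {name, dept_name} and dept_name → dept_building from the examples above; the function names closure and implies are chosen for this illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side being a set of names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side
            # follows as well (augmentation plus transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [
    ({"roll_no"}, {"name", "dept_name"}),
    ({"dept_name"}, {"dept_building"}),
]

print(sorted(closure({"roll_no"}, fds)))
# ['dept_building', 'dept_name', 'name', 'roll_no']
print(implies(fds, {"roll_no"}, {"dept_building"}))    # True (transitivity)
print(implies(fds, {"dept_building"}, {"dept_name"}))  # False
```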

Types of Functional Dependencies in DBMS

• Trivial functional dependency
• Non-Trivial functional dependency
• Multivalued functional dependency
• Transitive functional dependency

1. Trivial Functional Dependency

In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X → Y and
Y is the subset of X, then it is called trivial functional dependency

Example:

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial
functional dependency.

2. Non-trivial Functional Dependency

In non-trivial functional dependency, the dependent is not a subset of the determinant,
i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.

Example:

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}

3. Multivalued Functional Dependency

In these notes, a dependency a → {b, c} is called a multivalued functional dependency when the
attributes of the dependent set do not depend on each other, i.e. there exists no functional
dependency between b and c. (This informal usage is different from the formal multivalued
dependency X ↠ Y on which the fourth normal form is based.)

For example,

Here, roll_no → {name, age} is a multivalued functional dependency in this sense, since the
dependents name & age are not dependent on each other (i.e. neither name → age nor age → name holds).

4. Transitive Functional Dependency

In transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e. if
a → b & b → c, then according to the axiom of transitivity, a → c. This is a transitive functional
dependency.

For example,

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity,
enrol_no → building_no is a valid functional dependency. This is an indirect functional dependency,
hence called Transitive functional dependency.

5. Fully Functional Dependency

In a full functional dependency X → Y, Y depends on the whole of X and not on any proper subset of
X: removing any attribute from X makes the dependency fail. For example, if a relation R has
attributes X, Y, Z with the dependencies X → Y and X → Z, where X is a single attribute, both
dependencies are fully functional because no smaller part of X exists that could determine Y or Z.

6. Partial Functional Dependency

In partial functional dependency a non-key attribute depends on a part of the composite key, rather than
the whole key. If a relation R has attributes X, Y, Z where (X, Y) is the composite key and Z is a
non-key attribute, then X → Z is a partial functional dependency.

For example, if the table above also had an attribute course_no and (enrol_no, course_no) were the
composite key, then enrol_no → dept would be a partial functional dependency, because dept depends
on enrol_no alone.

Advantages of Functional Dependencies

Functional dependencies have numerous applications in the field of database management systems.
Some of them are listed below:

1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize
redundancy and increase data integrity. Functional dependencies play an important part in data
normalization. With the help of functional dependencies we are able to identify the candidate keys
and the primary key of a table, which in turn helps in normalization.

2. Query Optimization

With the help of functional dependencies we are able to decide how tables are connected and which
attributes need to be projected to retrieve the required data from the tables.
This helps in query optimization and improves performance.

3. Consistency of Data

Functional dependencies help ensure the consistency of the data by removing redundancies or
inconsistencies that may exist in the data. Enforcing a functional dependency ensures that a change
made to one attribute does not introduce an inconsistency in another set of attributes, and thus
maintains the consistency of the data in the database.

4. Data Quality Improvement

Functional dependencies help keep the data in the database accurate, complete and up to date.
This improves the overall quality of the data and reduces errors and inaccuracies that might
otherwise surface during data analysis and decision making.

Conclusion

Functional dependency is a very important concept in database management systems for ensuring
data consistency and accuracy.

What is Normalization?

• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize redundancy in a relation or set of relations. It is also
used to eliminate undesirable characteristics like insertion, update, and deletion
anomalies (and so enhance data integrity and consistency).
• Normalization divides a larger table into smaller tables and links them using relationships.
• The normal forms are used to reduce redundancy in database tables.

Why do we need Normalization?

The main reason for normalizing the relations is to remove insertion, update and deletion anomalies.
Failure to eliminate these anomalies leads to data redundancy and can cause loss of data integrity and
other problems as the database grows. Normalization consists of a series of guidelines that help you
create a good database structure.

Data modification anomalies can be categorized into three types:

Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into a relation
because some other, unrelated data is missing.
Deletion Anomaly: A deletion anomaly refers to the situation where the deletion of data results in the
unintended loss of some other important data.
Update Anomaly: An update anomaly occurs when an update of a single data value requires multiple
rows of data to be updated.

Why is Normalization Important?


Normalization is the process of organizing a database to reduce redundancy and undesirable
dependencies. It is important because it helps to eliminate data inconsistencies and ensures that
the data is stored in a logical and organized way.

For example, consider a database that stores customer information and the products they have
purchased. If we store the product names with each customer record, we might end up with
several copies of the same product name. This would be redundant and would take up
unnecessary space in the database. Instead, we can use normalization to create a separate table
for products and store the product names only once. This reduces redundancy and makes the
database more efficient.

There are several normal forms that can be used to normalize a database. The most common normal
forms are the first, second, and third normal forms.

First normal form (1NF)

The first normal form (1NF) is a basic level of normalization. To be in 1NF, a table must meet the
following criteria −

It must contain only atomic values. An atomic value is a single value that cannot be further broken
down. For example, a name is an atomic value, but an address is not because it can be broken down
into separate values for the street, city, state, and zip code.

It must not contain repeating groups. A repeating group is a set of values that are repeated within a
single record. For example, if a table contains a field for phone numbers, it should not contain
multiple phone numbers within the same field. Instead, each phone number should be stored as a
separate value, typically in its own row of a separate phone-number table.

Second normal form (2NF)

The second normal form (2NF) is a higher level of normalization. To be in 2NF, a table must meet the
following criteria −

It must be in 1NF.

It must not have any partial dependencies. A partial dependency occurs when a non-key attribute is
dependent on only a part of a composite primary key. For example, consider a table with the
composite primary key (EmployeeID, DepartmentID) and a non-key attribute DepartmentName.

DepartmentName is dependent on DepartmentID alone, which is only a part of the primary key, so
there is a partial dependency. To eliminate this dependency, we would create a separate table for
departments and store the DepartmentID and DepartmentName in that table.

Third normal form (3NF)

The third normal form (3NF) is a higher level of normalization. To be in 3NF, a table must meet the
following criteria:

It must be in 2NF.

It must not have any transitive dependencies. A transitive dependency occurs when a non-key attribute
is dependent on another non-key attribute, and so depends on the primary key only indirectly. For
example, consider a table with the following attributes: EmployeeID (primary key), EmployeeName,
ManagerID, and ManagerName. ManagerID is dependent on the EmployeeID, which is the primary key, so
that dependency is not transitive. However, ManagerName is dependent on the ManagerID, which is not
the primary key, so EmployeeID → ManagerID → ManagerName is a transitive dependency. To eliminate
this dependency, we would create a separate table for managers and store the ManagerID and
ManagerName in that table.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the
constraints of that form.

The various normal forms, with their decompositions and conditions, are as follows:

Advantages of Normalization

• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.

What is the Purpose of Normalization?

The main purpose of database normalization is to avoid complexities, eliminate duplicates, and
organize data in a consistent way. In normalization, the data is divided into several tables linked
together with relationships.

Database administrators are able to achieve these relationships by using primary keys, foreign keys,
and composite keys.

A primary key is a column that uniquely identifies the rows of data in that table. It’s a unique
identifier such as an employee ID, student ID, voter’s identification number (VIN), and so on.

A foreign key is a field that relates to the primary key in another table.

A composite key is just like a primary key, but instead of a single column, it is made up of
multiple columns.

What is 1NF 2NF and 3NF?

1NF, 2NF, and 3NF are the first three types of database normalization. They stand for first normal
form, second normal form, and third normal form, respectively.

There are also 4NF (fourth normal form) and 5NF (fifth normal form). There’s even 6NF (sixth
normal form), but the most common normal form you’ll see out there is 3NF (third normal form).

All the types of database normalization are cumulative – meaning each one builds on top of those
beneath it. So all the concepts in 1NF also carry over to 2NF, and so on.

The First Normal Form – 1NF

For a table to be in the first normal form, it must meet the following criteria:

• A single cell must not hold more than one value (atomicity)
• Each column must have only one value for each row in the table
• There must be a primary key for identification
• No duplicated rows or columns

Types Of Functional Dependencies

There are two types of functional dependencies:

• Trivial Functional Dependencies
• Non-trivial Functional Dependencies

1. Trivial Functional Dependencies-

A functional dependency X → Y is said to be trivial if and only if Y ⊆ X.

Thus, if the RHS of a functional dependency is a subset of the LHS, then it is called a trivial
functional dependency.

Examples-

The examples of trivial functional dependencies are-

AB → A
AB → B
AB → AB

2. Non-Trivial Functional Dependencies-

A functional dependency X → Y is said to be non-trivial if and only if Y ⊈ X.

Thus, if there exists at least one attribute in the RHS of a functional dependency that is not a part
of the LHS, then it is called a non-trivial functional dependency.

Examples-

The examples of non-trivial functional dependencies are-

AB → BC
AB → CD
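
The subset test in these two definitions is easy to express in code. The sketch below is an illustration only: the function name is_trivial is an assumption, and each attribute is written as a single letter so that a string such as "AB" stands for the attribute set {A, B}.

```python
def is_trivial(lhs, rhs):
    """Return True if lhs -> rhs is trivial, i.e. rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


# Each character is treated as one attribute name.
for lhs, rhs in [("AB", "A"), ("AB", "B"), ("AB", "AB"), ("AB", "BC"), ("AB", "CD")]:
    kind = "trivial" if is_trivial(lhs, rhs) else "non-trivial"
    print(f"{lhs} -> {rhs}: {kind}")
```

Under the definition used here, AB → BC counts as non-trivial even though B appears on both sides, because C is not part of the LHS.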
Partial Functional Dependency

In partial functional dependency a non-key attribute depends on a part of the composite key, rather
than the whole key. If a relation R has attributes X, Y, Z where (X, Y) is the composite key
and Z is a non-key attribute, then X → Z is a partial functional dependency, as Y is not needed to
determine Z.

The Third Normal Form

A table in 2NF has no partial dependencies, but it may still contain transitive dependencies.

A transitive dependency means a non-prime attribute (an attribute that is not part of any candidate
key) is dependent on another non-prime attribute. This is what the third normal form (3NF) eliminates.

So, for a table to be in 3NF:

• It must be in 2NF.
• It must have no transitive dependency.

Non technical definition of normal forms 1NF, 2NF and 3NF

1NF: No repeating elements or groups of elements
2NF: All non-key attributes are dependent on all of the key (composite key) [no partial dependency]
3NF: No dependencies on non-key attributes [no transitive dependency]

Examples of 1NF, 2NF, and 3NF

Database normalization is quite technical, but we will illustrate each of the normal forms with
examples.

Imagine we're building a restaurant management application. That application needs to store data
about the company's employees and it starts out by creating the following table of employees:

All the entries are atomic and there are no repeating groups, so the table is in the first normal
form (1NF).

In the above table there are two determinants, employee_id and job_code, which together form the
composite primary key (employee_id, job_code). If you know employee_id, then name, state_code and
home_state are determined; if you know job_code, then job is determined.

Partial Functional Dependency

In partial functional dependency a non-key attribute depends on a part of the composite key, rather
than the whole key. If a relation R has attributes X, Y, Z where (X, Y) is the composite key
and Z is a non-key attribute, then X → Z is a partial functional dependency, as Y is not needed to
determine Z.

Similarly, here the composite primary key is (employee_id, job_code), but name, state_code and
home_state depend only on employee_id and not on job_code, while job depends only on job_code.
These are partial dependencies.

So, the table is not in 2NF. We should decompose it into separate tables to bring it to 2NF.

Example of Second Normal Form (2NF)

employee_roles Table

employees Table

jobs table
Now the design is in 2NF, as there is no partial dependency.

For further improvement in normalization, look carefully at the employees table.

You can observe that employee_id determines state_code and state_code determines home_state,
which is a transitive dependency, i.e.

(employee_id → state_code) and (state_code → home_state)

together give

(employee_id → home_state) (a transitive dependency). Therefore the table is not in 3NF.

To take this a step further, we should decompose the employees table again to bring the design to 3NF.

Example of Third Normal Form (3NF)

employee table

states table
employee_roles table

jobs table

As there is no longer any transitive dependency, the restaurant employee database is now in 3NF.
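
The two checks used in this example, partial dependencies for 2NF and transitive dependencies for 3NF, can be sketched in a few lines of Python. This is a simplified illustration, not a general normal-form tester: it assumes a single candidate key (employee_id, job_code), takes the dependencies from the description above, and uses hypothetical function names.

```python
# Composite primary key and dependencies of the original employees table.
KEY = {"employee_id", "job_code"}
FDS = [
    ({"employee_id"}, {"name", "state_code"}),
    ({"state_code"}, {"home_state"}),
    ({"job_code"}, {"job"}),
]


def partial_dependencies(key, fds):
    """FDs whose left side is only a part of the key (these violate 2NF)."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < key and rhs - key]


def transitive_dependencies(key, fds):
    """FDs between non-key attributes (these violate 3NF).

    Simplification: assumes `key` is the only candidate key.
    """
    return [(lhs, rhs) for lhs, rhs in fds if not (lhs & key) and rhs - key]


for lhs, rhs in partial_dependencies(KEY, FDS):
    print("partial:", sorted(lhs), "->", sorted(rhs))
for lhs, rhs in transitive_dependencies(KEY, FDS):
    print("transitive:", sorted(lhs), "->", sorted(rhs))
```

Each reported dependency corresponds to one table that is split out: employee_id and job_code lead to the employees and jobs tables, and state_code leads to the states table.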
Boyce Codd normal form (BCNF)

Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known as
3.5 Normal Form. It is stricter than 3NF (which only requires that no non-prime attribute be
transitively dependent on a key).

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:

It should be in the Third Normal Form.

For every non-trivial functional dependency X → Y, X should be a super key of the table.
(In particular, for a dependency A → B, A cannot be a non-prime attribute if B is a prime
attribute.)

Therefore, for BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Example
Below we have a college enrolment table with columns student_id, subject and professor.

As you can see, we have also added some sample data to the table.

In the table above:

One student can enroll for multiple subjects. For example, student with student_id 101, has opted for
subjects - Java & C++

For each subject, a professor is assigned to the student.

And, there can be multiple professors teaching one subject like we have for Java.

Well, in the table above (student_id, subject ) together form the primary key, because using
student_id and subject, we can find all the columns of the table.

This table satisfies the 1st Normal Form because all the values are atomic, column names are unique
and all the values stored in a particular column are of the same domain.

This table also satisfies the 2nd Normal Form as there is no partial dependency.

And, there is no transitive dependency, hence the table also satisfies the 3rd Normal Form.

But this table is not in Boyce-Codd Normal Form.

Why this table is not in BCNF?

In the table above, (student_id, subject ) form primary key, which means subject column is a prime
attribute.

But, there is one more dependency, professor → subject.

Here professor is not a super key of the table: it is a non-prime attribute that determines the
prime attribute subject, which is not allowed by BCNF.

How to satisfy BCNF?

To make this relation(table) satisfy BCNF, we will decompose this table into two tables, student table
and professor table.

Below we have the structure for both the tables.

Student Table

And, Professor Table

And now, this relation satisfies Boyce-Codd Normal Form. Next we will learn about the Fourth
Normal Form.
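
The BCNF rule (the left side of every non-trivial dependency must be a super key) can be tested with attribute closures. The following is a minimal sketch under stated assumptions: the dependencies are the two described for the enrolment table, the function names are hypothetical, and the column set of the decomposed professor table is assumed, since the table structure is not reproduced in these notes.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != attributes
    ]


enrolment = {"student_id", "subject", "professor"}
enrolment_fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]
print(bcnf_violations(enrolment, enrolment_fds))
# [({'professor'}, {'subject'})]

# Assumed shape of the decomposed professor table: (professor, subject).
professor_table = {"professor", "subject"}
print(bcnf_violations(professor_table, [({"professor"}, {"subject"})]))
# []
```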
Fourth normal form (4NF)

A relation will be in 4NF if it is in Boyce-Codd normal form and has no non-trivial multivalued
dependency.

A multivalued dependency A ↠ B exists if, for a single value of A, there is a set of multiple
values of B that is independent of the other attributes of the relation.

Example
STUDENT

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes.
Hence, there is no relationship between COURSE and HOBBY (i.e. neither determines the other).

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there are multivalued dependencies on STU_ID, which lead
to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STUDENT_HOBBY
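
The effect of this decomposition can be sketched with plain Python sets. The rows below are hypothetical: only student 21 with courses Computer and Math and hobbies Dancing and Singing is taken from the text, and the function name split_and_rejoin is an assumption. When courses and hobbies are truly independent, every combination is present and the two projections join back to exactly the original rows.

```python
def split_and_rejoin(rows):
    """Project (stu_id, course, hobby) rows onto two tables and rejoin them."""
    student_course = {(stu, course) for stu, course, _ in rows}
    student_hobby = {(stu, hobby) for stu, _, hobby in rows}
    rejoined = {
        (stu, course, hobby)
        for stu, course in student_course
        for stu2, hobby in student_hobby
        if stu == stu2
    }
    return student_course, student_hobby, rejoined


# Hypothetical STUDENT rows: every course/hobby combination for student 21.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
}

courses, hobbies, rejoined = split_and_rejoin(student)
print(len(student), len(courses), len(hobbies))  # 5 3 3
print(rejoined == student)                       # True

# If one combination is missing, the multivalued dependency does not hold
# and the join brings the missing row back as a spurious tuple.
incomplete = student - {(21, "Math", "Dancing")}
print(split_and_rejoin(incomplete)[2] == incomplete)  # False
```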

Fifth normal form (5NF)

A relation is in 5NF if it is in 4NF and contains no non-trivial join dependency, so that it
cannot be decomposed any further without loss.
5NF is satisfied when the tables are broken into as many tables as possible in order to avoid
redundancy, while every decomposition remains lossless.
5NF is also known as Project-join normal form (PJ/NF).

Join Dependency

Join dependency or JD is a constraint that is similar to FD (functional dependency) or MVD
(multivalued dependency). JD is satisfied only when the concerned relation is a join of a specific
number of projections. Thus, such a type of constraint is known as a join dependency.

What is Join Dependency in DBMS?


Whenever we can recreate a table by simply joining various tables, where each of these tables
consists of a subset of the table’s attributes, the table is said to satisfy a join dependency. Thus,
it is like a generalization of MVD. We can relate the JD to 5NF: a relation can be in 5NF only
when it is already in 4NF and cannot be decomposed any further.

An example for Join dependency

The relation X would satisfy join dependency whenever X is equal to the join of X1, X2, ….. Xn,
where Xi happens to be a subset of a set of attributes of X.

Relation X
Thus, the relation given above says that sec offers many elective languages that are taken by a
combination of their students. These students have their individual opinion to choose their
languages. Thus, all three fields are required to represent this data and information.

This relation does not display non-trivial MVDs. It is because the attributes, language and name, are
dependent. Thus, these are related to one another (A Functional Dependency subject -> the existing
name). This relation cannot be decomposed into two relations (sec, language) and (sec, name).

Thus, we cannot decompose this relation into the following relations:

X1(sec, language)

X2(sec, name) and

X3(language, name)

X1
X2

X3

Relational Decomposition
When a relation in the relational model is not in an appropriate normal form, decomposition of the
relation is required.

In a database, decomposition breaks a table into multiple tables.

If a relation is not decomposed properly, it may lead to problems like loss of information.

Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
and redundancy.

Properties of Decomposition

Lossless Decomposition
• If no information is lost from the relation that is decomposed, then the decomposition is
lossless.
• A lossless decomposition guarantees that the join of the decomposed relations results in the
same relation that was decomposed.
• A decomposition is said to be lossless if the natural join of all the decomposed relations
gives the original relation.

Example:
EMPLOYEE_DEPARTMENT table:

EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY    DEPT_ID   DEPT_NAME
22       Denim       28        Mumbai      827       Sales
33       Alina       25        Delhi       438       Marketing
46       Stephan     30        Bangalore   869       Finance
52       Katherine   36        Mumbai      575       Production
60       Jack        40        Noida       678       Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY
22       Denim       28        Mumbai
33       Alina       25        Delhi
46       Stephan     30        Bangalore
52       Katherine   36        Mumbai
60       Jack        40        Noida

DEPARTMENT table
DEPT_ID   EMP_ID   DEPT_NAME
827       22       Sales
438       33       Marketing
869       46       Finance
575       52       Production
678       60       Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:

Employee ⋈ Department
EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY    DEPT_ID   DEPT_NAME
22       Denim       28        Mumbai      827       Sales
33       Alina       25        Delhi       438       Marketing
46       Stephan     30        Bangalore   869       Finance
52       Katherine   36        Mumbai      575       Production
60       Jack        40        Noida       678       Testing

Hence, the decomposition is Lossless join decomposition.
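
The same check can be carried out in code. The sketch below uses the EMPLOYEE_DEPARTMENT rows from the example; the helper names project, natural_join and same_relation are assumptions made for this illustration, and rows are represented as Python dictionaries.

```python
def project(rows, attrs):
    """Project each row (a dict) onto attrs and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two lists of dict rows on all attributes they have in common."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                merged = {**l_row, **r_row}
                if merged not in result:
                    result.append(merged)
    return result


def same_relation(a, b):
    """Compare two relations as sets of rows, ignoring row and column order."""
    return {frozenset(r.items()) for r in a} == {frozenset(r.items()) for r in b}


COLUMNS = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
DATA = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
employee_department = [dict(zip(COLUMNS, values)) for values in DATA]

employee = project(employee_department, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(employee_department, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])
rejoined = natural_join(employee, department)
print(same_relation(rejoined, employee_department))  # True: lossless

# Without EMP_ID in the second table nothing links the two relations,
# so the join degenerates into a Cartesian product with spurious rows.
lossy = natural_join(employee, project(employee_department, ["DEPT_ID", "DEPT_NAME"]))
print(len(lossy))  # 25
```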

Dependency Preserving
• It is an important constraint of the database.
• If a relation R is decomposed into relations R1 and R2, then each dependency of R must either
be a part of R1 or R2, or be derivable from the combination of the functional dependencies of
R1 and R2.
• The decomposition of R into R1 and R2 is dependency preserving if (F1 ∪ F2)+ = F+, where F1 is
the set of FDs that hold in R1, F2 is the set of FDs that hold in R2, and F is the set of FDs
that hold in R. (F1 ∪ F2)+ is the closure of (F1 ∪ F2), and F+ is the closure of F.

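Check whether the decomposition of the relation R(A, B, C, D) with F = {A → B, C → D} into the
sub-relations R1(A, B) and R2(C, D) is lossless and dependency preserving.

A decomposition into R1 and R2 is lossless only if the common attributes R1 ∩ R2 are non-empty and
form a key of R1 or of R2, i.e. (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2.

Here R1 ∩ R2 = ∅, since {A, B} and {C, D} share no attribute. Therefore, this decomposition is not
lossless: joining R1 and R2 yields their Cartesian product.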

To check dependency preservation, the condition (F1 ∪ F2)+ = F+ has to be checked, where F1 is the
set of FDs that hold in R1, F2 is the set of FDs that hold in R2, and F is the set of FDs that hold
in R.

Dependency set of R = {A→ B, C→ D}

R1(A,B) and R2(C,D)

Find the closures for R1(A, B):

A+ = AB (using A → B)

B+ = B (nothing new)

So F1 = {A → B}.

Find the closures for R2(C, D):

C+ = CD (using C → D)

D+ = D (nothing new)

So F2 = {C → D}.

F1 ∪ F2 = {A → B, C → D} = F

So (F1 ∪ F2)+ = F+. Therefore the decomposition is dependency preserving.
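
The dependency-preservation test can be sketched with attribute closures: every dependency of F must be derivable from F1 ∪ F2. In the sketch below the function names are assumptions, the projected sets F1 and F2 are supplied by hand as in the worked example, and the second relation (with F = {A → B, B → C}) is a hypothetical counter-example added for contrast.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, projected_fds):
    """True if every FD in fds follows from the FDs of the sub-relations."""
    return all(set(rhs) <= closure(lhs, projected_fds) for lhs, rhs in fds)


# R(A, B, C, D) decomposed into R1(A, B) and R2(C, D).
F = [({"A"}, {"B"}), ({"C"}, {"D"})]
F1 = [({"A"}, {"B"})]
F2 = [({"C"}, {"D"})]
print(preserves_dependencies(F, F1 + F2))  # True

# Hypothetical counter-example: R(A, B, C) with A -> B and B -> C,
# decomposed into R1(A, B) and R2(A, C). B -> C can no longer be derived.
G = [({"A"}, {"B"}), ({"B"}, {"C"})]
G_projected = [({"A"}, {"B"}), ({"A"}, {"C"})]
print(preserves_dependencies(G, G_projected))  # False
```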

Inference Rule (IR)

Armstrong's axioms are the basic inference rules.


Armstrong's axioms are used to infer functional dependencies on a relational database.
An inference rule is a type of assertion. It can be applied to a set of FDs (functional dependencies) to
derive other FDs.
Using the inference rules, we can derive additional functional dependencies from the initial set.
There are 6 inference rules for functional dependencies. The first three (IR1 to IR3) are Armstrong's
axioms proper; the remaining three can be derived from them:

1. Reflexive Rule (IR1)

In the reflexive rule, if Y is a subset of X, then X determines Y. (The converse is not true in general:
X → Y does not imply that Y is a subset of X.)

If X ⊇ Y then X → Y
Example:

X = {a, b, c, d, e}
Y = {a, b, c}
Since Y is a subset of X, X → Y holds.
2. Augmentation Rule (IR2)

In the augmentation rule, if X determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ

Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3)

In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4)

Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.

If X → Y and X → Z then X → YZ

Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)

5. Decomposition Rule (IR5)


Decomposition rule is also known as project rule. It is the reverse of union rule.

This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.

If X → YZ then X → Y and X → Z

Proof:

1. X → YZ (given)
2. YZ → Y (using IR1 Rule, according to subset rule)
3. X → Y (using IR3 on 1 and 2)
4. X → Z (in the same way, using YZ → Z from IR1 and then IR3)

6. Pseudo transitive Rule (IR6)


In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.

If X → Y and YZ → W then XZ → W
Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
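The derived rules can also be checked mechanically: an FD X → Y follows from a set of FDs exactly when Y is contained in the closure of X under that set. The sketch below illustrates this with single-letter attributes; the helper names `closure` and `implies` are hypothetical and not taken from the notes.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from `fds`."""
    return set(rhs) <= closure(lhs, fds)


# IR4 (union): X -> Y and X -> Z give X -> YZ
print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))   # True
# IR5 (decomposition): X -> YZ gives X -> Y
print(implies([("X", "YZ")], "X", "Y"))               # True
# IR6 (pseudo transitivity): X -> Y and YZ -> W give XZ -> W
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))  # True
# The reverse of an FD does not follow: X -> Y does not give Y -> X
print(implies([("X", "Y")], "Y", "X"))                # False
```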

Informal Design Guidelines for Relation Schemas

Four informal guidelines may be used as measures to determine the quality of a relation schema
design:

 Making sure that the semantics of the attributes is clear in the schema
 Reducing the redundant information in tuples
 Reducing the NULL values in tuples
 Disallowing the possibility of generating spurious tuples

Guideline 1: Design a relation schema so that it is easy to explain its meaning.


Do not combine attributes from multiple entity types and relationship types into a single relation

Examples of Violating Guideline 1. The relation schemas in Figures (a) and (b) also have clear
semantics.

They violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes
attributes of employees and departments, and EMP_PROJ mixes attributes of employees and
projects and the WORKS_ON relationship. Hence, they fare poorly against the above measure of
design quality.

Solution : Decompose the relation so that each resulting relation describes a single entity type or
relationship type.


The figure below shows a simplified version of the COMPANY relational database schema. It follows
good design principles: each relation has attributes with clear semantics.

Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations.
If any anomalies are present, note them clearly and make sure that the programs that update the
database will operate correctly.

Figure below shows a simplified version of the COMPANY relational database schema
Insertion Anomalies: Insertion anomalies can be differentiated into two types, illustrated by the
following examples based on the EMP_DEPT relation:

To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the
department that the employee works for, or NULLs (if the employee does not work for a department as
yet). For example, to insert a new tuple for an employee who works in department number 5, we must
enter all the attribute values of department 5 correctly so that they are consistent with the corresponding
values for department 5 in other tuples in EMP_DEPT.

It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation.
The only way to do this is to place NULL values in the attributes for employee.

Deletion Anomalies: The problem of deletion anomalies is related to the second insertion anomaly
situation just discussed. If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the information concerning
that department is lost from the database.

Modification Anomalies: In EMP_DEPT, if we change the value of one of the attributes of a


particular department—say, the manager of department 5—we must update the tuples of all
employees who work in that department; otherwise, the database will become inconsistent. If we
fail to update some tuples, the same department will be shown to have two different values for manager
in different employee tuples, which would be wrong.

Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL.
If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to
a majority of tuples in the relation.

Guideline 4: Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way that guarantees
that no spurious tuples are generated. Avoid relations that contain matching attributes that are not
(foreign key, primary key) combinations, because joining on such attributes may produce spurious tuples.
Example of Spurious tuples :

Spurious Tuples: Spurious tuples are rows that appear as a result of joining two tables in the wrong
manner. They are extra tuples (rows) that were not present in the original relation.

If a relation is denoted by R, and its decomposed relations are denoted by R1, R2, R3…Rn, then, the
condition for not getting any Spurious Tuple is denoted by,

R1 ⨝ R2 ⨝ R3 .... ⨝ Rn = R

Whereas the condition for getting Spurious Tuples is denoted by,

R ⊂ R1 ⨝ R2 ⨝ R3 .... ⨝ Rn

Example-1: Joining two tables without producing spurious tuples. Let R be a relation, and let R1 and
R2 be the relations that we get after decomposing R.

After performing the join operation of relations R1 and R2 (R1 ⨝ R2), we get back the original relation
R.

The condition for no spurious tuples, R1 ⨝ R2 = R, is met. Hence, we do not get any Spurious Tuples.

Conclusion – No Spurious Tuple exists.

Example-2: Joining two tables that produces spurious tuples. Let R be a relation, and let R1 and R2
be the relations that we get after decomposing R.

After performing the join operation of relations R1 and R2 (R1 ⨝ R2), we do not get back the
original relation R.

The condition for spurious tuples, R ⊂ R1 ⨝ R2, is met. Hence, we get Spurious Tuples.

Conclusion – Spurious Tuples exist.

Note – Rows in a DBMS are called tuples, whereas columns are called attributes. Spurious Tuples can
be remembered as extra rows in the table. A natural join that produces Spurious Tuples is called a
Lossy Join. A natural join that does not produce Spurious Tuples is called a Lossless Join.

What is Lossless Decomposition?

Lossless join decomposition is a decomposition of a relation R into relations R1, and R2 such that
if we perform a natural join of relation R1 and R2, it will return the original relation R. This is
effective in removing redundancy from databases while preserving the original data.

In other words by lossless decomposition, it becomes feasible to reconstruct the relation R from
decomposed tables R1 and R2 by using Joins.

Decompositions carried out to reach 2NF, 3NF, and BCNF can always be made lossless.

In a lossless decomposition of R into R1 and R2, the common attribute(s) of R1 and R2 must form a
candidate key or super key of R1, R2, or both.

Example of Lossless Decomposition


— Employee (Employee_Id, Ename, Salary, Department_Id, Dname)

Can be decomposed using lossless decomposition as,

— Employee_desc (Employee_Id, Ename, Salary, Department_Id)


— Department_desc (Department_Id, Dname)
Alternatively, the following decomposition is lossy: the two tables share no common attribute, so they
cannot be joined to get back the original data.
– Employee_desc (Employee_Id, Ename, Salary)
– Department_desc (Department_Id, Dname)
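The common-attribute test for a two-way decomposition can be sketched as follows. The FDs used here (Employee_Id determines the other employee attributes, Department_Id determines Dname) are an assumption made for illustration, since the notes do not list them, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Two-relation test: the common attributes must determine all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Assumed FDs for Employee (not stated in the notes).
fds = [
    ({"Employee_Id"}, {"Ename", "Salary", "Department_Id"}),
    ({"Department_Id"}, {"Dname"}),
]

employee_desc = {"Employee_Id", "Ename", "Salary", "Department_Id"}
department_desc = {"Department_Id", "Dname"}
print(is_lossless_binary(employee_desc, department_desc, fds))  # True

# Dropping Department_Id from Employee_desc leaves no common attribute.
employee_desc_lossy = {"Employee_Id", "Ename", "Salary"}
print(is_lossless_binary(employee_desc_lossy, department_desc, fds))  # False
```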

Functional Dependency Exercises

Functional dependency using closure
Functional Dependency In DBMS : Introduction
A Functional Dependency in DBMS, also known as an “FD”, is a relationship between attributes. It
exists only when one attribute can functionally determine another attribute.
The first attribute does not compute or calculate the value of the second attribute. Rather, given a value
of the first attribute, the corresponding tuple is looked up and the value of the second attribute is
fetched from it.
Functional Dependency in DBMS is denoted using an arrow between two or more attributes such as :
FD : A -> B
Here, A & B are the attributes present in any relation.
“A->B” means, “B” is functionally dependent upon “A” or “A” functionally determines “B”.
A functional dependency acts as a constraint between sets of attributes present in a database.
Functional Dependency In DBMS : Examples
Functional dependencies and keys are the most important concepts used as a foundation in database
normalization. The following examples show how functional dependency actually works.
Example-1 : Consider a table student_details containing details of some students.

Example : student_details Table

From the table, we can conclude that the Roll_No attribute determines the Name of a student
uniquely, and the same is the case with Marks. Hence, we can say that Name and Marks are
functionally dependent on Roll_No, but the reverse is not true.

FD1 : Roll_No -> Name


FD2 : Roll_No -> Marks
NOTE : In the above scenario, two different Roll_No values can have the same Name (e.g. Anoop in the
above table for Roll_No-1 and Roll_No-5), but one Roll_No cannot correspond to two different Names.
This is how the functional dependency concept works.

Example-2 : Consider the table student_details containing details of some students.

Example : student_details Table ( This is wrong to have two records having same rollno )
Here, Name is not functionally dependent upon Roll_No: when we search for the value of Name against
a Roll_No, two different names are returned, so Roll_No does not determine Name uniquely.
Hence, a functional dependency exists only when an attribute is able to uniquely determine another
attribute.
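Whether an FD is violated by a given table instance can be tested directly. The sketch below uses made-up rows (the original table is not reproduced in these notes; only the name Anoop for Roll_No 1 and 5 comes from the text), and `fd_holds` is a hypothetical helper. Note that a table instance can only refute an FD; it cannot prove that the FD holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows for student_details.
student_details = [
    {"Roll_No": 1, "Name": "Anoop", "Marks": 78},
    {"Roll_No": 2, "Name": "Ravi", "Marks": 60},
    {"Roll_No": 5, "Name": "Anoop", "Marks": 91},
]

print(fd_holds(student_details, ["Roll_No"], ["Name"]))   # True
print(fd_holds(student_details, ["Roll_No"], ["Marks"]))  # True
print(fd_holds(student_details, ["Name"], ["Roll_No"]))   # False: Anoop maps to 1 and 5
```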
Functional Dependency In DBMS : Armstrong’s Axioms
Armstrong’s axioms were introduced by William W. Armstrong in 1974, and they play a vital role in
reasoning about functional dependencies for database normalization. There are six inference rules,
commonly known as “Armstrong’s Axioms”, which are discussed below.
Reflexive : It means, if set “B” is a subset of “A”, then A -> B (IR1).
Augmentation : It means, if A -> B, then AC -> BC (IR2).
Transitive : It means, if A -> B & B -> C, then A -> C (IR3).
Union : It means, if A -> B & A -> C, then A -> BC (IR4).
Decomposition : It means, if A -> BC, then A -> B & A -> C (IR5).
Pseudo-Transitivity : It means, if A -> B and DB -> C, then DA -> C (IR6).

Attribute Closure
Closure Of Functional Dependency : Introduction
The closure of a set of attributes means the complete set of all attributes that can be functionally
derived from it under the given functional dependencies, using the inference rules known as
Armstrong’s Rules.

If “X” is a set of attributes, then its closure is denoted using “{X}+”.

There are three steps to calculate closure of functional dependency. These are:

Step-1 : Add the attributes which are present on Left Hand Side in the original functional
dependency.

Step-2 : Now, add the attributes present on the Right Hand Side of the functional dependency.

Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that
can be derived from the other given functional dependencies. Repeat this process until all the
possible attributes which can be derived are added in the closure.

Closure Of Functional Dependency : Examples

Example-1 : Consider the table student_details having (Roll_No, Name,Marks, Location) as the
attributes and having two functional dependencies.

FD1 : Roll_No -> Name, Marks

FD2 : Name -> Marks, Location

Now, We will calculate the closure of all the attributes present in the relation using the three steps
mentioned below.

Step-1 : Add attributes present on the LHS of the first functional dependency to the closure.

{Roll_no}+ = {Roll_No}

Step-2 : Add attributes present on the RHS of the original functional dependency to the
closure( using Union operation)

{Roll_no}+ = {Roll_No, Name, Marks}

Step-3 : Add the other possible attributes which can be derived using attributes already present
in the closure.

Name is now in the closure, so FD2 (Name -> Marks, Location) adds Location. Therefore, the complete
closure of Roll_No is:

{Roll_no}+ = {Roll_No, Name, Marks, Location}


Similarly, we can calculate closure for other attributes too i.e “Name”.

Step-1 : Add attributes present on the LHS of the functional dependency to the closure.

{Name}+ = {Name}

Step-2 : Add the attributes present on the RHS of the functional dependency to the closure.

{Name}+ = {Name, Marks, Location}

Step-3 : Since, we don’t have any functional dependency where “Marks or Location” attribute is
functionally determining any other attribute, we cannot add more attributes to the closure.
Hence complete closure of Name would be :

{Name}+ = {Name, Marks, Location}

NOTE : We don’t have any Functional dependency where marks and location can functionally
determine any attribute. Hence, for those attributes we can only add the attributes themselves in their
closures. Therefore,

{Marks}+ = {Marks}

and

{Location}+ = { Location}

Example-2 : Consider a relation R(A,B,C,D,E) having below mentioned functional dependencies.

FD1 : A-> B C

FD2 : C -> B

FD3 : D -> E

FD4 : E-> D

Now, we need to calculate the closure of attributes of the relation R. The closures will be:

{A}+ = {A, B, C}

{B}+ = {B}
{C}+ = {B, C}

{D}+ = {D, E}

{E}+ = {E,D}
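The three steps can be written as a short loop that keeps applying FDs until nothing new is added. This is a minimal sketch; the function name `closure` and the representation of an FD as a pair of sets are choices made here for illustration.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)          # Step-1: start with the attributes themselves
    changed = True
    while changed:               # Step-3: repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)   # Step-2: add the right-hand side
                changed = True
    return result


# Example-1: student_details
student_fds = [
    ({"Roll_No"}, {"Name", "Marks"}),
    ({"Name"}, {"Marks", "Location"}),
]
print(sorted(closure({"Roll_No"}, student_fds)))  # ['Location', 'Marks', 'Name', 'Roll_No']
print(sorted(closure({"Name"}, student_fds)))     # ['Location', 'Marks', 'Name']
print(sorted(closure({"Marks"}, student_fds)))    # ['Marks']

# Example-2: R(A,B,C,D,E)
r_fds = [({"A"}, {"B", "C"}), ({"C"}, {"B"}), ({"D"}, {"E"}), ({"E"}, {"D"})]
print(sorted(closure({"A"}, r_fds)))  # ['A', 'B', 'C']
print(sorted(closure({"D"}, r_fds)))  # ['D', 'E']
```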

Closure Of Functional Dependency : Calculating Candidate Key

“A Candidate Key of a relation is a minimal attribute or set of attributes whose closure contains all
the attributes of the relation. Minimal means that no attribute can be removed from it without losing
this property.”

Let’s try to understand how to calculate candidate keys.

Example-1 : Consider the relation R(A,B,C) with given functional dependencies :

FD1 : A-> B

FD2 : B-> C

Now, calculating the closure of the attributes as :

{A}+ = {A, B, C}

{B}+ = {B, C}

{C}+ = {C}

Clearly, “A” is the candidate key as, its closure contains all the attributes present in the relation
“R”.

Example-2 : Consider another relation R(A, B, C, D, E) having the Functional dependencies :

FD1 : A-> BC

FD2 : C-> B

FD3 : D-> E

FD4 : E-> D

Now, calculating the closure of the attributes as :

{A}+ = {A, B, C}

{B}+ = {B}
{C}+ = {C, B}

{D}+ = {E, D}

{E}+ = {E, D}

In this case, no single attribute is able to determine all the attributes on its own, unlike in the previous
example. Here, we need to combine two or more attributes to determine the candidate keys.

{A, D}+ = {A, B, C, D, E}

{A, E}+ = {A, B, C, D, E}

Hence, "AD" and "AE" are the two candidate keys of the given relation “R”. Any larger combination
that contains one of them, such as "ABD", is a super key but not a candidate key, because it contains
extraneous attributes.

NOTE : Any relation “R” can have either single or multiple candidate keys.

Closure Of Functional Dependency :

Key Definitions

Prime Attributes : Attributes which are part of at least one candidate key. For example : “A, D,
E” attributes are prime attributes in above example-2.

Non-Prime Attributes : Attributes other than prime attributes, which do not take part in the
formation of any candidate key.
For example : “B, C” attributes are non-prime attributes in above example-2.
Extraneous Attributes : Attributes whose removal from a set of attributes does not affect the closure of that set.

For example : Consider the relation R(A, B, C, D) with functional dependencies :

FD1 : A-> BC
FD2 : B-> C
FD3 : D-> C

Closures of LHS

{A}+={A,B,C}
{B}+={B,C}
{D}+={D,C}

Here, Candidate key can be “AD” only. Hence,

Prime Attributes : A, D.

Non-Prime Attributes : B, C
Extraneous Attributes : B, C (if we add either of them to the candidate key, its closure remains
unaffected). These are the attributes which, if removed, do not affect the closure of the set.

Another example:
Suppose a relational schema R(w x y z), and set of functional dependency as follows

F : { wx->yz,
y->w,
z->x }

Find the candidate keys in above relation.

Solution:
{w}+ = {w}
{x} + = {x}
{y} + = {y w}
{z} + = {z x}

Since we don't have all attributes of the relation, Let us try all other combinations

{wx} + = {w x y z}
{wy} + = {w y}
{wz} + = {w z x y}

{xy} + = {x y z w}
{xz}+ = {x z}
{yz }+ = {y z w x}

Therefore the candidate keys are wx,wz,xy & yz

Another example

Suppose a relational schema R(a, b, c, d, e), and set of functional dependency as follows
F : { ab ->cd,
d ->a,
bc->de }

Find the candidate keys in above relation.


{ab}+={a,b,c,d,e}
{d}+={d,a}
{bc}+={b,c,d,a,e}
{bd}+={b,d,a,c,e}
{ad}+={a,d}

Therefore ab, bc & bd are the candidate keys, as their closures contain all attributes of the relation
and no proper subset of any of them has this property
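The trial-and-error search used in these exercises can be automated by trying attribute combinations in order of increasing size and skipping any combination that already contains a smaller key. The sketch below is a brute-force illustration for small schemas with single-letter attributes; `candidate_keys` is a hypothetical helper, and the approach is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attrs:
                keys.append(combo)
    return ["".join(key) for key in keys]


print(candidate_keys("wxyz", [("wx", "yz"), ("y", "w"), ("z", "x")]))
# ['wx', 'wz', 'xy', 'yz']
print(candidate_keys("abcde", [("ab", "cd"), ("d", "a"), ("bc", "de")]))
# ['ab', 'bc', 'bd']
```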


Properties of Relational Decomposition

When a relation in the relational model is not in an appropriate normal form, decomposition of the
relation is required. In a database, breaking down a table into multiple tables is termed
decomposition. The properties of a relational decomposition are listed below :

Attribute Preservation:
Using functional dependencies, the design algorithms decompose the universal relation schema R into
a set of relation schemas D = { R1, R2, ….. Rn } that becomes the relational database schema, where ‘D’
is called the Decomposition of R.
Each attribute in R must appear in at least one relation schema Ri in the decomposition, i.e., no
attribute is lost. This is called the Attribute Preservation condition of decomposition.

Dependency Preservation:

If each functional dependency X->Y specified in F either appears directly in one of the relation
schemas Ri in the decomposition D or can be inferred from the dependencies that appear in some
Ri, the decomposition is dependency preserving. This is the Dependency Preservation condition.

If a decomposition is not dependency preserving, some dependency is lost in the decomposition. To
check that a lost dependency still holds, we must take the JOIN of 2 or more relations in the
decomposition.

For example:

R = (A, B, C)
F = {A ->B, B->C}
Key = {A}

Decomposition R1 = (A, B), R2 = (B, C)

R1 and R2 are in BCNF, Lossless-join decomposition, Dependency preserving.


It is sufficient that the union of the dependencies on all the relations Ri be equivalent to the
dependencies on R (i.e. (F1 U F2)+ = F+).

No redundancy:

Decomposition is used to eliminate some of the problems of bad design, such as anomalies,
inconsistencies, and redundancy. If a relation is not decomposed properly, it may lead to
problems such as loss of information.

Lossless Join:

The lossless join property is a feature of decomposition supported by normalization. It is the ability
to ensure that any instance of the original relation can be reconstructed from the corresponding
instances in the smaller relations.

For example:
R : relation, F : set of functional dependencies on R,
R1, R2, …, Rn : decomposition of R,
A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if the natural
join of R1, R2, …, Rn produces exactly the relation R.

Difference between Lossless and Lossy Join Decomposition

The process of breaking up of a relation into smaller subrelations is called Decomposition.


Decomposition is required in DBMS to convert a relation into a specific normal form which further
reduces redundancy, anomalies, and inconsistency in the relation.
There are mainly two types of decompositions in DBMS-

 Lossless Decomposition
 Lossy Decomposition

Difference Between Lossless and Lossy Join Decomposition :

Lossless Vs Lossy

Lossless: The decomposition R1, R2, R3…Rn of a relation schema R is said to be Lossless if its
natural join results in the original relation R.
Lossy: The decomposition R1, R2, R3…Rn of a relation schema R is said to be Lossy if its
natural join results in the addition of extraneous tuples to the original relation R.

Lossless: Formally, let R be a relation and R1, R2, R3 … Rn be its decomposition; the decomposition is
lossless if R1 ⨝ R2 ⨝ R3 .... ⨝ Rn = R
Lossy: Formally, let R be a relation and R1, R2, R3 … Rn be its decomposition; the decomposition is
lossy if R ⊂ R1 ⨝ R2 ⨝ R3 .... ⨝ Rn

Lossless: There is no loss of information, as the relation obtained after the natural join of the
decompositions is equivalent to the original relation. Thus, it is also referred to as non-additive join
decomposition.
Lossy: There is loss of information, as extraneous tuples are added into the relation after the natural
join of the decompositions. Thus, it is also referred to as careless decomposition.

Example-1: Let there be a relational schema Student(Roll No., S_name, S_dept), and let
StudentDetails(Roll No., S_name) and Dept(Roll No., S_dept) be its decomposition.

Decompose

Roll No. S_name S_dept


1 Raju CSE
2 Raju Quantum Computing

into

Roll No. S_name


1 Raju
2 Raju

&
Roll No. S_dept
1 CSE
2 Quantum Computing

For the decomposition to be lossless, StudentDetails ⨝ Dept = Student must hold. Here,
StudentDetails ⨝ Dept is

Roll No. S_name S_dept


1 Raju CSE
2 Raju Quantum Computing

As, StudentDetails ⨝ Dept = Student,


This decomposition is Lossless.

Example-2:
Example to check whether a given decomposition is a Lossy Join Decomposition.

Let there be a relational schema Student(Roll No., S_name, S_dept), and let StudentDetails(Roll No.,
S_name) and Dept(S_name, S_dept) be its decomposition.

Decompose

Roll No. S_name S_dept


1 Raju CSE
2 Raju Quantum Computing

into

Roll No. S_name


1 Raju
2 Raju

&

S_name S_dept
Raju CSE
Raju Quantum Computing

For the decomposition to be lossy, Student ⊂ StudentDetails ⨝ Dept must hold. Here,
StudentDetails ⨝ Dept is

Roll No. S_name S_dept


1 Raju CSE
1 Raju Quantum Computing
2 Raju CSE
2 Raju Quantum Computing

As, Student ⊂ StudentDetails ⨝ Dept,


This decomposition is Lossy.
Thus, we can figure out whether decomposition is lossless or lossy.
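Both examples can be reproduced with a few lines of Python that project the Student rows onto each sub-schema and then natural-join the projections. This is only an illustrative sketch: relations are modelled as lists of dictionaries, the attribute name Roll_No stands in for "Roll No.", and the helper names `project`, `natural_join`, and `as_set` are hypothetical.

```python
def project(rows, attrs):
    """Projection onto `attrs`, with duplicate rows removed."""
    unique = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in unique]


def natural_join(r, s):
    """Natural join: combine rows that agree on all common attributes."""
    joined = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined


def as_set(rows):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


student = [
    {"Roll_No": 1, "S_name": "Raju", "S_dept": "CSE"},
    {"Roll_No": 2, "S_name": "Raju", "S_dept": "Quantum Computing"},
]

# Example-1: both parts keep Roll_No, so the join gives back Student.
lossless = natural_join(project(student, ["Roll_No", "S_name"]),
                        project(student, ["Roll_No", "S_dept"]))
print(as_set(lossless) == as_set(student))  # True

# Example-2: the parts share only S_name, so the join adds spurious tuples.
lossy = natural_join(project(student, ["Roll_No", "S_name"]),
                     project(student, ["S_name", "S_dept"]))
print(len(lossy))                        # 4
print(as_set(student) < as_set(lossy))   # True: Student is a proper subset
```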

What is NULL ?

In Structured Query Language (SQL), NULL is a special marker used to indicate that a data value does
not exist in the database. NULL is a reserved word in SQL that is used to identify this marker. It is
very important to understand that a NULL value is entirely different from a zero value.

In other words, a NULL attribute value stands for nothing: the value of the attribute does not exist
or is missing. In a table, a NULL value appears as a blank field, that is, a field that has no value.

An Example to illustrate testing for NULL in SQL :


Suppose there is a table named CUSTOMERS that has the records given below.

ID NAME AGE ADDRESS SALARY

1 RAJESH 45 INDORE 48000.00

2 ANURAG 40 UJJAIN 57000.00

3 MAYANK 38 BHOPAL 45000.00

4 GAURAV 23 PUNE 35000.00


5 DEEPAK 29 MUMBAI 28000.00

6 NAMAN 25 NOIDA NULL

7 AYUSH 33 GWALIOR NULL

Now we can use IS NOT NULL operator and write a query which is as following.

SQL> SELECT *
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;

After execution this query would produce the following result-

ID NAME AGE ADDRESS SALARY

1 RAJESH 45 INDORE 48000.00

2 ANURAG 40 UJJAIN 57000.00

3 MAYANK 38 BHOPAL 45000.00

4 GAURAV 23 PUNE 35000.00

5 DEEPAK 29 MUMBAI 28000.00

Here we can see that in the CUSTOMERS table, the SALARY column of ID 6 and 7 (NAMAN and AYUSH) is
empty, in other words NULL. That is why the query result does not contain these two rows: we used
the IS NOT NULL operator.

Now we can use IS NULL operator and write a query.


SQL> SELECT *
FROM CUSTOMERS
WHERE SALARY IS NULL;
After execution this query would produce the following results-

6 NAMAN 25 NOIDA NULL

7 AYUSH 33 GWALIOR NULL
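The two queries can be tried out with Python's built-in `sqlite3` module. The sketch below assumes SQLite as the database engine (the notes do not name one) and uses a hypothetical helper `names`. It also shows why the IS NULL operator is needed: a comparison with `= NULL` is never true, so it returns no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
rows = [
    (1, "RAJESH", 45, "INDORE", 48000.00),
    (2, "ANURAG", 40, "UJJAIN", 57000.00),
    (3, "MAYANK", 38, "BHOPAL", 45000.00),
    (4, "GAURAV", 23, "PUNE", 35000.00),
    (5, "DEEPAK", 29, "MUMBAI", 28000.00),
    (6, "NAMAN", 25, "NOIDA", None),    # Python None is stored as SQL NULL
    (7, "AYUSH", 33, "GWALIOR", None),
]
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", rows)


def names(condition):
    """Names of the customers matching a WHERE condition (helper for this sketch only)."""
    query = f"SELECT NAME FROM CUSTOMERS WHERE {condition} ORDER BY ID"
    return [name for (name,) in conn.execute(query)]


print(names("SALARY IS NULL"))           # ['NAMAN', 'AYUSH']
print(names("SALARY = NULL"))            # []
print(len(names("SALARY IS NOT NULL")))  # 5
```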

What is Dangling tuple problem?

In a DBMS, a tuple that does not participate in a natural join is called a dangling
tuple. It may indicate a consistency problem in the database.

Another definition of a dangling tuple is a tuple with a foreign key value
that does not appear in the referenced relation. In a DBMS,
referential integrity constraints specify exactly when dangling tuples indicate a
problem.

Joining of R and S with Full Join

Examples of SQL JOIN operator


A possible database for a vet clinic could have one table for pets and one for the owners. Since an
owner could have multiple pets, the pets table will have an owner_id column that points to the owners
table.

ID NAME AGE OWNER_ID


1 Fido 7 1
2 Missy 3 1
3 Sissy 10 2
4 Copper 1 3
5 Hopper 2 NULL

ID NAME PHONE_NUMBER
1 Johnny 4567823
2 Olly 7486513
3 Ilenia 3481365
4 Luise 1685364

You could use simple query to get a table with the pet name and the owner name next to each other.
Let's do it with all the different JOIN operators.

SQL INNER JOIN example

Let's do it first using JOIN.

In this case you would SELECT the column name from the pets table (and rename it pet_name). Then
you would select the name column from the owners table, and rename it owner. That would look like
this: SELECT pets.name AS pet_name, owners.name AS owner.

You would use FROM to say that the columns are from the pets table, and JOIN to say that you want to
join it with the owners table, using this syntax: FROM pets JOIN owners.

And finally you would say that you want to join two rows together when the owner_id column in the
pets table is equal to the id column in the owners table with ON pets.owner_id = owners.id.

Here it is all together:

SELECT pets.name AS pet_name, owners.name AS owner


FROM pets
JOIN owners
ON pets.owner_id = owners.id;
You would get a table as below, where only the pets connected to an owner and the owners connected
to a pet are included.

PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia

The problem here is that the pet Hopper is missing from the result, as it does not have an owner.

SQL LEFT JOIN example

Let's do the same query using LEFT JOIN so you can see the difference. The query is the same other
than adding the LEFT keyword.

SELECT pets.name AS pet_name, owners.name AS owner


FROM pets
LEFT JOIN owners
ON pets.owner_id = owners.id;
In this case the rows from the left table, pets, are all kept, and when there is data missing coming from
the owners table, it is filled with NULL.

PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
Hopper NULL

Hopper now appears with NULL as its owner, because it has no owner.
(It seems there is a pet that is not registered with an owner.)

SQL RIGHT JOIN example


If you do the same query using RIGHT JOIN you would get yet a different result.

SELECT pets.name AS pet_name, owners.name AS owner


FROM pets
RIGHT JOIN owners
ON pets.owner_id = owners.id;
In this case all the rows from the right table, owners, are kept, and if there is a missing value, it is filled
with NULL.

PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
NULL Luise


It seems there is an owner that does not have a pet registered.

SQL FULL JOIN example


You could do the same query again, using FULL JOIN.

SELECT pets.name AS pet_name, owners.name AS owner


FROM pets
FULL JOIN owners
ON pets.owner_id = owners.id;
The resulting table is again different – in this instance all rows from the two tables are kept.

PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
Hopper NULL
NULL Luise

It seems that there is a pet without an owner and an owner without a pet in our database.
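The join queries can be run against an in-memory SQLite database from Python. This sketch assumes SQLite and covers only the INNER JOIN and LEFT JOIN, because older SQLite versions do not support RIGHT JOIN and FULL JOIN; the column types are assumptions. The last query finds the owners that are dangling tuples by swapping the table order in a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT);
CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                   owner_id INTEGER REFERENCES owners(id));
INSERT INTO owners VALUES (1, 'Johnny', '4567823'), (2, 'Olly', '7486513'),
                          (3, 'Ilenia', '3481365'), (4, 'Luise', '1685364');
INSERT INTO pets VALUES (1, 'Fido', 7, 1), (2, 'Missy', 3, 1), (3, 'Sissy', 10, 2),
                        (4, 'Copper', 1, 3), (5, 'Hopper', 2, NULL);
""")

inner = conn.execute("""
    SELECT pets.name AS pet_name, owners.name AS owner
    FROM pets JOIN owners ON pets.owner_id = owners.id
    ORDER BY pets.id
""").fetchall()

left = conn.execute("""
    SELECT pets.name AS pet_name, owners.name AS owner
    FROM pets LEFT JOIN owners ON pets.owner_id = owners.id
    ORDER BY pets.id
""").fetchall()

# Owners that match no pet: the dangling tuples of the owners table.
owners_without_pet = conn.execute("""
    SELECT owners.name
    FROM owners LEFT JOIN pets ON pets.owner_id = owners.id
    WHERE pets.id IS NULL
""").fetchall()

print(inner)               # four (pet, owner) pairs, Hopper is missing
print(left[-1])            # ('Hopper', None)
print(owners_without_pet)  # [('Luise',)]
```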
Dangling tuple:

Dangling tuple = a tuple in a relation that does not join with any tuple in the other relation
Example:

Fact: Dangling tuples can be omitted when we perform a join operation

A more efficient way to perform ⋈ in distributed database


More efficient join execution: Transfer only the non-dangling tuples to the "join site"
The Semi-join ⋉ operation

Definition of the Semi join (⋉) operation:


R ⋉ S = π_R ( R ⋈ π_Y(S) )

where: π_R = projection onto the attributes of R, Y = join attribute(s) in S


Example

Conclusion:

R ⋉ S = the set of non-dangling tuples in R
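A semi-join can be sketched in a few lines: first collect the join-attribute values of S (the projection of S onto Y), then keep only the tuples of R whose values appear in that set. The relations R(A, B) and S(B, C) below are hypothetical, and `semi_join` is an illustrative helper, not part of any library.

```python
def semi_join(r, s, join_attrs):
    """R semi-join S: the tuples of R that join with at least one tuple of S."""
    # Projection of S onto the join attributes; only this small set has to be shipped.
    s_keys = {tuple(t[a] for a in join_attrs) for t in s}
    return [t for t in r if tuple(t[a] for a in join_attrs) in s_keys]


# Hypothetical relations R(A, B) and S(B, C).
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 3, "B": "z"}]
S = [{"B": "x", "C": 10}, {"B": "z", "C": 30}, {"B": "w", "C": 40}]

non_dangling = semi_join(R, S, ["B"])
print(non_dangling)  # [{'A': 1, 'B': 'x'}, {'A': 3, 'B': 'z'}]

# The remaining tuple of R is dangling and need not be sent to the join site.
dangling = [t for t in R if t not in non_dangling]
print(dangling)      # [{'A': 2, 'B': 'y'}]
```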

Equivalence :

Equivalence of Functional Dependencies

Two FD sets F and G over schema R are equivalent if F+ = G+. It means that if every functional
dependency of F is in G+ and every functional dependency of G is in F+, then we would say that
the sets of functional dependencies F and G are equivalent.

Let FD1 and FD2 be two FD sets for a relation R.

1. If all FDs of FD1 can be derived from FDs present in FD2, we can say that FD2 ⊃ FD1 (FD2 covers FD1).
2. If all FDs of FD2 can be derived from FDs present in FD1, we can say that FD1 ⊃ FD2 (FD1 covers FD2).

If 1 and 2 both are true, FD1=FD2 (the two sets are equivalent).


All these three cases can be shown using the Venn diagram:

Q.1 Let us take an example to show the relationship between two FD sets. A relation R(A,B,C,D)
having two FD sets FD1 = {A->B, B->C, AB->D} and FD2 = {A->B, B->C, A->C, A->D}

Step 1: Checking whether all FDs of FD1 are present in FD2

A->B in set FD1 is present in set FD2.


B->C in set FD1 is also present in set FD2.
AB->D is present in set FD1 but not directly in FD2 but we will check whether we can derive it or not.
For set FD2, (AB)+ = {A, B, C, D}. It means that AB can functionally determine A, B, C, and D. So
AB->D will also hold in set FD2.
As all FDs in set FD1 also hold in set FD2, FD2 ⊃ FD1 is true.

Step 2: Checking whether all FDs of FD2 are present in FD1

A->B in set FD2 is present in set FD1.


B->C in set FD2 is also present in set FD1.
A->C is present in FD2 but not directly in FD1 but we will check whether we can derive it or not. For
set FD1, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C, and D. So A->C will
also hold in set FD1.
A->D is present in FD2 but not directly in FD1 but we will check whether we can derive it or not. For
set FD1, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C, and D. So A->D will
also hold in set FD1.
As all FDs in set FD2 also hold in set FD1, FD1 ⊃ FD2 is true.

Step 3: As FD2 ⊃ FD1 and FD1 ⊃ FD2 both are true FD2 =FD1 is true. These two FD sets are
semantically equivalent.
Q.2 Let us take another example to show the relationship between two FD sets. A relation
R2(A,B,C,D) having two FD sets FD1 = {A->B, B->C,A->C} and FD2 = {A->B, B->C, A->D}

Step 1: Checking whether all FDs of FD1 are present in FD2

A->B in set FD1 is present in set FD2.


B->C in set FD1 is also present in set FD2.
A->C is present in FD1 but not directly in FD2 but we will check whether we can derive it or not. For
set FD2, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C, and D. SO A->C
will also hold in set FD2.

As all FDs in set FD1 also hold in set FD2, FD2 ⊃ FD1 is true.

Step 2: Checking whether all FDs of FD2 are present in FD1

A->B in set FD2 is present in set FD1.


B->C in set FD2 is also present in set FD1.
A->D is present in FD2 but not directly in FD1, so we check whether it can be derived from FD1. For
set FD1, (A)+ = {A, B, C}. It means that A cannot functionally determine D.
So A->D does not hold in FD1.
As not all FDs in set FD2 hold in set FD1, FD1 ⊅ FD2.

Step 3: In this case, FD2 ⊃ FD1 but FD1 ⊅ FD2, so these two FD sets are not semantically
equivalent.
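
When two FD sets are not equivalent, it helps to list exactly which FDs cannot be derived. The following Python sketch does this for Q.2. It is an illustration under the assumption of single-letter attribute names, and the function name not_implied is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-letter attributes assumed)."""
    result = set(attrs)
    while True:
        # Collect every attribute that some applicable FD would add.
        added = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not added:
            return result
        result |= added


def not_implied(source, target):
    """Return the FDs of `target` that cannot be derived from `source`."""
    return [(lhs, rhs) for lhs, rhs in target
            if not set(rhs) <= closure(lhs, source)]


# FD sets of Q.2
FD1 = [("A", "B"), ("B", "C"), ("A", "C")]
FD2 = [("A", "B"), ("B", "C"), ("A", "D")]

print(not_implied(FD2, FD1))  # []            every FD of FD1 follows from FD2
print(not_implied(FD1, FD2))  # [('A', 'D')]  A->D does not follow from FD1
```

An empty list in one direction and a non-empty list in the other matches the conclusion of Step 3: FD2 covers FD1, but FD1 does not cover FD2.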

Minimal Cover

Given a set of functional dependencies, we can reduce it to the simplest, irreducible set of functional
dependencies that is equivalent to the original set. This is called the Minimal Cover or
Irreducible Set (as we can’t reduce the set further). It is also called a Canonical Cover.

https://www.nielit.gov.in/gorakhpur/sites/default/files/Gorakhpur/Alevel_1_DBMS_05May2020_AV.pdf

Let us understand the procedure to find the minimal cover by this example:

The given functional dependencies are: A->B, C->B, D->ABC, AC->D

We can find the minimal cover by following 3 simple steps.

Step 1: Split the right-hand side of every FD so that each FD has a single attribute on its right side.

A->B, C->B, D->A, D->B, D->C, AC->D

Step 2: Now remove all redundant FDs.

[An FD is redundant if it can be derived from the remaining FDs.]

Let's test the redundancy of A->B.

We have to find the closure of A, i.e. A+, using the rest of the FDs (all except A->B). If B is in
A+, then A->B is redundant and we can remove it; otherwise we have to keep it.

C->B, D->A, D->B, D->C, AC->D

A+ = {A} (no remaining FD has only A on its left side, so A determines only itself, by the reflexive property).
Since B is not in A+, A->B is not redundant. We have to keep it.

Let's test the redundancy of C->B.

Similarly, for C->B we have to find the closure of C (C+) using the rest of the FDs except C->B.

A->B, D->A, D->B, D->C, AC->D

C+ = {C}. Since B is not in the closure of C, C->B is not redundant; we keep it.

Let's test the redundancy of D->A.

Similarly, for D->A we have to find the closure of D (D+) using the rest of the FDs except D->A.

A->B, C->B, D->B, D->C, AC->D

D+ = {D, B, C}. Since A is not in the closure of D, D->A is not redundant; we keep it.

Let's test the redundancy of D->B.

Similarly, for D->B we have to find the closure of D (D+) using the rest of the FDs except D->B.

A->B, C->B, D->A, D->C, AC->D

D+ = {D, A, B, C}

Since B is in the closure of D, D->B is redundant; we can remove it.

Let's test the redundancy of D->C.

Similarly, for D->C we have to find the closure of D (D+) using the rest of the FDs except D->C (D->B has already been removed).

A->B, C->B, D->A, AC->D

D+ = {D, A, B}

Since C is not in the closure of D, D->C is not redundant; we keep it.

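Step 3: Finding extraneous attributes

Only AC->D has more than one attribute on its left side, so it is the only FD that can have an
extraneous attribute. Let's first test the redundancy of AC->D.

Using the rest of the FDs (A->B, C->B, D->A, D->C), AC+ = {A, C, B}. Since D is not in AC+, AC->D is not redundant.

Next, check the left-side attributes:

If D is in C+, then A is extraneous and we can remove it.
If D is in A+, then C is extraneous and we can remove it.

A+ = {A, B}
C+ = {C, B}

D is in neither A+ nor C+, so neither A nor C is extraneous. Therefore AC->D is kept as it is.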

Result:

So, the minimal set of FDs is: A->B, C->B, D->A, D->C, AC->D

Combining the FDs that have the same left side, we can write it as A->B, C->B, D->AC, AC->D. This is the minimal cover.
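
The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: attributes are single letters, the helper names are illustrative, and the steps follow the order used in these notes. Because removing an extraneous attribute can in general make another FD redundant, the sketch runs the redundancy check once more at the end; that extra pass is an addition made here for safety.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-letter attributes assumed)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def drop_redundant(fds):
    """Step 2: remove each FD whose right side is derivable from the other FDs."""
    fds = list(fds)
    i = 0
    while i < len(fds):
        lhs, rhs = fds[i]
        rest = fds[:i] + fds[i + 1:]
        if rhs in closure(lhs, rest):
            fds = rest          # redundant: drop it and re-test the same index
        else:
            i += 1
    return fds


def drop_extraneous(fds):
    """Step 3: remove left-side attributes that are not needed."""
    fds = list(fds)
    for i, (lhs, rhs) in enumerate(fds):
        for a in lhs:
            reduced = fds[i][0].replace(a, "")
            # `a` is extraneous if the remaining left side still determines rhs.
            if reduced and rhs in closure(reduced, fds):
                fds[i] = (reduced, rhs)
    return fds


def minimal_cover(fds):
    split = [(lhs, a) for lhs, rhs in fds for a in rhs]  # Step 1
    reduced = drop_extraneous(drop_redundant(split))     # Steps 2 and 3
    return drop_redundant(reduced)                       # extra safety pass


F = [("A", "B"), ("C", "B"), ("D", "ABC"), ("AC", "D")]
print(minimal_cover(F))
# [('A', 'B'), ('C', 'B'), ('D', 'A'), ('D', 'C'), ('AC', 'D')]
```

A minimal cover is not unique in general: testing the FDs in a different order can produce a different, equally valid result.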

About Nulls, Dangling Tuples, and Alternative Relational Designs

Problems with NULL Values and Dangling Tuples

We must carefully consider the problems associated with NULLs when designing a relational database
schema. There is no fully satisfactory relational design theory as yet that includes NULL values.
One problem occurs when some tuples have NULL values for attributes that will be used to join
individual relations in the decomposition.

To illustrate this, consider the database shown in Figure 16.2(a), where two relations EMPLOYEE and
DEPARTMENT are shown. The last two employee tuples— ‘Berger’ and ‘Benitez’—represent
newly hired employees who have not yet been assigned to a department (assume that this does not
violate any integrity constraints). Now suppose that we want to retrieve a list of (Ename, Dname)
values for all the employees. If we apply the NATURAL JOIN operation on EMPLOYEE and
DEPARTMENT (Figure 16.2(b)), the two aforementioned tuples will not appear in the result. The
OUTER JOIN operation can deal with this problem. Recall that if we take the LEFT OUTER JOIN
of EMPLOYEE with DEPARTMENT, tuples in EMPLOYEE that have NULL for the join attribute will
still appear in the result, joined with an imaginary tuple in DEPARTMENT that has NULLs for all its
attribute values. Figure 16.2(c) shows the result.
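
The difference between the two joins can be reproduced with Python's built-in sqlite3 module. The sketch below is illustrative only: the tables are cut down to two columns each, and apart from 'Berger' and 'Benitez' the rows are made-up sample data rather than the contents of Figure 16.2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMPLOYEE (Ename TEXT, Dnum INTEGER);
    CREATE TABLE DEPARTMENT (Dnum INTEGER, Dname TEXT);
    -- Berger and Benitez have no department yet, so Dnum is NULL.
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 5), ('Wong', 5), ('Berger', NULL), ('Benitez', NULL);
    INSERT INTO DEPARTMENT VALUES (5, 'Research'), (4, 'Administration');
""")

# NATURAL JOIN matches on Dnum; a NULL never equals anything, so the
# two new employees are dropped from the result.
natural = con.execute(
    "SELECT Ename, Dname FROM EMPLOYEE NATURAL JOIN DEPARTMENT ORDER BY Ename"
).fetchall()

# LEFT OUTER JOIN keeps every EMPLOYEE row and pads the missing
# DEPARTMENT attributes with NULL (None in Python).
outer = con.execute(
    "SELECT Ename, Dname FROM EMPLOYEE LEFT OUTER JOIN DEPARTMENT "
    "ON EMPLOYEE.Dnum = DEPARTMENT.Dnum ORDER BY Ename"
).fetchall()

print(natural)  # [('Smith', 'Research'), ('Wong', 'Research')]
print(outer)    # [('Benitez', None), ('Berger', None), ('Smith', 'Research'), ('Wong', 'Research')]
```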

Applying the natural join gives the following table:


Applying the left outer join to the aforementioned tables leads to the dangling tuple problem,
as follows:

An alternative solution to this dangling tuple problem is as follows:

Decomposing the Employee table without Dnum


When Employee_1 is used along with Employee_2, dangling tuples are generated, since there is no
Dnum, Dname, or DmgrSSN information for Berger and Benitez, as shown in Figure 16.2(c).

This can be rectified by an alternative method: when Employee_1 is combined with Employee_3 by
natural join, no dangling tuples are generated, as shown in the figure below.
Further discussion of Multivalued dependencies and 4NF

Multivalued dependency and Fourth Normal form in DBMS

A nontrivial multivalued dependency whose left side is not a superkey prevents a relation from being
in fourth normal form. A multivalued dependency involves at least three attributes of a table.

It is represented with the symbol "->->" in DBMS.

X->Y relates one value of X to one value of Y.

X->->Y (read as X multidetermines Y) relates one value of X to many values of Y.

A nontrivial MVD occurs when X->->Y and X->->Z hold and Y and Z are independent of each other.
A nontrivial MVD produces redundancy.

For example, consider a table called "Students". It has columns: "Student ID," "Course," and
"Textbook." Each student can take multiple courses, and each course may require multiple textbooks.
Therefore, the "Course" and "Textbook" columns are multivalued attributes.

In this example, we can note that there is a relationship between the "Course" and "Textbook" columns.
The "Course" column determines which textbooks are needed. For example, a student taking "Math"
will need both "Algebra" and "Calculus" textbooks. This relationship between "Course" and
"Textbook" is a multivalued dependency.

To express this MVD, we can write the following formula:

Course →→ Textbook
This indicates that for any given value of "Course," there is a set of corresponding values of
"Textbook." For example, if we know that a student is taking "Math," we can infer that the student
needs both "Algebra" and "Calculus" textbooks.

MVD occurs when a table has a non-trivial relationship between attributes that are not part of the
same composite key. In the example above, the "Course" and "Textbook" columns have a multivalued
dependency because the "Course" column determines which textbooks are needed.
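
An MVD can also be tested on actual table data. The Python sketch below uses the standard tuple-swapping definition: X ->-> Y holds if, for any two rows that agree on X, the row that takes its X and Y values from the first and all remaining attributes from the second is also in the table. The sample rows and the function name holds_mvd are assumptions made for illustration, not data from these notes.

```python
from itertools import product


def holds_mvd(rows, x, y):
    """Check X ->-> Y on a list of dict rows using the tuple-swapping definition."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]   # remaining attributes
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # t3 takes X and Y from t1 and the remaining attributes from t2.
            t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
            if tuple(t3[a] for a in attrs) not in present:
                return False
    return True


# Hypothetical "Students" rows: every Math student needs both textbooks.
students = [
    {"StudentID": 1, "Course": "Math", "Textbook": "Algebra"},
    {"StudentID": 1, "Course": "Math", "Textbook": "Calculus"},
    {"StudentID": 2, "Course": "Math", "Textbook": "Algebra"},
    {"StudentID": 2, "Course": "Math", "Textbook": "Calculus"},
]

print(holds_mvd(students, ["Course"], ["Textbook"]))       # True
print(holds_mvd(students[:3], ["Course"], ["Textbook"]))   # False
```

Dropping the last row breaks the MVD: student 2 would then be listed with only one of the two Math textbooks, so textbooks would no longer depend on the course alone.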

Fourth Normal Form (4NF)

Fourth Normal Form (4NF) is a level of database normalization that requires a relation to be in BCNF
and to have no non-trivial multivalued dependency X →→ Y unless X is a superkey. Its purpose is to
eliminate redundant data and maintain data consistency. If a table violates this condition, it needs to
be split into smaller tables to achieve 4NF.

For a relation R to be in 4NF, it must meet two conditions −

It should be in Boyce-Codd Normal Form (BCNF).

It should not have any non-trivial multivalued dependency whose left side is not a superkey.

To remove the multivalued dependency (MVD) in the "Students" table example, we can create two new
tables, one for "Courses" and another for "Textbooks," and establish a relationship between them using
foreign keys.

Here's how we can create the tables:

Table 1: Students

Table 2: Courses
Table 3: Textbooks

So, we removed the multivalued dependency by splitting the "Course" and "Textbook" columns into
separate tables.

We have also added a new "Course ID" column to the "Students" table. It has a foreign key that
references the "Course ID" column in the "Courses" table. Similarly, the "Textbooks" table also has a
"Course ID" column that serves as a foreign key referencing the "Course ID" column in the "Courses"
table.

Hence, we have achieved the fourth normal form (4NF) for the "Students" table. This was done by
removing the multivalued dependency and creating separate tables. The resultant schema eliminates
data redundancy and improves data integrity, making it easier to manage and query the database.
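
The effect of the split can be checked on sample data. The Python sketch below is a simplified illustration, not the exact three-table design described above: it leaves out the "Course ID" key and splits hypothetical "Students" rows into a (Student ID, Course) table and a (Course, Textbook) table. Because Course →→ Textbook holds in the sample data, joining the two tables back on Course gives exactly the original rows, with nothing lost and nothing spurious.

```python
def project(rows, attrs):
    """Project dict rows onto `attrs`, removing duplicates."""
    return {tuple(r[a] for a in attrs) for r in rows}


# Hypothetical sample data for the original three-column table.
students = [
    {"StudentID": 1, "Course": "Math", "Textbook": "Algebra"},
    {"StudentID": 1, "Course": "Math", "Textbook": "Calculus"},
    {"StudentID": 2, "Course": "Math", "Textbook": "Algebra"},
    {"StudentID": 2, "Course": "Math", "Textbook": "Calculus"},
    {"StudentID": 1, "Course": "Physics", "Textbook": "Mechanics"},
]

original = project(students, ["StudentID", "Course", "Textbook"])
enrolment = project(students, ["StudentID", "Course"])   # who takes what
required = project(students, ["Course", "Textbook"])     # what each course needs

# Natural join of the two smaller tables on Course.
rejoined = {(s, c, t) for (s, c) in enrolment for (c2, t) in required if c == c2}

print(len(original), len(enrolment), len(required))  # 5 3 3
print(rejoined == original)                          # True
```

Each fact is now stored once: adding a third Math textbook means adding one row to the (Course, Textbook) table instead of one row per Math student.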
