DBMS Module IV NOTES
Normalization Algorithms: Inference Rules, Equivalence, and Minimal Cover, Properties of Relational
Decompositions, Algorithms for Relational Database Schema Design, Nulls, Dangling tuples, and
alternate Relational Designs, Further discussion of Multivalued dependencies and 4NF, Other
dependencies and Normal Forms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
A functional dependency occurs when the value of one attribute determines the value of another
attribute.
Normalization is the process of organizing a database in a way that reduces redundancy and
dependency. It is a crucial step in designing an efficient and effective database structure.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functional dependencies in Database Management Systems (DBMS) are a set of constraints or rules
that define the relationships between attributes (columns) within a relational database table.
These dependencies specify how the values of one or more attributes uniquely determine the
values of other attributes.
For example, consider a database of employee records. The employee's name is functionally
dependent on the employee's ID number because the ID number determines the name. The reverse
need not hold, since two employees may share the same name. In this case, we write ID number → name.
Functional dependencies can be used to design a database in a way that eliminates redundancy and
ensures data integrity. For example, consider a database that stores employee records and the
departments they work in. If we store the department name for each employee, we might end up with
several copies of the same department name.
This would be redundant and would take up unnecessary space in the database. Instead, we can
use functional dependencies to store the department name only once and use the employee's ID
number to determine which department they work in. This reduces redundancy and makes the
database more efficient.
Without functional dependencies (which occupies more space)
Depart_id Department_name
1 Computer Science & Engineering
2 Information Science & Engineering
Real life example of functional dependency, multivalued dependency, trivial functional dependency and
non-trivial functional dependency.
A functional dependency exists when one attribute determines another attribute in a DBMS. Functional
dependencies play a vital role in distinguishing a good database design from a bad one.
In the above example, if we know the Employee number, we can find the Employee Name, city, salary, etc.
Hence we can say that city, Employee Name, and salary are functionally dependent on
Employee number.
A multivalued dependency arises when a single table contains two or more independent multivalued
attributes. A multivalued dependency is a constraint between two sets of attributes in a
relation. It requires that certain tuples be present in the relation.
In above example, color and maf_year are independent of each other but dependent on car_model. In
this example, these two columns (color and maf_year) are said to be multi value dependent on
car_model.
A trivial dependency is one in which the dependent attributes are already included in the
determinant, that is, A → B where B is a subset of A.
A non-trivial functional dependency occurs when A → B holds and B is not a subset of A.
Example:
Here CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
Types of Functional dependencies in DBMS
In a relational database management system, a functional dependency specifies the relationship
between two sets of attributes where one set determines the value of the other. It is
denoted as X → Y, where the attribute set on the left side of the arrow, X, is called the Determinant, and Y
is called the Dependent.
Functional dependencies are used to express relations among database attributes mathematically.
They are important for understanding advanced concepts in relational database systems and for solving
problems in competitive exams like GATE.
Example:
From the above table we can conclude some valid functional dependencies:
roll_no → {name, dept_name, dept_building}: Here, roll_no can determine the values of the fields
name, dept_name and dept_building, hence this is a valid functional dependency.
roll_no → dept_name: Since roll_no can determine the whole set {name, dept_name, dept_building},
it can also determine its subset dept_name.
dept_name → dept_building: dept_name identifies the dept_building accurately, since
every row with the same dept_name also has the same dept_building.
More valid functional dependencies:
roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
name → dept_name: Students with the same name can have different dept_name values, hence this is not
a valid functional dependency.
Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid by the
Transitivity rule.
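The check described above can be sketched in a few lines of Python. This is only an illustration: the
sample rows are hypothetical (the original table is not reproduced here), and fd_holds is a helper name
chosen for this sketch. Note that a table instance can only refute a functional dependency; it cannot
prove that the dependency holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # different value for the same key violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample rows (assumed for illustration only).
students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
]

print(fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(students, ["dept_name"], ["dept_building"]))
print(fd_holds(students, ["name"], ["dept_name"]))
```

With these assumed rows the first two checks succeed, while name → dept_name fails because two students
named xyz belong to different departments.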
1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e. if X → Y and
Y is a subset of X, then it is called a trivial functional dependency.
Example:
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial
functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example:
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}
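3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set are not dependent on each
other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a
multivalued functional dependency.
For example,
Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and
age are not dependent on each other (i.e. neither name → age nor age → name holds).
Note that this is an informal description: strictly, a multivalued dependency is written X ↠ Y and
allows several values of Y for one value of X, as in the car_model example and the 4NF discussion in these notes.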
4. Transitive Functional Dependency
For example,
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity,
enrol_no → building_no is a valid functional dependency. This is an indirect functional dependency,
hence called Transitive functional dependency.
5. Partial Functional Dependency
In a partial functional dependency a non-key attribute depends on a part of the composite key, rather than
the whole key. If a relation R has attributes X, Y, Z where X and Y form the composite key and Z is a
non-key attribute, then X → Z is a partial functional dependency in an RDBMS.
For example, from the above table, if there is an attribute course_no, then enrol_no →
course_no is a partial functional dependency.
Advantages of Functional Dependencies
Functional dependencies have numerous applications in database management systems.
Some of them are listed below:
1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize
redundancy and increase data integrity. Functional dependencies play an important part in data
normalization. With the help of functional dependencies we are able to identify the primary key and
candidate keys of a table, which in turn helps in normalization.
2. Query Optimization
With the help of functional dependencies we are able to decide how tables are connected
and which attributes need to be projected to retrieve the required data from the tables.
This helps in query optimization and improves performance.
3. Consistency of Data
Functional dependencies ensure the consistency of the data by removing redundancies and
inconsistencies that may exist in the data. They ensure that a change made
to one attribute does not cause inconsistency in another set of attributes, and thus maintain the
consistency of the data in the database.
4. Data Quality Improvement
Functional dependencies help keep the data in the database accurate, complete and
up to date. This improves the overall quality of the data and eliminates errors and
inaccuracies that might otherwise occur during data analysis and decision making.
Conclusion
Functional dependency is a very important concept in database management systems for ensuring
data consistency and accuracy.
What is Normalization?
The main reason for normalizing relations is to remove insertion, deletion, and update anomalies. Failure
to eliminate these anomalies leads to data redundancy and can cause a lack of data integrity and other
problems as the database grows. Normalization consists of a series of guidelines that help you create a
good database structure.
Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple into a relation
due to lack of data.
Deletion Anomaly: A deletion anomaly refers to the situation where the deletion of data results in the
unintended loss of some other important data.
Update Anomaly: An update anomaly occurs when an update of a single data value requires multiple
rows of data to be updated.
For example, consider a database that stores customer information and the products they have
purchased. If we store the product names with each customer record, we might end up with
several copies of the same product name. This would be redundant and would take up
unnecessary space in the database. Instead, we can use normalization to create a separate table
for products and store the product names only once. This reduces redundancy and makes the
database more efficient.
There are several normal forms that can be used to normalize a database. The most common normal
forms are the first, second, and third normal forms.
The first normal form (1NF) is a basic level of normalization. To be in 1NF, a table must meet the
following criteria −
It must contain only atomic values. An atomic value is a single value that cannot be further broken
down. For example, a name is an atomic value, but an address is not because it can be broken down
into separate values for the street, city, state, and zip code.
It must not contain repeating groups. A repeating group is a set of values that are repeated within a
single record. For example, if a table contains a field for phone numbers, it should not contain
multiple phone numbers within the same field. Instead, each phone number should be stored in its
own row, typically in a separate table linked by the record's key.
The second normal form (2NF) is a higher level of normalization. To be in 2NF, a table must meet the
following criteria −
It must be in 1NF.
It must not have any partial dependencies. A partial dependency occurs when a non-key attribute is
dependent on only a part of a composite primary key. For example, consider a table with the
composite primary key (EmployeeID, DepartmentID) and the non-key attribute DepartmentName.
If DepartmentName is dependent on DepartmentID alone, and not on the whole key, there is a
partial dependency. To eliminate this dependency, we would create a separate table for departments
and store the DepartmentID and DepartmentName in that table.
The third normal form (3NF) is a higher level of normalization. To be in 3NF, a table must meet the
following criteria:
It must be in 2NF.
It must not have any transitive dependencies. A transitive dependency occurs when a non-key attribute is
dependent on another non-key attribute rather than directly on the primary key. For example, consider a
table with the following attributes: EmployeeID (primary key), EmployeeName, ManagerID, and ManagerName.
ManagerID is dependent on EmployeeID, which is the primary key, so that dependency is not transitive.
However, ManagerName is dependent on ManagerID, which is not the primary key, so
EmployeeID → ManagerID → ManagerName is a transitive dependency. To eliminate this dependency, we would
create a separate table for managers and store the ManagerID and ManagerName in that table.
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the
constraints of that form.
The following are the various types of normal forms, with their conditions and decompositions.
Advantages of Normalization
The main purpose of database normalization is to avoid complexities, eliminate duplicates, and
organize data in a consistent way. In normalization, the data is divided into several tables linked
together with relationships.
Database administrators are able to achieve these relationships by using primary keys, foreign keys,
and composite keys.
A primary key is a column that uniquely identifies the rows of data in that table. It’s a unique
identifier such as an employee ID, student ID, voter’s identification number (VIN), and so on.
A foreign key is a field that relates to the primary key in another table.
A composite key is just like a primary key, but instead of a single column, it consists of multiple
columns.
1NF, 2NF, and 3NF are the first three types of database normalization. They stand for first normal
form, second normal form, and third normal form, respectively.
There are also 4NF (fourth normal form) and 5NF (fifth normal form). There’s even 6NF (sixth
normal form), but the commonest normal form you’ll see out there is 3NF (third normal form).
All the types of database normalization are cumulative – meaning each one builds on top of those
beneath it. So all the concepts in 1NF also carry over to 2NF, and so on.
For a table to be in the first normal form, it must meet the following criteria:
A single cell must not hold more than one value (atomicity)
Each column must have only one value for each row in the table
There must be a primary key for identification
There must be no duplicated rows or columns
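Types Of Functional Dependencies-
Examples of trivial functional dependencies (the right-hand side is a subset of the left-hand side)-
AB → A
AB → B
AB → AB
Examples of non-trivial functional dependencies (the right-hand side is not a subset of the left-hand side)-
AB → BC
AB → CD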
Partial Functional Dependency
In a partial functional dependency a non-key attribute depends on a part of the composite key, rather
than the whole key. If a relation R has attributes X, Y, Z where X and Y form the composite key
(X, Y) and Z is a non-key attribute, then X → Z is a partial functional dependency in an RDBMS, as Y
is not needed to determine Z.
When a table is in 2NF, it eliminates repeating groups and the redundancy caused by partial
dependencies, but it does not eliminate transitive dependency.
This means a non-prime attribute (an attribute that is not part of any candidate key) may still be
dependent on another non-prime attribute. This is what the third normal form (3NF) eliminates.
For a table to be in 3NF:
It must be in 2NF
It must have no transitive dependency.
Database normalization is quite technical, but we will illustrate each of the normal forms with
examples.
Imagine we're building a restaurant management application. That application needs to store data
about the company's employees and it starts out by creating the following table of employees:
All the entries are atomic and there are no repeating groups, so the table is in the first normal form
(1NF).
In the above table, there are two determinants, Employee_ID and Job_code, which together form the
composite primary key (employee_id, job_code). If you know Employee_ID, then name, state_code and
home_state are determined, and if you know Job_code, then Job is determined.
Recall that in a partial functional dependency a non-key attribute depends on only a part of the
composite key rather than the whole key.
Here, with the composite primary key (employee_id, job_code), the attributes name, state_code and
home_state depend only on employee_id and not on job_code, so they are only partially dependent on
the primary key.
So, the table is not in 2NF. We should decompose it into separate tables to make it 2NF.
Example of Second Normal Form (2NF)
employee_roles Table
employees Table
jobs table
Now it is in 2NF as there is no partial dependency.
For further improvement in normalization, look carefully at the employees table.
You can observe that Employee_ID determines State_code and State_code determines
home_state, which is a transitive dependency,
i.e.
(Employee_ID → State_code) and (State_code → home_state)
To take this a step further, we should decompose the table again into separate tables to make it 3NF.
employee table
states table
employee_roles table
jobs table
Boyce-Codd Normal Form (BCNF) is an extension of the third normal form, and is also known as
3.5 Normal Form. It is stricter than 3NF (which only requires that there be no transitive dependency).
For BCNF, the table should be in 3NF and, for every non-trivial functional dependency X → Y, the
left-hand side X must be a super key.
Example
Below we have a college enrolment table with columns student_id, subject and professor.
As you can see, we have also added some sample data to the table.
One student can enroll for multiple subjects. For example, student with student_id 101, has opted for
subjects - Java & C++
And, there can be multiple professors teaching one subject like we have for Java.
In the table above, (student_id, subject) together form the primary key, because using
student_id and subject we can find all the columns of the table.
This table satisfies the 1st Normal Form because all the values are atomic, column names are unique
and all the values stored in a particular column are of the same domain.
This table also satisfies the 2nd Normal Form as there is no partial dependency.
And there is no transitive dependency, hence the table also satisfies the 3rd Normal Form.
In the table above, (student_id, subject) form the primary key, which means the subject column is a prime
attribute.
However, if each professor teaches only one subject, there is one more dependency, professor → subject.
Here subject is a prime attribute while professor is a non-prime attribute and not a super key, which is
not allowed by BCNF.
To make this relation (table) satisfy BCNF, we will decompose this table into two tables, student table
and professor table.
Student Table
And now, this relation satisfies Boyce-Codd Normal Form. Next we will learn about the Fourth
Normal Form.
Fourth normal form (4NF)
A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued dependency
(other than one whose determinant is a super key).
A multi-valued dependency A ↠ B exists when, for a single value of A, multiple values of B exist
independently of the other attributes of the relation.
Example
STUDENT
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes.
Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID
(STU_ID ↠ COURSE and STU_ID ↠ HOBBY), which lead to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STUDENT_HOBBY
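The 4NF decomposition above can be illustrated with a small Python sketch. The rows are an assumption
built from the description of student 21 (two courses, two hobbies, with every combination stored); they
are not copied from the original STUDENT table.

```python
# Assumed STUDENT rows: (STU_ID, COURSE, HOBBY). Because COURSE and HOBBY
# are independent, every course/hobby combination has to be stored.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
}

# Decompose by projecting onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = {(stu, course) for stu, course, _ in student}
student_hobby = {(stu, hobby) for stu, _, hobby in student}

# Natural join on STU_ID rebuilds the original relation.
rejoined = {
    (stu, course, hobby)
    for stu, course in student_course
    for stu2, hobby in student_hobby
    if stu == stu2
}

print(len(student), len(student_course), len(student_hobby))
print(rejoined == student)
```

The two projected tables hold two rows each instead of four combined rows, and joining them on STU_ID
gives back the same relation, so nothing is lost by the decomposition.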
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and any joining of its
decomposed relations is lossless.
5NF is satisfied when the tables have been broken into as many tables as possible in order to avoid
redundancy.
5NF is also known as Project-join normal form (PJ/NF).
Join Dependency
The relation X would satisfy join dependency whenever X is equal to the join of X1, X2, ….. Xn,
where Xi happens to be a subset of a set of attributes of X.
Relation X
The relation given above says that each sec offers several elective languages, which are taken by
different combinations of its students. The students choose their languages individually.
Thus, all three fields are required to represent this information.
This relation does not display non-trivial MVDs, because the attributes language and name are
not independent of each other (they are related to one another).
This relation therefore cannot be decomposed into just the two relations (sec, language) and (sec, name).
Instead, it is decomposed into three relations:
X1(sec, language)
X2(sec, name)
X3(language, name)
X1
X2
X3
Relational Decomposition
When a relation in the relational model is not in appropriate normal form then the decomposition of a
relation is required.
If the relation has no proper decomposition, then it may lead to problems like loss of information.
Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
and redundancy.
Properties of Decomposition
Lossless Decomposition
If the information is not lost from the relation that is decomposed, then the decomposition will
be lossless.
The lossless decomposition guarantees that the join of relations will result in the same relation
as it was decomposed.
A decomposition is said to be lossless if the natural join of all the decomposed relations gives
back the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
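Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing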
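The join on EMP_ID can be reproduced with a short Python sketch. The rows are taken from the EMPLOYEE
and DEPARTMENT tables of this example; join_on_emp_id is a helper name introduced only for
this illustration.

```python
# EMPLOYEE rows: (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
employee = [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]

# DEPARTMENT rows: (DEPT_ID, EMP_ID, DEPT_NAME)
department = [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]


def join_on_emp_id(emps, depts):
    """Natural join of the two tables on their common column EMP_ID."""
    by_emp = {}
    for dept_id, emp_id, dept_name in depts:
        by_emp.setdefault(emp_id, []).append((dept_id, dept_name))
    # Each employee row is extended with the matching department columns.
    return [emp + dept for emp in emps for dept in by_emp.get(emp[0], [])]


joined = join_on_emp_id(employee, department)
for row in joined:
    print(row)
```

Every EMP_ID appears exactly once in each table, so the join produces one row per employee and no extra
rows, which is what a lossless decomposition requires.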
Dependency Preserving
It is an important constraint on a database decomposition.
If a relation R is decomposed into relations R1 and R2, then every dependency of R must either
be a part of R1 or R2 or be derivable from the combination of the functional dependencies of
R1 and R2.
Example: check whether the decomposition of the relation R(A,B,C,D) with F = {A → B, C → D} into the
sub-relations R1(A,B) and R2(C,D) is lossless and dependency preserving.
R1(A,B)
A+ = AB, which gives A → B
B+ = B, which gives nothing
R2(C,D)
C+ = CD, which gives C → D
D+ = D, which gives nothing
(F1 ∪ F2)+ = {A → B, C → D}+ = F+, so the decomposition is dependency preserving.
It is not lossless, however, because R1 and R2 have no common attribute.
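The dependency-preservation check can be sketched in Python as follows. This is a brute-force
illustration for small schemas only (it enumerates every attribute subset); the function names closure,
project and preserves are chosen for this sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project(fds, sub):
    """FDs of the form X -> (X+ restricted to sub) for every X inside sub."""
    projected = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            rhs = (closure(x, fds) & sub) - x
            if rhs:
                projected.append((frozenset(x), frozenset(rhs)))
    return projected


def preserves(fds, parts):
    """True if every FD in fds follows from the FDs projected onto parts."""
    union = [fd for part in parts for fd in project(fds, part)]
    return all(rhs <= closure(lhs, union) for lhs, rhs in fds)


F = [(frozenset("A"), frozenset("B")), (frozenset("C"), frozenset("D"))]
print(preserves(F, [set("AB"), set("CD")]))
print(preserves(F, [set("AC"), set("BD")]))
```

The first call corresponds to R1(A,B) and R2(C,D) and reports that both dependencies are preserved. The
second call uses a hypothetical alternative split, (A,C) and (B,D), where neither A → B nor C → D can be
checked inside a single sub-relation.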
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y. (The converse is not true in
general: X → Y does not imply that Y is a subset of X.)
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule (IR2)
If X → Y then XZ → YZ
Example:
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
The union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
This rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule, according to subset rule)
3. X → Y (using IR3 on 1 and 2)
4. X → Z (similarly, using YZ → Z from IR1 and then IR3)
6. Pseudo Transitive Rule (IR6)
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
Four informal guidelines that may be used as measures to determine the quality of relation schema
design:
Making sure that the semantics of the attributes is clear in the schema
Reducing the redundant information in tuples
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples
Examples of Violating Guideline 1. The relation schemas in Figures (a) and (b) also have clear
semantics.
They violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes
attributes of employees and departments, and EMP_PROJ mixes attributes of employees and
projects and the WORKS_ON relationship. Hence, they fare poorly against the above measure of
design quality.
Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations.
If any anomalies are present, note them clearly and make sure that the programs that update the
database will operate correctly.
Figure below shows a simplified version of the COMPANY relational database schema
Insertion Anomalies: Insertion anomalies can be differentiated into two types, illustrated by the
following examples based on the EMP_DEPT relation:
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the
department that the employee works for, or NULLs (if the employee does not work for a department as
yet). For example, to insert a new tuple for an employee who works in department number 5, we must
enter all the attribute values of department 5 correctly so that they are consistent with the corresponding
values for department 5 in other tuples in EMP_DEPT.
It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation.
The only way to do this is to place NULL values in the attributes for employee.
Deletion Anomalies: The problem of deletion anomalies is related to the second insertion anomaly
situation just discussed. If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the information concerning
that department is lost from the database.
Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL.
If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to
a majority of tuples in the relation.
Guideline 4: Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way that guarantees
that no spurious tuples are generated. Avoid relations that contain matching attributes that are not
(foreign key, primary key) combinations, because joining on such attributes may produce spurious tuples.
Example of Spurious tuples :
Spurious Tuples: Spurious tuples are rows that appear as a result of joining two
tables in the wrong manner. They are extra tuples (rows) that were not present in the original relation.
If a relation is denoted by R, and its decomposed relations are denoted by R1, R2, R3, …, Rn, then the
condition for not getting any spurious tuples is
R1 ⨝ R2 ⨝ R3 .... ⨝ Rn = R
and the condition for getting spurious tuples is
R ⊂ R1 ⨝ R2 ⨝ R3 .... ⨝ Rn
Example-1: Example to check if the given relation contains Spurious Tuples. Let R be a Relation, and
R1 and R2 be relations that we get after decomposing R.
After performing the join operation of relations R1 and R2 (R1 ⨝ R2), we get back the original relation
R.
Example - 1 : Joining two tables which causes no spurious tuples
The condition for no spurious tuples, R1 ⨝ R2 = R, is met. Hence, we do not get any Spurious Tuples.
Example-2: Example to check if the given relation contains Spurious Tuples. Let R be a Relation,
and R1 and R2 be relations that we get after decomposing R.
After performing the join operation of relations R1 and R2 (R1 ⨝ R2), we do not get back the
original relation R.
The condition for spurious tuples, R ⊂ R1 ⨝ R2, is met. Hence, we get spurious tuples.
Conclusion: Spurious tuples exist.
Note: Rows in a DBMS are called tuples, whereas columns are called attributes.
Spurious tuples can be remembered as extra rows in the table.
A natural join that leads to spurious tuples is called a lossy join.
A natural join that does not result in spurious tuples is called a lossless join.
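A lossy join can be demonstrated with a few lines of Python. The relation R(A, B, C) and its rows are
hypothetical, chosen only so that the common attribute B is not a key of either part.

```python
# Hypothetical relation R with attributes (A, B, C).
R = {(1, "x", "p"), (2, "x", "q")}

# Decompose into R1(A, B) and R2(B, C); the common attribute is B.
R1 = {(a, b) for a, b, _ in R}
R2 = {(b, c) for _, b, c in R}

# Natural join of R1 and R2 on B.
joined = {(a, b, c) for a, b in R1 for b2, c in R2 if b == b2}

# Rows in the join that were never in R are the spurious tuples.
spurious = joined - R

print(sorted(joined))
print(sorted(spurious))
```

Because both rows of R share the same B value, the join pairs every A with every C and produces two rows
that were not in R, so R ⊂ R1 ⨝ R2 and the decomposition is lossy.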
Lossless join decomposition is a decomposition of a relation R into relations R1, and R2 such that
if we perform a natural join of relation R1 and R2, it will return the original relation R. This is
effective in removing redundancy from databases while preserving the original data.
In other words by lossless decomposition, it becomes feasible to reconstruct the relation R from
decomposed tables R1 and R2 by using Joins.
Only 1NF,2NF,3NF, and BCNF are valid for lossless join decomposition.
In Lossless Decomposition, we select the common attribute and the criteria for selecting a common
attribute is that the common attribute must be a candidate key or super key in either relation R1, R2, or
both.
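The common-attribute criterion can be tested mechanically with attribute closures. The sketch below is
a minimal illustration for a decomposition into two relations; the schema R(A, B, C) with A → B is a
hypothetical example and is_lossless is a name chosen for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: the common attributes must determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    if not common:
        return False  # no common attribute: the join is a cross product
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


fds = [("A", "B")]  # hypothetical FD set for R(A, B, C)
print(is_lossless("AB", "AC", fds))
print(is_lossless("AB", "BC", fds))
```

In the first split the common attribute A is a key of R1(A, B), so the test succeeds. In the second
split the common attribute B determines nothing else, so the test fails.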
Functional dependency using closure
Functional Dependency In DBMS : Introduction
A Functional Dependency in DBMS, also known as an “FD”, is a relationship that exists
only when one attribute can functionally determine another attribute.
The first attribute does not compute or calculate the value of the second attribute; rather, the value
of the second attribute is found by looking up the tuple corresponding to the value of the first attribute.
Functional Dependency in DBMS is denoted using an arrow between two or more attributes such as :
FD : A -> B
Here, A & B are the attributes present in any relation.
“A->B” means, “B” is functionally dependent upon “A” or “A” functionally determines “B”.
Functional dependency acts as a constraint between set of attributes present in any database.
Functional Dependency In DBMS : Examples
Functional dependencies and keys are the most important concepts used as a
foundation in database normalization. The examples below show how functional dependency
actually works.
Example-1 : Consider a table student_details containing details of some students.
From the table we can conclude that the Roll_No attribute determines the Name of a student
uniquely, and the same is the case with Marks. Hence, we can say that Name and Marks are
functionally dependent on Roll_No, but the converse is not true.
Example : student_details Table (it is wrong to have two records with the same Roll_No)
Here, Name is not functionally dependent upon Roll_No, because searching for the value
of Name against a Roll_No returns two different names, which is practically not
possible.
Hence, a functional dependency in DBMS exists when an attribute is able to uniquely
determine another attribute.
Functional Dependency In DBMS : Armstrong’s Axioms
These axioms were introduced by William W. Armstrong in 1974, and
they play a vital role in reasoning about functional dependencies in DBMS for
database normalization. Six inference rules are commonly listed under the name “Armstrong’s Axioms”
(the first three are the axioms proper; the last three can be derived from them). They are
discussed below.
Reflexive : It means, if set “B” is a subset of “A”, then A -> B (IR1).
Augmentation : It means, if A -> B, then AC -> BC (IR2).
Transitive : It means, if A -> B & B -> C, then A -> C (IR3).
Union : It means, if A -> B & A -> C, then A -> BC (IR4).
Decomposition : It means, if A -> BC, then A -> B & A -> C (IR5).
Pseudo-Transitivity : It means, if A -> B and DB -> C, then DA -> C (IR6).
Attribute Closure
Closure Of Functional Dependency : Introduction
The Closure Of Functional Dependency means the complete set of all possible attributes that can
be functionally derived from given functional dependency using the inference rules known as
Armstrong’s Rules.
There are three steps to calculate closure of functional dependency. These are:
Step-1 : Add the attributes which are present on Left Hand Side in the original functional
dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of the functional dependency.
Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that
can be derived from the other given functional dependencies. Repeat this process until all the
possible attributes which can be derived are added in the closure.
Example-1 : Consider the table student_details having (Roll_No, Name,Marks, Location) as the
attributes and having two functional dependencies.
Now, we will calculate the closure of each attribute present in the relation using the three
steps described above.
Step-1 : Add attributes present on the LHS of the first functional dependency to the closure.
{Roll_no}+ = {Roll_No}
Step-2 : Add attributes present on the RHS of the original functional dependency to the
closure (using the Union operation).
Step-3 : Add the other possible attributes which can be derived using attributes present on the
RHS of the closure.
Therefore, the complete closure of Roll_No contains Roll_No itself together with every attribute
reached through the right-hand sides of the dependencies.
Step-1 : Add attributes present on the LHS of the functional dependency to the closure.
{Name}+ = {Name}
Step-2 : Add the attributes present on the RHS of the functional dependency to the closure.
Step-3 : Since, we don’t have any functional dependency where “Marks or Location” attribute is
functionally determining any other attribute, we cannot add more attributes to the closure.
Hence complete closure of Name would be :
NOTE : We don’t have any Functional dependency where marks and location can functionally
determine any attribute. Hence, for those attributes we can only add the attributes themselves in their
closures. Therefore,
{Marks}+ = {Marks}
and
{Location}+ = { Location}
FD1 : A-> B C
FD2 : C -> B
FD3 : D -> E
FD4 : E-> D
Now, we need to calculate the closure of attributes of the relation R. The closures will be:
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {B, C}
{D}+ = {D, E}
{E}+ = {E,D}
“A Candidate Key of a relation is a minimal attribute or set of attributes that can determine the
whole relation, i.e., whose closure contains all the attributes of the relation."
FD1 : A-> B
FD2 : B-> C
{A}+ = {A, B, C}
{B}+ = {B, C}
{C}+ = {C}
Clearly, “A” is the candidate key as, its closure contains all the attributes present in the relation
“R”.
FD1 : A-> BC
FD2 : C-> B
FD3 : D-> E
FD4 : E-> D
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {C, B}
{D}+ = {E, D}
{E}+ = {E, D}
In this case, unlike the previous example, no single attribute can determine all the attributes on
its own. Here, we need to combine two or more attributes to form the candidate keys.
Since {AD}+ = {AE}+ = {A, B, C, D, E}, "AD" and "AE" are the two candidate keys of the given
relation “R”. Any larger combination containing one of these two would include extraneous attributes.
NOTE : Any relation “R” can have either single or multiple candidate keys.
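The search for candidate keys can be sketched by trying attribute combinations in order of size and keeping the minimal ones whose closure covers the whole relation. The function names `closure` and `candidate_keys` are assumptions of this sketch, and the brute-force search is exponential in the number of attributes, so it is only meant for small classroom examples like the one above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(attributes)
    keys = []
    # Smaller sets are tried first, so any superset of a found key can be skipped.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# R(A, B, C, D, E) with A -> BC, C -> B, D -> E, E -> D
fds = [(set("A"), set("BC")), (set("C"), set("B")),
       (set("D"), set("E")), (set("E"), set("D"))]

for key in candidate_keys("ABCDE", fds):
    print("".join(sorted(key)))   # prints AD, then AE
```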
Key Definitions
Prime Attributes : Attributes which are part of at least one candidate key. For example : “A, D,
E” attributes are prime attributes in above example-2.
Non-Prime Attributes : Attributes other than prime attributes, which do not take part in the
formation of any candidate key.
Extraneous Attributes : Attributes whose removal from an attribute set has no effect on the closure
of that set.
For example :
FD1 : A-> BC
FD2 : B-> C
FD3 : D-> C
Closures of LHS
{A}+={A,B,C}
{B}+={B,C}
{D}+={D,C}
Prime Attributes : A, D.
Non-Prime Attributes : B, C
Extraneous Attributes : B, C (if we add either of them to the candidate key AD, its closure remains
unaffected). These are the attributes which, if removed, do not affect the closure of the set.
Another example:
Suppose a relational schema R(w, x, y, z) and a set of functional dependencies as follows:
F : { wx->yz,
y->w,
z->x }
Solution:
{w}+ = {w}
{x}+ = {x}
{y}+ = {y, w}
{z}+ = {z, x}
Since no single attribute determines all attributes of the relation, let us try the two-attribute
combinations:
{wx}+ = {w, x, y, z}
{wy}+ = {w, y}
{wz}+ = {w, z, x, y}
{xy}+ = {x, y, z, w}
{xz}+ = {x, z}
{yz}+ = {y, z, w, x}
Therefore wx, wz, xy and yz are the candidate keys, as their closures contain all attributes of the
relation.
Another example
Suppose a relational schema R(a, b, c, d, e), and set of functional dependency as follows
F : { ab ->cd,
d ->a,
bc->de }
Therefore ab, bc and bd are the candidate keys, as their closures contain all attributes of the
relation.
Properties of Relational Decompositions
When a relation in the relational model is not in an appropriate normal form, decomposition of the
relation is required. In a database, breaking down a table into multiple tables is termed
decomposition. The properties of a relational decomposition are listed below :
Attribute Preservation:
Using functional dependencies, the algorithms decompose the universal relation schema R into a set
of relation schemas D = { R1, R2, …, Rn } that forms the relational database schema, where ‘D’ is
called the decomposition of R.
Every attribute in R must appear in at least one relation schema Ri in the decomposition, i.e., no
attribute is lost. This is called the Attribute Preservation condition of decomposition.
Dependency Preservation:
A decomposition D is dependency preserving if each functional dependency X->Y specified in F
either appears directly in one of the relation schemas Ri in the decomposition D or can be
inferred from the dependencies that appear in some Ri.
For example:
R = (A, B, C)
F = {A ->B, B->C}
Key = {A}
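A dependency-preservation check for this example can be sketched as follows. The two decompositions tested, {R1(A, B), R2(B, C)} and {R1(A, B), R2(A, C)}, are hypothetical choices made for illustration since the notes do not list one, and the helper names are likewise assumptions. The check grows the left-hand side of each dependency using only what can be derived inside one relation schema at a time.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """True if the FD can be enforced using only the decomposed schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # What follows from the attributes we hold, restricted to this schema
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)


# R = (A, B, C), F = {A -> B, B -> C}
fds = [(set("A"), set("B")), (set("B"), set("C"))]

print(dependency_preserving([set("AB"), set("BC")], fds))  # True
print(dependency_preserving([set("AB"), set("AC")], fds))  # False: B -> C is lost
```

In the second decomposition no single relation contains both B and C, and B -> C cannot be rebuilt from the dependencies that survive in R1 and R2, so it would have to be checked with a join.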
No redundancy:
Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy. If the relation is not decomposed properly, it may lead to
problems like loss of information.
Lossless Join:
Let R be a relation, F a set of functional dependencies on R, and {R1, R2, …, Rn} a decomposition
of R.
A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if the natural
join of R1, R2, …, Rn produces exactly the relation R.
Lossless Vs Lossy

Lossless Decomposition:
The decomposition R1, R2, …, Rn of a relation schema R is said to be lossless if the natural join
of its relations results in the original relation R.
Formally, let R be a relation and R1, R2, R3, …, Rn be its decomposition. The decomposition is
lossless if
R1 ⨝ R2 ⨝ R3 ⨝ … ⨝ Rn = R
There is no loss of information, as the relation obtained after the natural join of the
decomposed relations is equivalent to the original relation. Thus, it is also referred to as a
non-additive join decomposition.

Lossy Decomposition:
The decomposition R1, R2, …, Rn of a relation schema R is said to be lossy if the natural join of
its relations adds extraneous tuples to the original relation R.
Formally, let R be a relation and R1, R2, R3, …, Rn be its decomposition. The decomposition is
lossy if
R ⊂ R1 ⨝ R2 ⨝ R3 ⨝ … ⨝ Rn
There is loss of information, as extraneous tuples are added into the relation after the natural
join of the decomposed relations. Thus, it is also referred to as a careless decomposition.
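Example-1:
Let there be a relational schema Student(Roll No., S_name, S_dept). Let StudentDetails(Roll No., S_name)
and Dept(Roll No., S_dept) be its decomposition. The Dept relation is:
Roll No. S_dept
1 CSE
2 Quantum Computing
Because both relations share Roll No., which identifies each student uniquely, their natural join
gives back exactly the original Student relation, so this decomposition is lossless.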
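Example-2:
Example to check whether a given decomposition is a lossy join decomposition.
Let there be a relational schema Student(Roll No., S_name, S_dept). Let StudentDetails(Roll No., S_name)
and Dept(S_name, S_dept) be its decomposition. The Dept relation is:
S_name S_dept
Raju CSE
Raju Quantum Computing
Here the only common attribute is S_name, and both students are named Raju, so the natural join
pairs every student with every department and produces extraneous tuples. This decomposition is
lossy.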
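The two decompositions above can be compared by actually computing the natural joins. In this sketch the Student rows are reconstructed from the two Dept tables shown (roll numbers 1 and 2, both students named Raju), which is an assumption since the original Student table is not reproduced here. The helper names `project` and `natural_join` are also illustrative.

```python
def project(relation, columns):
    """Keep only `columns` of every tuple; duplicates collapse because the result is a set."""
    projected = set()
    for t in relation:
        row = dict(t)
        projected.add(frozenset((c, row[c]) for c in columns))
    return projected


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all common columns."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[c] == d2[c] for c in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


# Assumed Student relation, rebuilt from the Dept tables in the two examples
student = {
    frozenset({"roll_no": 1, "s_name": "Raju", "s_dept": "CSE"}.items()),
    frozenset({"roll_no": 2, "s_name": "Raju", "s_dept": "Quantum Computing"}.items()),
}

details = project(student, ["roll_no", "s_name"])
dept_by_roll = project(student, ["roll_no", "s_dept"])   # Example-1
dept_by_name = project(student, ["s_name", "s_dept"])    # Example-2

print(natural_join(details, dept_by_roll) == student)    # True  -> lossless
print(len(natural_join(details, dept_by_name)))          # 4 tuples -> lossy
```

In Example-2 the join returns four tuples although Student holds only two, and the two extra tuples are exactly the extraneous ones that make the decomposition lossy.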
What is NULL ?
In Structured Query Language (SQL), NULL is a special marker used to indicate that a data value
does not exist in the database. NULL is a reserved word in SQL that identifies this marker. It is
very important to understand that a NULL value is totally different from a zero value.
In other words, a NULL attribute value means that the value is missing or unknown. In a table, a
NULL value shows up as a field that appears to be blank, that is, a field that has no value.
We can use the IS NOT NULL operator and write a query as follows.
SQL> SELECT *
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;
In the CUSTOMERS table, the rows with ID 6 and 7, named NAMAN and AYUSH, have an empty salary
column; in other words, their salary is NULL. That is why the query result does not contain NAMAN
and AYUSH: the IS NOT NULL operator filters them out.
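The behaviour of the query can be tried out with Python's built-in sqlite3 module. This is a small sketch under stated assumptions: the CUSTOMERS table is rebuilt with only three columns, and the row for RAVI with its salary is hypothetical data added so that the query has something to return. Only NAMAN and AYUSH with NULL salaries come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (5, "RAVI", 8500.0),   # hypothetical row that has a salary
        (6, "NAMAN", None),    # Python None is stored as SQL NULL
        (7, "AYUSH", None),
    ],
)

with_salary = conn.execute(
    "SELECT name FROM customers WHERE salary IS NOT NULL ORDER BY id"
).fetchall()
without_salary = conn.execute(
    "SELECT name FROM customers WHERE salary IS NULL ORDER BY id"
).fetchall()
equals_null = conn.execute(
    "SELECT name FROM customers WHERE salary = NULL"
).fetchall()

print(with_salary)      # [('RAVI',)]
print(without_salary)   # [('NAMAN',), ('AYUSH',)]
print(equals_null)      # [] because a comparison with NULL is never true
```

The last query shows why IS NULL and IS NOT NULL exist: comparing a column to NULL with = yields unknown rather than true, so no row is selected.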
In DBMS, a tuple that does not participate in a natural join is called a dangling tuple. It may
indicate a consistency problem in the database.
Another definition: a tuple with a foreign key value that does not appear in the referenced
relation is known as a dangling tuple. Referential integrity constraints specify exactly when
dangling tuples indicate a problem.
ID NAME PHONE_NUMBER
1 Johnny 4567823
2 Olly 7486513
3 Ilenia 3481365
4 Louise 1685364
You could use a simple query to get a table with the pet name and the owner name next to each other.
Let's do it with all the different JOIN operators.
In this case you would SELECT the name column from the pets table (and rename it pet_name). Then
you would select the name column from the owners table, and rename it owner. That would look like
this: SELECT pets.name AS pet_name, owners.name AS owner.
You would use FROM to say that the columns are from the pets table, and JOIN to say that you want to
join it with the owners table, using this syntax: FROM pets JOIN owners.
And finally you would say that you want to join two rows together when the owner_id column in the
pets table is equal to the id column in the owners table with ON pets.owner_id = owners.id.
PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
The problem here is that the pet Hopper is missing from the result, as it does not have an owner.
Let's do the same query using LEFT JOIN so you can see the difference. The query is the same other
than adding the LEFT keyword.
PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
Hopper NULL
Hopper now appears in the result with NULL as its owner, because it has none.
(It seems there is a pet that is not registered with an owner.)
PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
NULL Louise
This is the result of a RIGHT JOIN. In this case all the rows from the right table, owners, are
kept, and if there is no matching row on the left side, the missing value is filled with NULL.
PET_NAME OWNER
Fido Johnny
Missy Johnny
Sissy Olly
Copper Ilenia
Hopper NULL
NULL Louise
This is the result of a FULL JOIN, which keeps the unmatched rows of both tables. It seems that
there is a pet without an owner and an owner without a pet in our database.
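The join results above can be reproduced with sqlite3. In this sketch the pets table is reconstructed from the result tables (the pet ids and the owner_id values are assumptions), and the owner names follow the result tables. Because older SQLite versions do not accept RIGHT JOIN or FULL JOIN, the right join is emulated here by swapping the tables in a LEFT JOIN, and the full join is assembled in Python from the two outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT);
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'Johnny', '4567823'), (2, 'Olly', '7486513'),
                              (3, 'Ilenia', '3481365'), (4, 'Louise', '1685364');
    INSERT INTO pets VALUES (1, 'Fido', 1), (2, 'Missy', 1), (3, 'Sissy', 2),
                            (4, 'Copper', 3), (5, 'Hopper', NULL);
""")

columns = "SELECT pets.name AS pet_name, owners.name AS owner "

inner = conn.execute(
    columns + "FROM pets JOIN owners ON pets.owner_id = owners.id ORDER BY pets.id"
).fetchall()

left = conn.execute(
    columns + "FROM pets LEFT JOIN owners ON pets.owner_id = owners.id ORDER BY pets.id"
).fetchall()

# Same rows as pets RIGHT JOIN owners: every owner is kept
right = conn.execute(
    columns + "FROM owners LEFT JOIN pets ON pets.owner_id = owners.id "
    "ORDER BY owners.id, pets.id"
).fetchall()

# Full join = all left-join rows plus the owners that matched no pet
full = left + [row for row in right if row[0] is None]

for label, rows in [("INNER", inner), ("LEFT", left), ("RIGHT", right), ("FULL", full)]:
    print(label, rows)
```

Hopper shows up only when the pets side is preserved and Louise only when the owners side is preserved, which is the difference between the four result tables.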
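Dangling tuple:
Dangling tuple = a tuple in a relation that does not join with any tuple in the other relation.
Example: In the pets and owners data above, Hopper (a pet with no owner) and Louise (an owner with
no pet) are dangling tuples.
Conclusion: An inner or natural join drops dangling tuples, while outer joins keep them and fill
the missing side with NULL.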
Equivalence :
Two sets of FDs F and G over a schema R are equivalent if F+ = G+. It means that if every
functional dependency of F is in G+ and every functional dependency of G is in F+, then the sets
of functional dependencies F and G are equivalent.
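Q.1 Let us take an example to show the relationship between two FD sets. A relation R(A,B,C,D)
has two FD sets FD1 = {A->B, B->C, AB->D} and FD2 = {A->B, B->C, A->C, A->D}.
Step 1: Check whether every FD of FD1 holds in FD2. A->B and B->C appear in FD2 directly, and
under FD2 the closure {AB}+ = {A, B, C, D} contains D, so AB->D holds. Hence FD2 ⊃ FD1 (FD2
covers FD1).
Step 2: Check whether every FD of FD2 holds in FD1. Under FD1 the closure {A}+ = {A, B, C, D}, so
A->C and A->D hold. Hence FD1 ⊃ FD2 (FD1 covers FD2).
Step 3: As FD2 ⊃ FD1 and FD1 ⊃ FD2 are both true, FD2 = FD1 is true. These two FD sets are
semantically equivalent.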
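Q.2 Let us take another example to show the relationship between two FD sets. A relation
R2(A,B,C,D) has two FD sets FD1 = {A->B, B->C, A->C} and FD2 = {A->B, B->C, A->D}.
Step 1: As all FDs in set FD1 also hold in set FD2 (A->C follows from A->B and B->C), FD2 ⊃ FD1
is true.
Step 2: A->D of FD2 does not hold in FD1, because under FD1 the closure {A}+ = {A, B, C} does not
contain D. Hence FD1 ⊃ FD2 is false.
Step 3: In this case FD2 covers FD1 but FD1 does not cover FD2, so these two FD sets are not
semantically equivalent.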
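The equivalence test used in Q.1 and Q.2 can be sketched with closures: one set covers another when every dependency of the second can be derived from the first. The function names `covers` and `equivalent` and the small `fd` helper are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build one dependency from two strings of single-letter attributes."""
    return (set(lhs), set(rhs))


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


# Q.1
q1_fd1 = [fd("A", "B"), fd("B", "C"), fd("AB", "D")]
q1_fd2 = [fd("A", "B"), fd("B", "C"), fd("A", "C"), fd("A", "D")]
print(equivalent(q1_fd1, q1_fd2))   # True

# Q.2
q2_fd1 = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
q2_fd2 = [fd("A", "B"), fd("B", "C"), fd("A", "D")]
print(covers(q2_fd2, q2_fd1))       # True: FD2 covers FD1
print(covers(q2_fd1, q2_fd2))       # False: A -> D cannot be derived from FD1
```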
Minimal Cover
If we reduce a set of functional dependencies as far as possible without changing its closure, we
get the simplest, irreducible form of that set. This is called the Minimal Cover or Irreducible
Set (as we can’t reduce the set further). It is also called a Canonical Cover.
Source: https://www.nielit.gov.in/gorakhpur/sites/default/files/Gorakhpur/Alevel_1_DBMS_05May2020_AV.pdf
Let us understand the procedure to find the minimal cover by this example. After Step 1, the
dependencies tested below are A→B, C→B, D→A, D→B, D→C and AC→D.
Step 1 : Split the right-hand side of every FD so that each FD has a single attribute on its
right-hand side.
Step 2 : Test each FD for redundancy. To test X→Y, find the closure X+ using the rest of the FDs
only. If Y is in X+, then X→Y is redundant and can be removed; otherwise we have to keep it.
For A→B : A+ = {A}, since without A→B we can only derive A from A (reflexive property). B is not
in A+, so A→B is not redundant. We have to keep it.
For C→B : C+ = {C}. Since B is not in the closure of C, C→B is not redundant and we keep it.
For D→A : D+ = {D, B, C}. Since A is not in the closure of D, D→A is not redundant and we keep it.
For D→B : D+ = {D, A, B, C}. Since B is in the closure of D, D→B is redundant and we remove it.
The remaining FDs are A→B, C→B, D→A, D→C, AC→D.
For D→C : using A→B, C→B, D→A, AC→D we get D+ = {D, A, B}. Since C is not in the closure of D,
D→C is not redundant and we keep it.
For AC→D : using the rest of the FDs we get AC+ = {A, C, B}. Since D is not in the closure, AC→D
is not redundant and we keep it.
Step 3 : Check the left-hand side of AC→D for extraneous attributes.
If D is in C+, then A is extraneous and we can remove it.
If D is in A+, then C is extraneous and we can remove it.
A+ = {A, B}
C+ = {C, B}
Neither A+ nor C+ contains D, therefore neither attribute is extraneous and AC→D is kept as it is.
Result :
Hence, the minimal cover is A→B, C→B, D→A, D→C, AC→D, which can also be written as
A→B, C→B, D→AC, AC→D.
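The procedure can be sketched in Python as below. Two assumptions are made: the starting set is taken to be A→B, C→B, D→ABC, AC→D, which is the unsplit form consistent with the result above, and the sketch uses the common textbook order (split, then reduce left-hand sides, then remove redundant FDs), which differs from the order of the walkthrough but gives the same result for this example. The function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def as_sets(items):
    """Convert (frozenset_lhs, attribute) pairs to the form `closure` expects."""
    return [(set(lhs), {a}) for lhs, a in items]


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    g = list(dict.fromkeys(g))               # drop duplicates, keep order

    # Step 2: remove extraneous attributes from left-hand sides
    reduced = []
    for lhs, a in g:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, as_sets(g)):
                lhs = lhs - {b}
        reduced.append((frozenset(lhs), a))
    g = list(dict.fromkeys(reduced))

    # Step 3: remove every FD that follows from the remaining ones
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, as_sets(rest)):
            g = rest                         # redundant, drop it
        else:
            i += 1
    return g


fds = [(set("A"), set("B")), (set("C"), set("B")),
       (set("D"), set("ABC")), (set("AC"), set("D"))]

for lhs, a in minimal_cover(fds):
    print("".join(sorted(lhs)), "->", a)
```

A set of dependencies can have more than one minimal cover, so the output depends on the order in which dependencies and attributes are examined. For this example the sketch arrives at A→B, C→B, D→A, D→C, AC→D.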
We must carefully consider the problems associated with NULLs when designing a relational database
schema. There is no fully satisfactory relational design theory as yet that includes NULL values.
One problem occurs when some tuples have NULL values for attributes that will be used to join
individual relations in the decomposition.
To illustrate this, consider the database shown in Figure 16.2(a), where two relations EMPLOYEE and
DEPARTMENT are shown. The last two employee tuples— ‘Berger’ and ‘Benitez’—represent
newly hired employees who have not yet been assigned to a department (assume that this does not
violate any integrity constraints). Now suppose that we want to retrieve a list of (Ename, Dname)
values for all the employees. If we apply the NATURAL JOIN operation on EMPLOYEE and
DEPARTMENT (Figure 16.2(b)), the two aforementioned tuples will not appear in the result. The
OUTER JOIN operation can deal with this problem. Recall that if we take the LEFT OUTER JOIN
of EMPLOYEE with DEPARTMENT, tuples in EMPLOYEE that have NULL for the join attribute will
still appear in the result, joined with an imaginary tuple in DEPARTMENT that has NULLs for all its
attribute values. Figure 16.2(c) shows the result.
This can be rectified by an alternate design: when Employee_1 is combined with Employee_3, the
natural join does not generate dangling tuples, as shown in the figure below.
Further discussion of Multivalued dependencies and 4NF
A non-trivial MVD occurs when X->->Y and X->->Z hold, where Y and Z are independent of each
other. A non-trivial MVD produces redundancy.
For example, consider a table called "Students". It has columns: "Student ID," "Course," and
"Textbook." Each student can take multiple courses, and each course may require multiple textbooks.
Therefore, the "Course" and "Textbook" columns are multivalued attributes.
In this example, we can note that there is a relationship between the "Course" and "Textbook" columns.
The "Course" column determines which textbooks are needed. For example, a student taking "Math"
will need both "Algebra" and "Calculus" textbooks. This relationship between "Course" and
"Textbook" is a multivalued dependency.
Course →→ Textbook
This indicates that for any given value of "Course," there is a set of corresponding values of
"Textbook." For example, if we know that a student is taking "Math," we can infer that the student
needs both "Algebra" and "Calculus" textbooks.
An MVD X →→ Y holds when, for each value of X, the set of Y values is independent of the remaining
attributes of the table. In the example above, the "Course" and "Textbook" columns have a
multivalued dependency because the textbooks needed depend only on the course, not on which
student is taking it.
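Whether a multivalued dependency holds in a given table can be checked directly from its rows. The sketch below assumes a hypothetical instance of the "Students" table with two students taking "Math" (the student IDs are made up, while the course and textbook names come from the example above). The function name `satisfies_mvd` is also an assumption. The test applied is the usual one: whenever two rows agree on X, the row combining the Y value of one with the remaining attributes of the other must also be present.

```python
def satisfies_mvd(rows, x, y):
    """Check X ->> Y on a relation instance given as a list of dicts."""
    attrs = list(rows[0].keys()) if rows else []
    z = [a for a in attrs if a not in x and a not in y]   # the remaining attributes

    def pick(row, cols):
        return tuple(row[c] for c in cols)

    triples = {(pick(r, x), pick(r, y), pick(r, z)) for r in rows}
    return all(
        (x1, y1, z2) in triples
        for (x1, y1, _) in triples
        for (x2, _, z2) in triples
        if x1 == x2
    )


students = [
    {"student_id": 1, "course": "Math", "textbook": "Algebra"},
    {"student_id": 1, "course": "Math", "textbook": "Calculus"},
    {"student_id": 2, "course": "Math", "textbook": "Algebra"},
    {"student_id": 2, "course": "Math", "textbook": "Calculus"},
]

print(satisfies_mvd(students, ["course"], ["textbook"]))        # True
# Dropping one row breaks the "every student needs every textbook" pattern
print(satisfies_mvd(students[:3], ["course"], ["textbook"]))    # False
```

The four rows also show the redundancy an MVD causes: each textbook of "Math" is repeated once per student taking the course.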
Fourth Normal Form (4NF) is a level of database normalization that requires a relation to be in
BCNF and to have no non-trivial multivalued dependencies other than those whose left-hand side is
a superkey. This eliminates redundant data and maintains data consistency. If a table violates
this condition, it needs to be split into smaller tables to achieve 4NF.
To remove the multivalued dependency (MVD) in the "Students" table example, we can create two new
tables, one for "Courses" and another for "Textbooks," and establish a relationship between them using
foreign keys.
Table 1: Students
Table 2: Courses
Table 3: Textbooks
So, we removed the multivalued dependency by splitting the "Course" and "Textbook" columns into
separate tables.
We have also added a new "Course ID" column to the "Students" table. It has a foreign key that
references the "Course ID" column in the "Courses" table. Similarly, the "Textbooks" table also has a
"Course ID" column that serves as a foreign key referencing the "Course ID" column in the "Courses"
table.
Hence, we have achieved the fourth normal form (4NF) for the "Students" table by removing the
multivalued dependency and creating separate tables. The resultant schema eliminates data
redundancy and improves data integrity, making it easier to manage and query the database.