Unit Two
This unit provides the information you need on the following content
coverage and topics:
Insert Anomaly
For example, if we wanted to add a new student but did not know their course name, the new row could not be filled in completely.
Update Anomaly
For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would have to find all of the
rows that contain this class name and rename each one that was found. Any row we missed would leave the data inconsistent.
Delete Anomaly
For example, let’s say Susan Johnson quits and her record needs to be deleted from the system. We could delete her
row:
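The three anomalies above can be reproduced on one small unnormalized table. The following Python lines are only a minimal sketch: the column layout, the class "Chemistry 1" and the people other than Susan Johnson are assumptions made for illustration.

```python
# Hypothetical unnormalized table: one row per person/class pairing.
rows = [
    {"name": "Susan Johnson", "class": "Chemistry 1"},
    {"name": "Alex Lee", "class": "Biology 1"},
    {"name": "Maria Cruz", "class": "Biology 1"},
]

# Insert anomaly: a new student with no known class forces an empty field.
rows.append({"name": "New Student", "class": None})

# Update anomaly: the rename must touch every row that repeats the class name.
updated = 0
for row in rows:
    if row["class"] == "Biology 1":
        row["class"] = "Intro to Biology"
        updated += 1

# Delete anomaly: removing Susan Johnson's row also removes the only
# record that "Chemistry 1" exists.
rows = [row for row in rows if row["name"] != "Susan Johnson"]
known_classes = {row["class"] for row in rows if row["class"] is not None}
```

After the delete, known_classes no longer contains "Chemistry 1", although nobody asked for the class itself to be removed.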
Normalization stages
1NF - First normal form
2NF - Second normal form
3NF - Third normal form
3.5NF - Boyce Codd Normal Form (BCNF)
4NF - Fourth normal form
5NF - Fifth normal form
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are
referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as First normal
form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF
along with the occasional 4NF. Fifth normal form is very rarely seen.
The Student Details, Course Details and Result Details attributes of the above table can be further divided.
The Student Details attribute is divided into Student# (Student Number), Student Name and Date of Birth.
The Course Details attribute is divided into Course#, Course Name and Duration.
The Results attribute is divided into Date of Exam, Marks and Grade.
II. Second normal form (2NF): Eliminating Redundant Data
Second normal form (2NF) requires that all non-key columns are fully dependent on the entire primary key.
If the table has only a single-column primary key, this requirement is easily met.
At this level of normalization, each column in a table that is not a determiner of the contents of another
column must itself be a function of the other columns in the table. For example, in a table with three columns
containing customer ID, product sold, and price of the product when sold, the price would be a function of
the customer ID (entitled to a discount) and the specific product.
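The idea that one column is "a function of" other columns can be tested on sample data. The sketch below is one possible way to do it in Python. The function name is_function_of and the sample sales rows are assumptions for illustration, not part of the unit.

```python
def is_function_of(rows, determinant, dependent):
    """Return True if `dependent` has exactly one value per combination of `determinant` columns."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True

# Hypothetical sales rows: customer 2 is entitled to a discount on pens.
sales = [
    {"customer_id": 1, "product": "pen", "price": 2.0},
    {"customer_id": 2, "product": "pen", "price": 1.5},
    {"customer_id": 1, "product": "ink", "price": 5.0},
    {"customer_id": 1, "product": "pen", "price": 2.0},
]

price_needs_both = is_function_of(sales, ["customer_id", "product"], "price")
price_from_product_only = is_function_of(sales, ["product"], "price")
```

In this sample the price is determined by the customer ID and the product together, but not by the product alone, which matches the description above. A check on sample rows can only disprove a dependency; it cannot prove that the dependency holds for all future data.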
To apply the second normalization rule, we have to consider the following concepts:
Meet all the requirements of the first normal form and remove subsets of data that apply to multiple rows of
a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
Example 2
Let us revisit the 1NF table structure.
Student# is the key attribute for Student.
Course# is the key attribute for Course.
Student# and Course# together form the composite key attributes for the Result relationship.
Other attributes are non-key attributes.
To make this table 2NF compliant, we have to remove all the partial dependencies.
Student Name and Date of Birth depend only on Student#.
Course Name, Pre-Requisite and Duration in Days depend only on Course#.
Date of Exam depends only on Course#.
To remove this partial dependency, we need to split Student_Course_Result table into four separate tables,
STUDENT, COURSE, RESULT and EXAM_DATE tables as shown in the following:
STUDENT TABLE
Student#   Student Name   Date of Birth
1001       Ram            Some value
1002       Shyam          Some value
1003       Sita           Some value
1004       Geeta          Some value
1005       Sunita         Some value
COURSE TABLE
Course#   Course Name           Duration in Days
C3        Bio Chemistry         3
B3        Botany                8
P3        Nuclear Physics       1
M4        Applied Mathematics   4
H6        American History      5
B4        Zoology               9
RESULT TABLE
Student#   Course#   Marks   Grade
1001       M4        89      A
1002       M4        78      B
1001       H6        87      A
1003       C3        90      A
1004       B3        78      B
1002       P3        67      C
1005       P3        78      B
1003       B4        67      C
1005       H6        56      D
1004       M4        78      B
EXAM_DATE TABLE
Course#   Date of Exam
M4        Some value
H6        Some value
C3        Some value
B3        Some value
P3        Some value
B4        Some value
In the STUDENT table, the key attribute is Student# and all other non-key attributes, Student Name and Date
of Birth, are fully functionally dependent on the key attribute.
In the COURSE table, the key attribute is Course# and all the non-key attributes, Course Name and Duration in Days,
are fully functionally dependent on the key attribute.
In the RESULT table, the key attributes are Student# and Course# together and all other non-key attributes, Marks and
Grade, are fully functionally dependent on the key attributes.
In the EXAM_DATE table, the key attribute is Course# and the non-key attribute Date of Exam is fully
functionally dependent on the key attribute.
At first glance, it appears that all our anomalies are gone. We now store the details of student 1003 and of course M4
only once. We can insert prospective students and courses at will. A change to any data in the STUDENT or COURSE
tables is made only once. We can remove the details of any course or student by deleting just
one row.
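The 2NF split described above can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the lower-case column names, the use of NULL for the unknown dates of birth, and the renamed course title "Applied Maths" are illustrative choices, not part of the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE student (student_no INTEGER PRIMARY KEY, student_name TEXT, date_of_birth TEXT);
CREATE TABLE course (course_no TEXT PRIMARY KEY, course_name TEXT, duration_in_days INTEGER);
CREATE TABLE exam_date (course_no TEXT PRIMARY KEY REFERENCES course(course_no), date_of_exam TEXT);
CREATE TABLE result (
    student_no INTEGER REFERENCES student(student_no),
    course_no TEXT REFERENCES course(course_no),
    marks INTEGER,
    grade TEXT,
    PRIMARY KEY (student_no, course_no)
);
""")

conn.execute("INSERT INTO student VALUES (1001, 'Ram', NULL)")
conn.execute("INSERT INTO student VALUES (1002, 'Shyam', NULL)")
conn.execute("INSERT INTO course VALUES ('M4', 'Applied Mathematics', 4)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [(1001, "M4", 89, "A"), (1002, "M4", 78, "B")],
)

# The course name is stored once, so a rename is a single-row update.
conn.execute("UPDATE course SET course_name = 'Applied Maths' WHERE course_no = 'M4'")
names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT c.course_name FROM result r JOIN course c ON c.course_no = r.course_no"
    )
]

# A result for a student that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO result VALUES (9999, 'M4', 50, 'D')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Both results for course M4 pick up the new name through the join, and the foreign keys keep the RESULT table from referring to students or courses that are not stored.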
III. Third normal form (3NF): Eliminating Columns Not Dependent on Keys
Third normal form requires that there are no transitive dependencies, where one column depends on another column which
in turn depends on the primary key. At 2NF, modification anomalies are still possible because a change to one row in a
table may affect data that refers to this information from another table. For example, using the customer table just
cited, removing a row describing a customer purchase (because of a return perhaps) will also remove the fact that the
product has a certain price. In the third normal form, these tables would be divided into two tables so that product
pricing would be tracked separately.
To apply the third normalization rule, we have to consider the following concepts:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.
No transitive dependency exists between non-key attributes and key attributes.
Example 3: In the above RESULT table, Student# and Course# are the key attributes. All other attributes except
Grade are fully and non-transitively dependent on the key attributes. The Grade attribute depends on Marks,
and Marks in turn depends on Student# and Course#. To bring the table into 3NF, we need to remove this transitive
dependency.
RESULT TABLE
Student#   Course#   Marks
1001       M4        89
1002       M4        78
1001       H6        87
1003       C3        90
1004       B3        78
1002       P3        67
1005       P3        78
1003       B4        67
1005       H6        56
1004       M4        78
Marks (upper bound)   Marks (lower bound)   Grade
100                   95                    A+
94                    90                    A
89                    85                    B+
84                    80                    B
79                    75                    B-
74                    70                    C
69                    65                    C-
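Once Grade is stored separately, it can be looked up from Marks instead of being repeated in every result row. The sketch below shows one way to do that in Python. Reading the three columns of the grade table as upper bound, lower bound and grade is an assumption, and the function name grade_for is hypothetical.

```python
# (upper bound, lower bound, grade) bands copied from the table above.
GRADE_BANDS = [
    (100, 95, "A+"), (94, 90, "A"), (89, 85, "B+"), (84, 80, "B"),
    (79, 75, "B-"), (74, 70, "C"), (69, 65, "C-"),
]

def grade_for(marks):
    """Return the grade whose band contains `marks`, or None if no band matches."""
    for upper, lower, grade in GRADE_BANDS:
        if lower <= marks <= upper:
            return grade
    return None

# Three rows from the RESULT table, now without a Grade column.
results = [(1001, "M4", 89), (1003, "C3", 90), (1005, "H6", 56)]
graded = [(student, course, marks, grade_for(marks)) for student, course, marks in results]
```

Marks below 65 fall outside every band listed, so grade_for returns None for them. Changing the grading system now means editing the band list only, not the result rows.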
After normalizing the tables to 3NF, we have removed the anomalies and inconsistencies described above. Now we can add
new grade systems, update the existing one and delete the unwanted ones. For this reason, third normal form is the
usual target for databases that need efficient
INSERT, UPDATE and DELETE operations.
IV. Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "three and a half (3.5) normal form", adds one more
requirement:
Meet all the requirements of the third normal form.
Every determinant must be a candidate key.
Boyce-Codd Normal Form (BCNF) is a further refinement of 3NF. A relation is in Boyce-Codd normal form if
and only if every determinant is a candidate key.
Most entities in 3NF are already in BCNF.
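The rule "every determinant must be a candidate key" can be checked mechanically once the functional dependencies are written down. The following Python sketch uses the standard attribute-closure technique. The function names and the lower-case attribute names are assumptions for illustration, and the check treats a determinant as acceptable when it is a superkey, that is, when it determines every attribute of the relation.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial dependencies whose determinant does not determine all attributes."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(attributes)
    ]

# RESULT table before the 3NF split: Marks determines Grade, but Marks is not a key.
fds = [(("student_no", "course_no"), ("marks",)), (("marks",), ("grade",))]
violations = bcnf_violations(["student_no", "course_no", "marks", "grade"], fds)

# After Grade is moved out, the remaining dependency has the key as its determinant.
after_split = bcnf_violations(["student_no", "course_no", "marks"], fds[:1])
```

The only violation reported is Marks -> Grade, which is the same dependency that the 3NF step removes. This illustrates why most tables in 3NF are already in BCNF.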
V. Fourth Normal Form (4NF)
Fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the Boyce-Codd normal form.
A relation is in 4NF if it has no non-trivial multi-valued dependencies other than those determined by a candidate key.
An entity is in Fourth Normal Form (4NF) if and only if it is in BCNF and does not hold two or more independent
multi-valued facts about the same key. In other words, 4NF states that no entity can have more than a single
one-to-many relationship within an entity if the one-to-many attributes are independent of each other.
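The effect of two independent one-to-many relationships in one table can be seen with a few lines of Python. This is a sketch only: the hobbies are hypothetical data, while the student name and course codes are borrowed from the earlier tables.

```python
from itertools import product

# Hypothetical data: a student's courses and hobbies are independent of each other.
courses = {"Ram": ["M4", "H6"]}
hobbies = {"Ram": ["chess", "cricket", "music"]}

# One combined table must hold every course/hobby pairing to stay consistent.
combined = {
    (student, course, hobby)
    for student in courses
    for course, hobby in product(courses[student], hobbies[student])
}

# 4NF decomposition: one table per independent multi-valued fact.
student_course = {(s, c) for s, c, _ in combined}
student_hobby = {(s, h) for s, _, h in combined}

# Joining the two projections on the student rebuilds the combined table exactly.
rejoined = {
    (s1, c, h)
    for s1, c in student_course
    for s2, h in student_hobby
    if s1 == s2
}
```

The combined table needs six rows for two courses and three hobbies, while the two 4NF tables need five rows in total and lose no information. Adding a hobby to the combined table would require one new row per course.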
VI. Fifth Normal Form (5NF)
5NF specifies that every join dependency for the entity must be a consequence of its candidate keys.
Partial Dependency
Partial dependency occurs when an attribute is functionally dependent on only a part of the primary key, rather than
the entire primary key. It means that a non-key attribute depends on only a subset of the primary key, and not on the
entire primary key.
Example: In a table called "Orders," suppose the primary key is the composite of "OrderID" and "ProductID" (the second
column is assumed here for illustration, since a partial dependency can only arise with a composite key). If the
attributes "CustomerName" and "CustomerAddress" depend only on "OrderID," it indicates a partial dependency. This can be
represented as: OrderID -> CustomerName, CustomerAddress. To remove the partial dependency, the table can be
split into two separate tables: "Orders" and "Customers," where the customer details are stored separately.
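Partial dependencies can be listed automatically from the declared functional dependencies. The Python sketch below is a simplified illustration: the composite key (OrderID, ProductID), the Quantity attribute and the function name partial_dependencies are assumptions, and the function only looks at dependencies whose left side is exactly a proper subset of the key.

```python
from itertools import combinations

def partial_dependencies(primary_key, fds):
    """Return (subset, attribute) pairs where a non-key attribute depends on a proper subset of the key."""
    key = set(primary_key)
    found = []
    for size in range(1, len(primary_key)):  # proper, non-empty subsets only
        for subset in combinations(primary_key, size):
            for lhs, rhs in fds:
                if set(lhs) == set(subset):
                    found += [(subset, attr) for attr in rhs if attr not in key]
    return found

fds = [
    (("OrderID",), ("CustomerName", "CustomerAddress")),
    (("OrderID", "ProductID"), ("Quantity",)),
]
partials = partial_dependencies(("OrderID", "ProductID"), fds)
single_key = partial_dependencies(("OrderID",), fds)
```

With the composite key, the customer attributes are reported as partially dependent on OrderID. With a single-column key the result is empty, because a single column has no proper subset to depend on.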
Transitive Dependency
Transitive dependency occurs when an attribute depends on another non-key attribute, rather than directly depending
on the primary key. It means that the dependency is indirectly established through another attribute.
Example: In a table called "Employees," if the primary key is "EmployeeID," the attribute "Department" depends on
"EmployeeID," and "DepartmentLocation" depends on "Department," it indicates a transitive dependency. This can be
represented as: EmployeeID -> Department -> DepartmentLocation. To remove the transitive dependency, the table can be
split into two separate tables: "Employees" and "Departments," where the department details are stored separately.
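The split into "Employees" and "Departments" can be sketched with two Python dictionaries standing in for the two tables. The sample rows and the helper name location_of are assumptions for illustration.

```python
# Hypothetical rows: DepartmentLocation repeats for every employee in a department.
employees_flat = [
    {"EmployeeID": 1, "Department": "Sales", "DepartmentLocation": "Floor 1"},
    {"EmployeeID": 2, "Department": "Sales", "DepartmentLocation": "Floor 1"},
    {"EmployeeID": 3, "Department": "IT", "DepartmentLocation": "Floor 2"},
]

# Employees keeps only what depends directly on EmployeeID.
employees = {row["EmployeeID"]: row["Department"] for row in employees_flat}
# Departments stores each location once, keyed by Department.
departments = {row["Department"]: row["DepartmentLocation"] for row in employees_flat}

def location_of(employee_id):
    """Follow EmployeeID -> Department -> DepartmentLocation through the two tables."""
    return departments[employees[employee_id]]

# Moving a department is now a single update instead of one per employee.
departments["Sales"] = "Floor 3"
```

After the single update, both Sales employees resolve to the new location, so the two rows can no longer disagree with each other.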
Identifying and eliminating partial and transitive dependencies are crucial in achieving higher levels of normalization
(such as 3NF or BCNF) to ensure data integrity, reduce redundancy, and avoid anomalies in a database.
1. Analyzing the data: The first step is to analyze the existing data in the database. This involves identifying the
various entities, attributes, and relationships between them.
2. Applying normalization rules: Next, the data is normalized by applying normalization rules, specifically the rules
outlined in normal forms, such as First Normal Form (1NF), Second Normal Form (2NF), and so on. Each
normalization form has specific criteria that need to be met.
3. Breaking down tables: In order to meet the criteria for normalization forms, it may be necessary to break down
existing tables into multiple tables, with each table focusing on a specific entity or relationship.
4. Resolving dependencies: During the normalization process, dependencies between attributes are identified and
resolved. This includes identifying and eliminating partial dependencies and transitive dependencies.
5. Documenting the results: Once the normalization process is complete, it is important to document the results.
This documentation includes the structure of the normalized tables, the relationships between them, and any
changes made to the original data model.
By undertaking normalization of business data and documenting the results, organizations can ensure that their
databases are efficiently structured, leading to improved data quality, easier data maintenance, and more effective
data operations.
Additionally, we would verify if the attributes in the normalized tables align with the attributes specified in the ER
diagram. We would ensure that no redundant data exists and that the data is properly organized according to
normalization rules.
By comparing the normalization results with the ER diagram, we can validate the accuracy and consistency of the
database design, ensuring that the normalized tables effectively capture the structure and relationships depicted in
the ER diagram.
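The comparison between the normalized tables and the ER diagram can be partly automated by comparing attribute sets. The Python sketch below assumes that both have been transcribed by hand into dictionaries; the dictionary contents and the function name compare are hypothetical, and one column is left out on purpose to show a difference.

```python
# Attributes per entity as drawn in the ER diagram (hypothetical transcription).
er_diagram = {
    "STUDENT": {"Student#", "StudentName", "DateOfBirth"},
    "COURSE": {"Course#", "CourseName", "DurationInDays"},
}
# Attributes per table after normalization (hypothetical, with one column missing).
normalized = {
    "STUDENT": {"Student#", "StudentName", "DateOfBirth"},
    "COURSE": {"Course#", "CourseName"},
}

def compare(er, tables):
    """Return, per entity, the attributes missing from or extra in the normalized tables."""
    report = {}
    for name in er.keys() | tables.keys():
        missing = er.get(name, set()) - tables.get(name, set())
        extra = tables.get(name, set()) - er.get(name, set())
        if missing or extra:
            report[name] = {"missing": missing, "extra": extra}
    return report

differences = compare(er_diagram, normalized)
```

An empty report means the attribute sets agree. It does not by itself confirm that the relationships between the tables match the diagram, which still has to be checked separately.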
2.5. Reconcile differences between data
Reconciling differences between data in normalization refers to the process of resolving conflicts or inconsistencies
that may arise during the normalization process. These conflicts can occur when attempting to organize and structure
data into normalized tables, especially when there are dependencies or relationships between attributes and entities.
Here are some key points to understand about reconciling differences between data in normalization:
1. Identify Inconsistencies: The first step is to identify any inconsistencies or conflicts within the data. This may
involve analyzing the relationships, dependencies, and functional dependencies between attributes and entities.
2. Analyze Dependencies: Evaluate the dependencies between attributes and entities to determine if they are
accurately represented. This includes identifying partial dependencies (where an attribute depends on only a part
of the primary key) or transitive dependencies (where an attribute depends on another non-key attribute).
3. Normalize Data: Apply normalization rules, such as First Normal Form (1NF), Second Normal Form (2NF), and
so on, to organize the data into well-structured tables. This may involve breaking down tables, creating separate
tables for related entities, and defining appropriate primary and foreign keys.
4. Resolve Conflicts: Address any conflicts or inconsistencies that arise during the normalization process. This may
involve making decisions on how to handle partial dependencies or transitive dependencies. One approach is to
split tables and create additional tables to ensure that data is properly organized and dependencies are accurately
represented.
5. Ensure Data Integrity: As you reconcile differences, it is crucial to maintain data integrity. This means that the
data in the normalized tables should accurately represent the relationships and dependencies between entities. It
also involves ensuring that there is no duplicate or redundant data.
6. Validate Results: Validate the results of the normalization process to ensure that the reconciled data aligns with
the intended structure and relationships. This can be done by comparing the normalized tables with the original
data, verifying that the relationships and dependencies are accurately represented.
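The validation step in point 6 can be sketched as a round trip: project the original rows into normalized tables, join them back through their keys, and compare the outcome with the original. The Python lines below are a minimal illustration that uses a few rows from the earlier student and course tables; the variable names are assumptions.

```python
# Original unnormalized rows: (Student#, Student Name, Course#, Course Name, Marks).
original = {
    (1001, "Ram", "M4", "Applied Mathematics", 89),
    (1002, "Shyam", "M4", "Applied Mathematics", 78),
    (1001, "Ram", "H6", "American History", 87),
}

# Normalized tables derived by projection.
student = {(s, name) for s, name, _, _, _ in original}
course = {(c, cname) for _, _, c, cname, _ in original}
result = {(s, c, marks) for s, _, c, _, marks in original}

# Rejoin through the keys and compare with the original rows.
student_name = dict(student)
course_name = dict(course)
rejoined = {(s, student_name[s], c, course_name[c], marks) for s, c, marks in result}

is_lossless = rejoined == original
# Each key appears once in its table if the dictionary has as many entries as the set.
no_redundancy = len(student) == len(student_name) and len(course) == len(course_name)
```

If is_lossless were False, the split would have lost or invented rows. If no_redundancy were False, the same key would appear with two different values, which points to an inconsistency that still has to be reconciled.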
Reconciling differences between data in normalization is an essential step in achieving a well-structured and efficient
database design. It helps eliminate redundancy, reduce anomalies, and improve data integrity.