Normalization Unit2

Normalization is a database organization process aimed at reducing redundancy and eliminating anomalies such as insertion, update, and deletion issues. It involves decomposing larger tables into smaller ones while ensuring logical data dependencies. The document outlines the types of normal forms (1NF, 2NF, 3NF, 4NF, 5NF) and Codd's twelve rules for relational databases.

Normalization

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize redundancy in a relation or set of
relations. It is also used to eliminate undesirable characteristics such as
insertion, update and deletion anomalies.
o Normalization divides a larger table into smaller tables and links them using
relationships.
o Normal forms are used to reduce redundancy in database tables.

Database normalization is a technique for organizing the data in a database.

Normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics such as insertion, update
and deletion anomalies. It is a multi-step process that puts data into tabular form
and removes duplicated data from the relation tables.
Normalization is used mainly for two purposes:

o Eliminating redundant (duplicated) data.

o Ensuring data dependencies make sense, i.e., data is logically stored.

Problems Without Normalization


If a table is not properly normalized and contains redundant data, it will not only
take up extra storage space but will also be difficult to handle and update
without risking data loss. Insertion, update and deletion anomalies are
very frequent if a database is not normalized. To understand these anomalies, let us
take the example of a Student table.

rollno   name   branch   hod     office_tel

401      Akon   CSE      Mr. X   53337
402      Bkon   CSE      Mr. X   53337
403      Ckon   CSE      Mr. X   53337
404      Dkon   CSE      Mr. X   53337

In the table above, we have data for 4 Computer Science students. As we can see, the
values of the fields branch, hod (Head of Department) and office_tel are repeated for
all students who are in the same branch of the college. This is data redundancy.
Insertion Anomaly
Suppose a new student is admitted but has not yet opted for a branch. The student's
data cannot be inserted unless we set the branch information to NULL.
Also, if we have to insert data for 100 students of the same branch, the branch
information will be repeated for all 100 students.
These scenarios are insertion anomalies.

Update Anomaly
What if Mr. X leaves the college, or is no longer the HOD of the computer science
department? In that case all the student records will have to be updated, and if we
miss any record by mistake, the data becomes inconsistent. This is an update
anomaly.

Deletion Anomaly
In our Student table, two different kinds of information are kept together: student
information and branch information. Hence, if student records are deleted at the end
of the academic year, we also lose the branch information. This is a deletion
anomaly.
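
The update and deletion anomalies described above can be reproduced with a minimal
sketch using Python's built-in sqlite3 module. The table mirrors the Student table
from this section; the replacement HOD "Mr. Y" and the partial update are
hypothetical and only there to trigger the anomaly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized Student table: branch details are repeated on every row.
conn.execute(
    "CREATE TABLE student ("
    "rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT, hod TEXT, office_tel TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (401, "Akon", "CSE", "Mr. X", "53337"),
        (402, "Bkon", "CSE", "Mr. X", "53337"),
        (403, "Ckon", "CSE", "Mr. X", "53337"),
        (404, "Dkon", "CSE", "Mr. X", "53337"),
    ],
)

# Update anomaly: the HOD changes ("Mr. Y" is hypothetical), but row 404 is missed.
conn.execute("UPDATE student SET hod = 'Mr. Y' WHERE rollno IN (401, 402, 403)")
hods = {row[0] for row in conn.execute("SELECT DISTINCT hod FROM student")}
print(hods)  # two different HODs for the same branch: inconsistent data

# Deletion anomaly: deleting all students also deletes all branch information.
conn.execute("DELETE FROM student")
remaining_branches = conn.execute(
    "SELECT COUNT(DISTINCT branch) FROM student"
).fetchone()[0]
print(remaining_branches)
```

Keeping branch data in its own table would reduce the update to a single row and
let the branch survive when student rows are deleted.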

Types of Normal Forms


The common normal forms are:

Normal Form   Description

1NF           A relation is in 1NF if every attribute contains only atomic values.

2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.

3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.

4NF           A relation is in 4NF if it is in Boyce-Codd normal form and has no
              multi-valued dependency.

5NF           A relation is in 5NF if it is in 4NF and cannot be decomposed into
              smaller tables any further without losing information when they are joined.

First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values. It must hold only a
single value.
o First normal form disallows multi-valued attributes, composite attributes, and
their combinations.
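
As a minimal sketch of the atomicity rule, the snippet below splits a multi-valued
phone column into one row per value. It uses the EMPLOYEE data from the example that
follows; the helper name to_1nf and the comma-separated storage of phone numbers are
assumptions made for illustration.

```python
# Rows before 1NF: EMP_PHONE may hold several values in a single cell.
employees = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam", "EMP_PHONE": "7390372389, 8589830302", "EMP_STATE": "Punjab"},
]


def to_1nf(rows, column, sep=","):
    """Return new rows in which `column` holds exactly one value per row."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            # Copy the row and overwrite the multi-valued cell with one value.
            flat.append({**row, column: value.strip()})
    return flat


employees_1nf = to_1nf(employees, "EMP_PHONE")
for row in employees_1nf:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```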

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute
EMP_PHONE.

EMPLOYEE table:

EMP_ID   EMP_NAME   EMP_PHONE                 EMP_STATE

14       John       7272826385, 9064738238    UP
20       Harry      8574783832                Bihar
12       Sam        7390372389, 8589830302    Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE

14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab

Second Normal Form (2NF)

o For a relation to be in 2NF, it must already be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent
on the primary key.

Example: Suppose a school stores data about its teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.

TEACHER table:

TEACHER_ID   SUBJECT     TEACHER_AGE

25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38

In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID alone,
which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why the
table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID   TEACHER_AGE

25           30
47           35
83           38

TEACHER_SUBJECT table:

TEACHER_ID   SUBJECT

25           Chemistry
25           Biology
47           English
83           Math
83           Computer
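
The decomposition above can be sketched as two projections of the TEACHER rows,
followed by a join on TEACHER_ID to confirm that no information was lost. This is an
illustrative sketch using plain Python tuples, not a prescribed procedure; the
variable names are assumptions.

```python
# TEACHER rows as (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Projection on (TEACHER_ID, TEACHER_AGE); the dict keeps one age per teacher.
teacher_detail = {tid: age for tid, _subject, age in teacher}

# Projection on (TEACHER_ID, SUBJECT); the whole pair acts as the key.
teacher_subject = [(tid, subject) for tid, subject, _age in teacher]

# Joining the two projections on TEACHER_ID rebuilds the original rows.
rejoined = [(tid, subject, teacher_detail[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(sorted(rejoined) == sorted(teacher))
```

Each teacher's age is now stored once, so changing it touches a single entry
instead of one row per subject.
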
Third Normal Form (3NF)
o A relation is in 3NF if it is in 2NF and contains no transitive dependency of a
non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then
it is in third normal form.

A relation is in third normal form if it satisfies at least one of the following
conditions for every non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
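
The two conditions can be checked mechanically once the functional dependencies are
written down. The sketch below computes attribute closures to test the super key
condition; the function names are hypothetical, and the dependencies listed for the
EMPLOYEE_DETAIL example that follows are assumptions read off that example.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_3nf(all_attrs, prime_attrs, fds):
    """Return the dependencies X -> Y that satisfy neither 3NF condition."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, nothing to check
        is_super_key = closure(lhs, fds) == all_attrs
        # Only attributes of Y that are not already in X need to be prime.
        rhs_is_prime = (rhs - lhs) <= prime_attrs
        if not is_super_key and not rhs_is_prime:
            bad.append((lhs, rhs))
    return bad


attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
prime = {"EMP_ID"}  # attributes of the only candidate key
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),      # assumed: the ID determines name and ZIP
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),   # assumed: the ZIP determines state and city
]

violations = violates_3nf(attrs, prime, fds)
print(violations)
```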

Example:

EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY

222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID), which violates the rule of third normal form.

That is why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_ZIP

222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:

EMP_ZIP   EMP_STATE   EMP_CITY

201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
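
A minimal sketch of the two 3NF tables above, using Python's built-in sqlite3
module, shows that a join on EMP_ZIP recovers the original EMPLOYEE_DETAIL rows. The
lowercase table and column names, and the choice to store ZIP codes as TEXT so that
leading zeros are preserved, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_zip (
        emp_zip   TEXT PRIMARY KEY,
        emp_state TEXT,
        emp_city  TEXT
    );
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        emp_zip  TEXT REFERENCES employee_zip(emp_zip)
    );
    """
)
conn.executemany(
    "INSERT INTO employee_zip VALUES (?, ?, ?)",
    [
        ("201010", "UP", "Noida"),
        ("02228", "US", "Boston"),
        ("60007", "US", "Chicago"),
        ("06389", "UK", "Norwich"),
        ("462007", "MP", "Bhopal"),
    ],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (222, "Harry", "201010"),
        (333, "Stephan", "02228"),
        (444, "Lan", "60007"),
        (555, "Katharine", "06389"),
        (666, "John", "462007"),
    ],
)

# Joining on the ZIP code rebuilds the rows of the original EMPLOYEE_DETAIL table.
rows = conn.execute(
    """
    SELECT e.emp_id, e.emp_name, e.emp_zip, z.emp_state, z.emp_city
    FROM employee AS e
    JOIN employee_zip AS z ON z.emp_zip = e.emp_zip
    ORDER BY e.emp_id
    """
).fetchall()
for row in rows:
    print(row)
```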

Codd's Rules

Dr. Edgar F. Codd, after extensive research on the relational model of database
systems, proposed twelve rules which, according to him, a database
must obey in order to be regarded as a true relational database.
These rules apply to any database system that manages stored data
using only its relational capabilities. This requirement is the foundation rule
(often called Rule 0), which acts as a base for all the other rules.
Rule 1: Information Rule
The data stored in a database, whether it is user data or metadata, must be a value
of some table cell. Everything in a database must be stored in a table format.

Rule 2: Guaranteed Access Rule


Every single data element (value) is guaranteed to be accessible logically with a
combination of table-name, primary-key (row value), and attribute-name (column
value). No other means, such as pointers, can be used to access data.
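
As an illustrative sketch of guaranteed access, the query below reaches one value
using only a table name, a primary-key value and a column name. It uses Python's
built-in sqlite3 module; the employee table and its lowercase column names are
assumptions based on the earlier EMPLOYEE example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_state TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(14, "John", "UP"), (20, "Harry", "Bihar"), (12, "Sam", "Punjab")],
)

# Table name = employee, primary-key value = 20, column name = emp_state.
value = conn.execute(
    "SELECT emp_state FROM employee WHERE emp_id = ?", (20,)
).fetchone()[0]
print(value)
```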

Rule 3: Systematic Treatment of NULL Values


The NULL values in a database must be given a systematic and uniform treatment.
This is a very important rule because a NULL can be interpreted as one of the
following: data is missing, data is not known, or data is not applicable.
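
One common form of systematic NULL treatment is SQL's rule that comparing anything
with NULL yields NULL rather than true or false. The sketch below shows this
behaviour in SQLite through Python's built-in sqlite3 module; the student "Ekon"
with no branch is a hypothetical row added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(401, "Akon", "CSE"), (405, "Ekon", None)],  # None is stored as NULL
)

# "= NULL" never evaluates to true, so it matches no rows at all.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM student WHERE branch = NULL"
).fetchone()[0]

# "IS NULL" is the uniform way to test for a missing value.
is_null = conn.execute(
    "SELECT COUNT(*) FROM student WHERE branch IS NULL"
).fetchone()[0]

# Even NULL compared with NULL is unknown, which Python receives as None.
null_comparison = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(equals_null, is_null, null_comparison)
```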

Rule 4: Active Online Catalog


The structure description of the entire database must be stored in an online catalog,
known as the data dictionary, which can be accessed by authorized users. Users can
access the catalog with the same query language that they use to access the
database itself.
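
SQLite offers a simple illustration of this idea: its schema is held in a table
named sqlite_master that can be read with an ordinary SELECT. The sketch below uses
Python's built-in sqlite3 module; the two tables created here are assumptions taken
from the earlier 3NF example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
conn.execute("CREATE TABLE employee_zip (emp_zip TEXT PRIMARY KEY, emp_city TEXT)")

# The catalog is queried with the same language used for ordinary data.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```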

Rule 5: Comprehensive Data Sub-Language Rule


A database can only be accessed using a language having linear syntax that
supports data definition, data manipulation, and transaction management
operations. This language can be used directly or by means of some application. If
the database allows access to data without any help of this language, then it is
considered as a violation.

Rule 6: View Updating Rule


All the views of a database, which can theoretically be updated, must also be
updatable by the system.

Rule 7: High-Level Insert, Update, and Delete Rule


A database must support high-level insert, update, and delete operations. These must
not be limited to a single row; that is, the database must also support union,
intersection and minus operations to yield sets of data records.
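
A minimal sketch of set-level operations, using Python's built-in sqlite3 module:
one UPDATE statement changes every matching row, and UNION and EXCEPT (SQLite's
spelling of minus) combine result sets. The "ECE" branch and the renamed branch
"CS" are hypothetical values added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(401, "Akon", "CSE"), (402, "Bkon", "CSE"), (403, "Ckon", "ECE")],
)

# A single statement updates the whole set of matching rows.
updated = conn.execute(
    "UPDATE student SET branch = 'CS' WHERE branch = 'CSE'"
).rowcount

# UNION merges two result sets into one.
union_names = conn.execute(
    "SELECT name FROM student WHERE branch = 'CS' "
    "UNION SELECT name FROM student WHERE rollno = 403 ORDER BY name"
).fetchall()

# EXCEPT removes the rows of the second result set from the first.
except_names = conn.execute(
    "SELECT name FROM student EXCEPT SELECT name FROM student WHERE branch = 'CS'"
).fetchall()

print(updated, union_names, except_names)
```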

Rule 8: Physical Data Independence


The data stored in a database must be independent of the applications that access
the database. Any change in the physical structure of a database must not have any
impact on how the data is being accessed by external applications.

Rule 9: Logical Data Independence


The logical data in a database must be independent of its user's view (application).
Any change in logical data must not affect the applications using it. For example, if
two tables are merged or one is split into two different tables, there should be no
impact or change on the user application. This is one of the most difficult rules to
apply.

Rule 10: Integrity Independence


A database must be independent of the application that uses it. All its integrity
constraints can be independently modified without the need of any change in the
application. This rule makes a database independent of the front-end application
and its interface.
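
As a hedged sketch of this idea, the constraint below is declared in the database
schema itself, so invalid data is rejected without any validation code in the
application. It uses Python's built-in sqlite3 module; the teacher table and the
positive-age rule are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity rule lives in the schema, not in the application code.
conn.execute(
    """
    CREATE TABLE teacher (
        teacher_id  INTEGER PRIMARY KEY,
        teacher_age INTEGER NOT NULL CHECK (teacher_age > 0)
    )
    """
)
conn.execute("INSERT INTO teacher VALUES (25, 30)")

try:
    conn.execute("INSERT INTO teacher VALUES (47, -5)")  # violates the CHECK
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]
print(rejected, row_count)
```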

Rule 11: Distribution Independence


The end-user must not be able to see that the data is distributed over various
locations. Users should always get the impression that the data is located at one
site only. This rule has been regarded as the foundation of distributed database
systems.

Rule 12: Non-Subversion Rule


If a system has an interface that provides access to low-level records, then the
interface must not be able to subvert the system and bypass security and integrity
constraints.
