DBMS Unit-III (1)
A large database defined as a single relation may result in data duplication. This repetition of data
may result in:
So, to handle these problems, we should analyze and decompose relations with redundant data
into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides the larger table into smaller tables and links them using relationships.
The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the database
grows. Normalization consists of a series of guidelines that help you create a good
database structure.
Insertion Anomaly: The insertion anomaly refers to the situation where one cannot insert a new tuple into a
relation due to lack of data.
Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results
in the unintended loss of some other important data.
Update Anomaly: The update anomaly is when an update of a single data value requires multiple rows
of data to be updated.
Normal Form Description
1NF A relation is in 1NF if every attribute contains only atomic values.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on
the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
BCNF A stronger definition of 3NF is known as Boyce-Codd normal form.
4NF A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
Advantages of Normalization
Disadvantages of Normalization
You cannot start building the database before knowing what the user needs.
The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
It is very time-consuming and difficult to normalize relations of a higher degree.
Careless decomposition may lead to a bad database design, leading to serious problems.
First Normal Form (1NF)
A relation is in 1NF if every attribute contains only atomic values.
It states that an attribute of a table cannot hold multiple values. It must hold only a single
value.
First normal form disallows multi-valued attributes, composite attributes, and their combinations.
EMPLOYEE table:
If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on the candidate key. That is, if X is a candidate key and X → A holds, then there should not be any proper
subset Y of X for which Y → A also holds.
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Third Normal Form (3NF)
A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
If there is no transitive dependency for non-prime attributes, then the relation must be in third
normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial
functional dependency X → Y:
X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The
non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This
violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new <EMPLOYEE_ZIP> table,
with EMP_ZIP as the primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
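The decomposition above can be checked with a short sketch. It assumes the rows are held as plain Python tuples (the variable names are illustrative, not part of the notes) and verifies that joining EMPLOYEE and EMPLOYEE_ZIP on EMP_ZIP gives back exactly the original EMPLOYEE_DETAIL rows, i.e. that the decomposition is lossless for this data.

```python
# Rows of EMPLOYEE_DETAIL, copied from the example table:
# (EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, EMP_CITY)
employee_detail = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]

# Project onto the two smaller relations.
employee = {(eid, name, zip_) for eid, name, zip_, _, _ in employee_detail}
employee_zip = {(zip_, state, city) for _, _, zip_, state, city in employee_detail}

# Natural join on EMP_ZIP: EMP_ZIP is the key of EMPLOYEE_ZIP,
# so each employee row matches exactly one zip row.
zip_lookup = {zip_: (state, city) for zip_, state, city in employee_zip}
rejoined = {(eid, name, zip_) + zip_lookup[zip_] for eid, name, zip_ in employee}

# The join is lossless if it reproduces the original relation exactly.
lossless = rejoined == set(employee_detail)
print(lossless)
```

Because state and city are now stored once per zip code, a change to a zip code's city touches one row of EMPLOYEE_ZIP instead of every employee living there.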
Boyce Codd normal form (BCNF)
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table, the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
The decomposed tables are in BCNF because the left side of both functional dependencies is a
key.
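The BCNF test for this example can be sketched in code. The helper names `closure` and `bcnf_violations` are hypothetical; the sketch uses the standard attribute-closure technique and treats a left-hand side as a super key when its closure contains every attribute of the table.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Attributes and functional dependencies of the EMPLOYEE table above.
ALL_ATTRS = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
FDS = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]


def bcnf_violations(all_attrs, fds):
    """List the FDs whose left side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != all_attrs]


violations = bcnf_violations(ALL_ATTRS, FDS)
# Together, EMP_ID and EMP_DEPT do determine every attribute.
candidate_key_ok = closure({"EMP_ID", "EMP_DEPT"}, FDS) == ALL_ATTRS
print(len(violations), candidate_key_ok)
```

Both dependencies are reported as violations, which matches the statement that neither EMP_ID nor EMP_DEPT alone is a key.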
Functional Dependency
A functional dependency is a relationship that exists between two attributes. It typically exists between the primary key and
a non-key attribute within a table. As the name suggests, a functional dependency in a DBMS is a relationship
between attributes of a table that depend on each other. Introduced by E. F. Codd, it helps in
preventing data redundancy and in identifying bad designs.
To understand the concept thoroughly, let us consider P is a relation with attributes X and Y.
Functional dependency is represented by → (arrow sign).
Then the following represents the functional dependency between the attributes −
X → Y
The left side of an FD is known as the determinant, and the right side is known as the
dependent.
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if we
know the Emp_Id, we can tell the employee name associated with it.
Emp_Id → Emp_Name
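As a small illustration, the sketch below checks whether a functional dependency is satisfied by a given set of rows. The function `fd_holds` and the sample employee rows are hypothetical and only serve the example. Note that data can refute a dependency but cannot prove it: a dependency is a rule about all valid states of the table, not just the rows currently stored.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for the employee table.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Mumbai"},
]

id_determines_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_id = fd_holds(employees, ["Emp_Name"], ["Emp_Id"])
print(id_determines_name, name_determines_id)
```

Here Emp_Id → Emp_Name is satisfied, while Emp_Name → Emp_Id is not, because two different employees share the same name.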
Example
We are considering the same <Department> table with two attributes to understand the
concept of trivial dependency.
The following is a trivial functional dependency since DeptId is a subset
of DeptId and DeptName
{DeptId, DeptName} → DeptId
Example
DeptId → DeptName
The above is a non-trivial functional dependency since DeptName is not a subset of DeptId.
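In the reflexive rule, if Y is a subset of X, then X determines Y.
X = {a, b, c, d, e}
Y = {a, b, c}
Here Y is a subset of X, so X → Y holds.
In the augmentation rule, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ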
Example
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
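The inference rules above can be checked mechanically. The sketch below is one possible way to do it, assuming single-letter attribute names so that a string such as "YZ" stands for the attribute set {Y, Z}. The helpers `closure` and `implies` are hypothetical names: a dependency lhs → rhs follows from a set of dependencies exactly when rhs is contained in the closure of lhs.

```python
def closure(attrs, fds):
    """Attributes determined by attrs; each FD is a (lhs, rhs) pair of strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


# Transitive rule: X -> Y and Y -> Z give X -> Z.
transitive = implies([("X", "Y"), ("Y", "Z")], "X", "Z")
# Union rule: X -> Y and X -> Z give X -> YZ.
union = implies([("X", "Y"), ("X", "Z")], "X", "YZ")
# Decomposition rule: X -> YZ gives X -> Y and X -> Z.
decomposition = implies([("X", "YZ")], "X", "Y") and implies([("X", "YZ")], "X", "Z")
# Pseudo transitive rule: X -> Y and YZ -> W give XZ -> W.
pseudo_transitive = implies([("X", "Y"), ("YZ", "W")], "XZ", "W")
# A dependency does not hold in reverse: X -> Y does not give Y -> X.
reverse = implies([("X", "Y")], "Y", "X")

print(transitive, union, decomposition, pseudo_transitive, reverse)
```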
Relational Data Model
Relational data model is the primary data model, which is used widely around the world for data
storage and processing. This model is simple and it has all the properties and capabilities required to
process data with storage efficiency.
Concepts
Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A
table has rows and columns, where rows represent records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation instance. Relation instances do
not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify the row in the relation (table)
uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity
Constraints. There are three main integrity constraints −
•Key constraints
•Domain constraints
•Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation, which can identify a tuple uniquely. This minimal subset
of attributes is called the key for that relation. If there is more than one such minimal subset, these are called candidate keys.
Key constraints require that −
•in a relation with a key attribute, no two tuples can have identical values for key attributes.
•a key attribute cannot have NULL values.
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same constraints
are applied to the attributes of a relation. Every attribute is bound to have a specific range of values. For
example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
Referential Integrity Constraints
Referential integrity constraints work on the concept of foreign keys. A foreign key is an attribute of a relation that
refers to a key attribute of another (or the same) relation.
A referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, then that key
element must exist.
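The three kinds of constraints can be seen in action with a minimal sketch using Python's built-in sqlite3 module. The department and employee tables, their columns, and the helper `rejected` are assumptions made for this illustration only. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is why that statement appears first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                             -- key constraint
        age     INTEGER CHECK (age >= 0),                        -- domain constraint
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- referential integrity
    )""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 30, 1)")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_key = rejected("INSERT INTO employee VALUES (10, 41, 1)")   # emp_id 10 already exists
negative_age = rejected("INSERT INTO employee VALUES (11, -5, 1)")    # age outside its domain
missing_parent = rejected("INSERT INTO employee VALUES (12, 25, 99)") # department 99 does not exist
print(duplicate_key, negative_age, missing_parent)
```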
Features of a relational database
Relational databases need ACID characteristics.
ACID refers to four essential properties: Atomicity, Consistency, Isolation, and Durability.
These features are the key difference between a relational database and a non-relational
database.
Atomicity
Atomicity keeps data accurate. It makes sure all data is compliant with the rules, regulations, and
policies of the business.
It also requires all tasks to succeed, or the transaction will roll back.
Atomicity defines all the elements in a complete database transaction.
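The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The account table, the balances, and the `transfer` function are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(amount):
    """Move amount from account 1 to account 2 as a single transaction."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 2", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]


failed = transfer(500)      # the debit would make the balance negative
after_failure = balances()  # the earlier credit was rolled back too
succeeded = transfer(30)
after_success = balances()
print(failed, after_failure, succeeded, after_success)
```

In the failed transfer the credit to account 2 is executed first and would succeed on its own, but it is undone because the debit in the same transaction fails.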
Consistency
The state of the database must remain consistent throughout the transaction.
Consistency defines the rules for maintaining data points. This ensures they remain in a correct
state after a transaction.
Relational databases have data consistency because the information is updated across
applications and database copies (also known as ‘instances’). This means multiple instances
always have the same data.
Isolation
With a relational database, each transaction is separate and not dependent on others. This is
made possible by isolation.
Isolation keeps the effect of a transaction invisible until it is committed. This reduces the risk of
Uses and benefits of a relational database
Relational databases are often the backbone of a customer relationship management (CRM) system —
such as Salesforce.
But tracking customer transactions is just one use case for a relational database. There are many
others. We even use some in everyday life. For example, when you withdraw money from an ATM, your
bank balance may instantly update on your mobile app if it’s using a relational database. This is
because this scenario’s data point (“Account Balance”) is consistently updated across all platforms.
There are multiple benefits of using a relational database over a non-relational database. And many
of these affect other systems, including Salesforce.
Data consistency
As mentioned when we outlined ACID, a core part of a relational database is consistency.
A relational database model ensures that all users always see the same data.
This improves understanding across a business because everyone sees the same information. This
ensures that nobody makes business decisions based on out-of-date information.
Data working together
All the data in a relational database has a ‘relationship’ with other data. Columns are built in a way that
makes it easy to establish relationships among data points.
Data working together gives a more holistic view of all your data — including your customers.
Data flexibility
Relational databases allow for flexibility. Users can change what they see. And it’s easy to add
additional data at a later time.
A relational database also allows for a subset of data to be viewed. This means you can hide certain
data if some users only need access to a specific set of columns or rows.
Codd's Rules
Not every database that has tables and constraints can be referred to as a relational database system. And
a database that merely uses the relational data model is not necessarily a Relational Database Management System (RDBMS).
So, some rules define what makes a database a true RDBMS. These rules were developed by Dr.
Edgar F. Codd (E.F. Codd) in 1985, who has vast research knowledge on the Relational Model of
database systems. Codd presented 13 rules (numbered 0 to 12) to test a DBMS against his
relational model; if a database follows the rules, it is called a true relational database
(RDBMS). These 13 rules are popularly known as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form. So that the system can handle the database through its relational
capabilities.
Rule 1: Information Rule
A database contains various information, and this information must be stored in the cells of tables in the form of
rows and columns.
Rule 2: Guaranteed Access Rule
Every single data item (atomic value) must be logically accessible from a relational database using the
combination of primary key value, table name, and column name.
Rule 3: Systematic Treatment of Null Values
This rule defines the systematic treatment of null values in database records. A null value can have various meanings
in the database, such as missing data, no value in a cell, inapplicable information, or unknown data. The primary
key should not be null.
Rule 4: Active/Dynamic Online Catalog Rule
The description of the entire logical structure of the database must be stored online in a catalog known as the
database dictionary. Authorized users can access it using the same query language they use to access
the database.
Rule 6: View Updating Rule
All views that are theoretically updatable must also be updatable in practice by the database system.
Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system should support high-level relational operations such as insert, update, and delete on sets of rows, not only on
a single row. It should also support union, intersection, and minus operations.
Rule 8: Physical Data Independence Rule
Applications that access the database must be independent of how the data is physically stored.
If data is updated or the physical structure of the database is
changed, it should have no effect on external applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means that if any changes occur at the logical level (table structures),
they should not affect the user's view (application). For example, if a table is split into two tables, or two
tables are joined to create a single table, these changes should not impact the user's view of the application.
Rule 10: Integrity Independence Rule
A database must maintain integrity independence when inserting data into a table's cells using the SQL query
language. Integrity constraints should be defined in the database itself and should not rely on any external factor or application.
This also helps make the database independent of each front-end application.
Rule 11: Distribution Independence Rule
The distribution independence rule states that a database must work properly even if it is stored
in different locations and used by different end-users. When a user accesses the database through
an application, they should not be aware of how the data is distributed; the data
should always appear to be located at one site. End users should be able to run the same SQL queries
regardless of where the data is stored.
Rule 12: Non-Subversion Rule
The non-subversion rule states that an RDBMS uses a relational language such as SQL to store and manipulate the data in the
database. If a system has a low-level or separate language other than SQL to access the database
system, that language should not be able to subvert or bypass the integrity rules to transform data.
Database Schema
A database schema is a structure that represents the logical storage of the data in a
database. It represents the organization of data and provides information about the relationships
between the tables in a given database. In this topic, we will understand more about database schema
and its types. Before understanding database schema, let's first understand what a database is.
What is Database?
A database is a place to store information. It can store the simplest data, such as a list of people as
well as the most complex data. The database stores the information in a well-structured format.
A database schema is the logical representation of a database, which shows how the data is stored
logically in the entire database. It contains a list of attributes and instructions that inform the
database engine how the data is organized and how the elements are related to each other.
A database schema contains schema objects that may include tables, fields, packages, views,
relationships, primary keys, and foreign keys.
In practice, the data is physically stored in files that may be in unstructured form, but to retrieve it
and use it, we need to put it in a structured form. To do this, a database schema is used. It provides
knowledge about how the data is organized in a database and how it is associated with other data.
The schema does not physically contain the data itself; instead, it gives information
about the shape of data and how it can be related to other tables or models.
A database schema object includes the following:
Consistent formatting for all data entries.
Database objects and unique keys for all data entries.
Tables with multiple columns, and each column contains its name and datatype.
The complexity & the size of the schema vary as per the size of the project. It helps developers to
easily manage and structure the database before coding it.
The given diagram is an example of a database schema. It contains three tables and their data types.
This also represents the relationships between the tables and primary keys as well as foreign keys.
1.Physical Schema
2.View Schema
3.Logical Schema
1. Physical Database Schema
A physical database schema specifies how the data is stored physically on a storage system or disk
storage in the form of Files and Indices. Designing a database at the physical level is called
a physical schema.
2. View Schema
The view level design of a database is known as view schema. This schema generally describes the
end-user interaction with the database systems.
3. Logical Database Schema
The Logical database schema specifies all the logical constraints that need to be applied to the stored
data. It defines the views, integrity constraints, and tables. Here the term integrity
constraints refers to the set of rules that are used by the DBMS (Database Management System) to
maintain data quality when inserting and updating data. The logical schema represents how the data is
stored in the form of tables and how the attributes of a table are linked together.
Programmers and administrators work at this level, where the implementation of the data structure is
hidden.
Various tools are used to create a logical database schema, and these tools demonstrate the
relationships between the components of your data; this process is called ER modelling.
The ER modelling stands for entity-relationship modelling, which specifies the relationships between
In the given example, the Ids are given in each circle, and these Ids are primary keys and foreign keys.
The primary key is used to uniquely identify the entry in a document or record. The Ids of the upper
three circles are the primary keys.
A foreign key refers to the primary key of another table. FK represents a foreign key in the
diagram. It relates one table to another table.
Relational Algebra
Relational database systems are expected to be equipped with a query language that can assist its
users to query the database instances. There are two kinds of query languages − relational algebra
and relational calculus.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and
yields instances of relations as output. It uses operators to perform queries. An operator can be
either unary or binary. They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate results are also considered
relations.
Select
Project
Union
Set difference
Cartesian product
Rename
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for the selection operator, p for the selection predicate, and r for the relation. p is a propositional logic formula which
may use connectors like and, or, and not. These terms may use relational operators like −
=, ≠, ≥, < , >, ≤.
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010" (Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450, or those books
published after 2010.
Project Operation (∏)
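It projects the specified column(s) of a relation and discards the others. Duplicate rows are automatically eliminated, as a relation is a set.
For example, ∏subject, author (Books) selects and projects the columns named subject and author from the relation Books.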
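A rough sketch of the two operations in Python follows. The Books rows and the helper names `select` and `project` are hypothetical. The predicate mirrors the selection example above: "and" binds tighter than "or", so it reads as (subject is database and price is 450) or (year is after 2010).

```python
# Hypothetical contents of the Books relation.
books = [
    {"title": "DB Basics", "subject": "database", "author": "Rao", "price": 450, "year": 2008},
    {"title": "Networks", "subject": "networking", "author": "Lee", "price": 300, "year": 2015},
    {"title": "Old Algebra", "subject": "math", "author": "Kim", "price": 200, "year": 1999},
    {"title": "DB Advanced", "subject": "database", "author": "Rao", "price": 600, "year": 2005},
]


def select(relation, predicate):
    """Select operation: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """Project operation: keep the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


selected = select(
    books,
    lambda b: (b["subject"] == "database" and b["price"] == 450) or b["year"] > 2010,
)
selected_titles = [b["title"] for b in selected]

projected = project(books, ["subject", "author"])
print(selected_titles, projected)
```

The projection returns three rows rather than four because the two database books by the same author collapse into a single (subject, author) pair.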
Relational Calculus
There is an alternate way of formulating queries known as Relational Calculus. Relational calculus is a
non-procedural query language. In a non-procedural query language, the user is not concerned with the
details of how to obtain the end results. Relational calculus tells what to do but never explains
how to do it. Most commercial relational languages are based on aspects of relational calculus, including
SQL, QBE, and QUEL.
Many of the calculus expressions involve the use of quantifiers. There are two types of
quantifiers:
•Universal Quantifiers: The universal quantifier, denoted by ∀, is read as "for all", which means that in
a given set of tuples all tuples satisfy a given condition.
•Existential Quantifiers: The existential quantifier, denoted by ∃, is read as "there exists", which means that
in a given set of tuples there is at least one occurrence whose value satisfies a given condition.
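The two quantifiers correspond closely to Python's built-in `all` and `any` functions. The sketch below uses a small hypothetical set of tuples to show the difference.

```python
# Hypothetical tuples: (title, price)
books = [("DB Basics", 450), ("Networks", 300), ("DB Advanced", 600)]

# Universal quantifier: for all tuples t, price(t) > 0.
all_positive = all(price > 0 for _, price in books)

# Existential quantifier: there exists a tuple t with price(t) > 500.
some_above_500 = any(price > 500 for _, price in books)

# The same condition under the universal quantifier fails,
# because not every tuple has a price above 500.
all_above_500 = all(price > 500 for _, price in books)

print(all_positive, some_above_500, all_above_500)
```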
Before using the concept of quantifiers in formulas, we need to know the concept of Free and Bound
Variables.
A tuple variable t is bound if it is quantified which means that if it appears in any occurrences a
Types of Relational calculus:
Where
T is the resulting tuples
Example
Output: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from
Author who has written an article on 'database'.
TRC (tuple relational calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
Example
Output: This query will yield the same result as the previous one.
2. Domain Relational Calculus (DRC)
The second form of relational calculus is known as domain relational calculus. In domain relational calculus,
the filtering variable uses the domain of attributes. Domain relational calculus uses the same operators as
tuple calculus. It uses logical connectives ∧ (and), ∨ (or) and ¬ (not). It uses Existential (∃) and
Universal Quantifiers (∀) to bind the variable. QBE, or Query by Example, is a query language
related to domain relational calculus.
Notation: { a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
Example
Output: This query will yield the article, page, and subject from the relation javatpoint, where the
subject is a database.
A Well-Formed Formula (WFF) is an expression consisting of variables (capital letters), parentheses, and connective
symbols. An expression is basically a combination of operands and operators; here the operands are the variables and the operators are
the connective symbols.
1.¬ (Negation)
2.∧ (Conjunction)
3.∨ (Disjunction)
4.⇒ (Implication)
5.⇔ (Biconditional)
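Rules for a WFF:
1. A statement variable standing alone is a WFF. For example, statements like P, ¬P, Q, ¬Q are themselves well-formed formulas.
2. If P is a WFF, then ¬P is also a WFF.
3. If P and Q are WFFs, then (P∨Q), (P∧Q), (P⇒Q), (P⇔Q), etc. are also WFFs.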
Example Of Well Formed Formulas:
WFF Explanation
¬¬P By Rule 1 each Statement by itself is a WFF, ¬P is a WFF, and let ¬P = Q. So ¬Q will also be a WFF.
((P⇒Q)⇒Q) By Rule 3 joining ‘(P⇒Q)’ and ‘Q’ with connective symbol ‘⇒’.
(¬Q ∧ P) By Rule 3 joining ‘¬Q’ and ‘P’ with connective symbol ‘∧’.
((¬P∨Q) ∧ ¬¬Q) By Rule 3 joining ‘(¬P∨Q)’ and ‘¬¬Q’ with connective symbol ‘∧’.
¬((¬P∨Q) ∧ ¬¬Q) By Rule 3 joining ‘(¬P∨Q)’ and ‘¬¬Q’ with connective symbol ‘∧’ and then using Rule 2.
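The examples in the table can be checked with a small recursive recognizer. This is a sketch under a strict reading of the rules: variables are single capital letters, negation needs no parentheses, and every binary connective must be wrapped in its own pair of parentheses. The function names `parse` and `is_wff` are hypothetical.

```python
BINARY = set("∧∨⇒⇔")


def parse(s, i):
    """Return the index just past one WFF starting at s[i], or -1 if there is none."""
    if i >= len(s):
        return -1
    ch = s[i]
    if "A" <= ch <= "Z":          # a statement variable by itself
        return i + 1
    if ch == "¬":                 # negation of a WFF
        return parse(s, i + 1)
    if ch == "(":                 # ( WFF connective WFF )
        j = parse(s, i + 1)
        if j == -1 or j >= len(s) or s[j] not in BINARY:
            return -1
        k = parse(s, j + 1)
        if k == -1 or k >= len(s) or s[k] != ")":
            return -1
        return k + 1
    return -1


def is_wff(text):
    """Return True if the whole text (ignoring spaces) is one WFF."""
    s = text.replace(" ", "")
    return parse(s, 0) == len(s)


examples = ["¬¬P", "((P⇒Q)⇒Q)", "(¬Q ∧ P)", "((¬P∨Q) ∧ ¬¬Q)", "¬((¬P∨Q) ∧ ¬¬Q)"]
results = [is_wff(e) for e in examples]
print(results)
```

Under this strict reading, strings such as `P∧Q` (missing the outer parentheses) or `(P∧)` (missing an operand) are rejected.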
ER Model to Relational Model Mapping
ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship, which is
easier to understand. ER diagrams can be mapped to relational schema, that is, it is possible to create
relational schema using ER diagram. We cannot import all the ER constraints into relational model, but
an approximate schema can be generated.
There are several processes and algorithms available to convert ER Diagrams into Relational Schema.
Some of them are automated and some of them are manual. Here we focus on mapping
diagram contents to relational basics.
Mapping Entity
Mapping Relationship
A weak entity set is one which does not have any primary key associated with it.
Mapping Process
Mapping Process