Unit No.2 Data Modelling & Relational Database Design
2.1 Data Modelling using ER Diagram: Representation of Entities, Attributes, Relationships and their Type, Cardinality,
Generalization, Specialization, Aggregation.
2.2 Relational data model: Structure of Relational Database Model, Types of keys, Referential Integrity Constraints
2.5 Normalization – Normal forms based on primary keys (1NF, 2NF, 3NF, BCNF)
Note: Case studies based on E-R diagram & Normalization
2.1 Data Modelling using ER Diagram: Representation of Entities, Attributes, Relationships and their Type, Cardinality,
Generalization, Specialization, Aggregation.
ER Diagram:
• The Entity-Relationship (ER) model identifies the entities to be represented in the database and describes how
those entities are related.
• The ER data model specifies an enterprise schema that graphically represents the overall logical structure of a database.
• E-R diagrams are used to model real-world objects such as a person, a car, or a company, and the relationships between
these objects.
Entity, Entity Type, Entity Set –
An Entity may be an object with a physical existence – a particular person, car, house, or employee – or it may be an object
with a conceptual existence – a company, a job, or a university course.
An Entity is an instance of an Entity Type, and the set of all entities of that type is called an Entity Set. For example, E1 is
an entity of Entity Type Student, and the set of all students is the Entity Set. In an ER diagram, an Entity Type is
represented by a rectangle.
Attribute(s):
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB, Age, Address, and Mobile_No
are the attributes that define the entity type Student. In an ER diagram, an attribute is represented by an oval.
1. Key Attribute –
The attribute that uniquely identifies each entity in the entity set is called the key attribute. For example, Roll_No is
unique for each student. In an ER diagram, a key attribute is represented by an oval with the attribute name underlined.
2. Composite Attribute –
An attribute composed of several other attributes is called a composite attribute. For example, the Address attribute of the
Student entity type consists of Street, City, State, and Country. In an ER diagram, a composite attribute is represented by
an oval connected to the ovals of its component attributes.
3. Multivalued Attribute –
An attribute that can hold more than one value for a given entity. For example, Phone_No (a student can have more than
one). In an ER diagram, a multivalued attribute is represented by a double oval.
4. Derived Attribute –
An attribute that can be derived from other attributes of the entity type is known as a derived attribute. For example, Age
can be derived from DOB. In an ER diagram, a derived attribute is represented by a dashed oval.
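The four attribute kinds above can be sketched in code. This is only an illustration, not part of the ER notation: the class names, field names, and sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    # Composite attribute: built from simpler component attributes.
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int          # key attribute: unique for each student
    name: str
    dob: date
    address: Address      # composite attribute
    phone_nos: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        # Derived attribute: computed from DOB instead of being stored.
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


# Hypothetical sample entity.
s = Student(
    roll_no=1,
    name="Asha",
    dob=date(2004, 5, 20),
    address=Address("MG Road", "Pune", "Maharashtra", "India"),
    phone_nos=["9000000001", "9000000002"],
)
print(s.age(date(2024, 5, 19)))  # prints 19 (one day before the 20th birthday)
```

Storing DOB and computing Age on demand avoids the stored value going stale, which is why Age is modelled as derived.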
The complete entity type Student with its attributes can be represented as:
Relationships and their Type :-
A relationship type represents the association between entity types. For example, ‘Enrolled in’ is a relationship
type that exists between the entity types Student and Course. In an ER diagram, a relationship type is represented by
a diamond connected to the participating entity types by lines.
A set of relationships of the same type is known as a relationship set. For example, a relationship set may record that S1 is
enrolled in C2, S2 is enrolled in C1, and S3 is enrolled in C3.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of the relationship set.
1. Unary Relationship –
When only ONE entity set participates in a relationship, the relationship is called a unary relationship. For example,
a person is married to another person: both participants come from the same entity set.
2. Binary Relationship –
When TWO entity sets participate in a relationship, the relationship is called a binary relationship. For
example, a Student is enrolled in a Course.
3. n-ary Relationship –
When n entity sets participate in a relationship, the relationship is called an n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality. Cardinality can be
of different types:
1. One-to-one – When each entity in each entity set can take part only once in the relationship, the cardinality is one-to-
one. Let us assume that a male can marry one female and a female can marry one male. So the relationship will be one-to-
one.
Using Sets, it can be represented as:
2. Many-to-one – When entities in one entity set can take part only once in the relationship set and entities in the other entity
set can take part more than once, the cardinality is many-to-one. Let us assume that a student can take
only one course but one course can be taken by many students. The cardinality is then n (many) to 1: for
one course there can be n students, but for one student there is only one course.
3. Many-to-many – When entities in every participating entity set can take part more than once in the relationship set, the
cardinality is many-to-many. For example, if student S1 is enrolled in C1 and C3, and course C3 has S1, S3, and S4 enrolled,
the relationship is many-to-many.
A many-to-many relationship also contains one-to-many mappings, since each entity can be related to more than one entity
on the other side.
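The cardinality types can be checked mechanically on a relationship set written as (left, right) pairs. The sketch below is a minimal illustration under one important assumption: it classifies what a given set of pairs exhibits, whereas a real cardinality constraint is a rule of the schema that every instance must obey. The function name is hypothetical.

```python
from collections import Counter


def classify_cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # Does some left entity appear with several right entities, and vice versa?
    left_repeats = any(n > 1 for n in left_counts.values())
    right_repeats = any(n > 1 for n in right_counts.values())
    if left_repeats and right_repeats:
        return "many-to-many"
    if right_repeats:
        return "many-to-one"   # many left entities share one right entity
    if left_repeats:
        return "one-to-many"
    return "one-to-one"


# Each student takes one course, but course C1 has two students.
print(classify_cardinality({("S1", "C1"), ("S2", "C1"), ("S3", "C2")}))
# S1 takes C1 and C3, and C3 is taken by S1, S3, and S4.
print(classify_cardinality({("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")}))
```

The first call reports many-to-one and the second many-to-many, matching the Student and Course examples in the text.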
Participation Constraint:
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If each student must enroll in a
course, the participation of students will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT participate in the relationship. If some courses
have no students enrolled in them, the participation of Course is partial.
The diagram depicts the ‘Enrolled in’ relationship set with the Student entity set having total participation and the Course
entity set having partial participation.
Using sets, it can be represented as follows:
Every student in the Student entity set participates in the relationship, but there exists a course C4 that does not take
part in the relationship.
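The set view of participation can be written out directly. This is a small sketch; the entity names S1 to S3 and C1 to C3 and the specific enrolments are assumed for illustration, and only C4 being left out comes from the text.

```python
def participation(entity_set, pairs, position):
    """Return 'total' if every entity occurs in the relationship set, else 'partial'.

    position is 0 for the left member of each pair and 1 for the right member.
    """
    participating = {pair[position] for pair in pairs}
    return "total" if set(entity_set).issubset(participating) else "partial"


students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3", "C4"}
enrolled_in = {("S1", "C1"), ("S2", "C2"), ("S3", "C3")}  # C4 has no students

print(participation(students, enrolled_in, 0))  # prints total
print(participation(courses, enrolled_in, 1))   # prints partial
```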
Generalization :-
• Generalization is like a bottom-up approach in which two or more entities of lower level combine to form a higher level
entity if they have some attributes in common.
• In generalization, an entity of a higher level can also combine with the entities of the lower level to form a further higher
level entity.
• Generalization resembles a subclass and superclass system; the only difference is the approach. Generalization uses
the bottom-up approach.
• In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make a
superclass.
For example, the Faculty and Student entities can be generalized to create a higher-level entity Person.
Specialization :-
• Specialization is a top-down approach, the opposite of generalization. In specialization, one higher-level entity is
broken down into two or more lower-level entities.
• Specialization is used to identify a subset of an entity set that shares some distinguishing characteristics.
• Normally, the superclass is defined first, the subclasses and their related attributes are defined next, and the relationship
sets are then added.
For example, in an employee management system, the EMPLOYEE entity can be specialized as TESTER or DEVELOPER based on
the role each employee plays in the company.
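Since generalization and specialization both describe a superclass and subclass structure, the EMPLOYEE example can be sketched as a class hierarchy. The attributes `test_tool` and `language` and the sample values are assumptions added for illustration; they are not part of the original example.

```python
class Employee:
    """Higher-level entity (superclass) holding the shared attributes."""

    def __init__(self, emp_id, name):
        self.emp_id = emp_id
        self.name = name


class Tester(Employee):
    """Lower-level entity: inherits the Employee attributes and adds its own."""

    def __init__(self, emp_id, name, test_tool):
        super().__init__(emp_id, name)
        self.test_tool = test_tool  # hypothetical subclass-specific attribute


class Developer(Employee):
    def __init__(self, emp_id, name, language):
        super().__init__(emp_id, name)
        self.language = language    # hypothetical subclass-specific attribute


staff = [Tester(1, "Ravi", "Selenium"), Developer(2, "Meera", "Java")]
# Every specialized entity is still an EMPLOYEE.
print(all(isinstance(person, Employee) for person in staff))  # prints True
```

Reading the hierarchy from Employee downwards is specialization; building Employee out of the shared attributes of Tester and Developer is generalization.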
Aggregation :-
In aggregation, a relationship together with its participating entities is treated as a single higher-level entity.
For example, the relationship in which a Center offers a Course acts as a single entity, which in turn takes part in a
relationship with another entity, Visitor. In the real world, a visitor to a coaching center does not enquire about the Course
alone or about the Center alone; the enquiry is about both together.
2.2 Relational data model: Structure of Relational Database Model, Types of keys, Referential Integrity
Constraints
STUDENT
• Relation Instance: The set of tuples of a relation at a particular instant of time is called a relation instance. Table 1 shows
the relation instance of STUDENT at a particular time. It can change whenever there is an insertion, deletion, or update in
the database.
• Degree: The number of attributes in the relation is known as the degree of the relation. The STUDENT relation defined
above has degree 5.
• Cardinality: The number of tuples in a relation is known as its cardinality. The STUDENT relation defined above has
cardinality 4.
• Column: A column represents the set of values for a particular attribute. The column ROLL_NO is extracted from the
relation STUDENT.
ROLL_NO
• NULL Values: A value that is not known or is unavailable is called a NULL value. It is shown as a blank in the table. For
example, the PHONE of the STUDENT with ROLL_NO 4 is NULL.
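Degree, cardinality, a column, and NULL values can all be read off a small table. The sketch below uses Python's built-in `sqlite3` module. The column names other than ROLL_NO and PHONE, and all row values, are assumptions chosen to match the stated degree 5, cardinality 4, and the NULL phone for ROLL_NO 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
# Hypothetical relation instance; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ram", "Delhi", "9000000001", 18),
        (2, "Ramesh", "Gurgaon", "9000000002", 18),
        (3, "Sujit", "Rohtak", "9000000003", 20),
        (4, "Suresh", "Delhi", None, 18),
    ],
)

cursor = conn.execute("SELECT * FROM STUDENT")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)

roll_no_column = [row[0] for row in conn.execute("SELECT ROLL_NO FROM STUDENT ORDER BY ROLL_NO")]
null_phone_rolls = [row[0] for row in conn.execute("SELECT ROLL_NO FROM STUDENT WHERE PHONE IS NULL")]

print(degree, cardinality)  # prints 5 4
print(roll_no_column)       # prints [1, 2, 3, 4]
print(null_phone_rolls)     # prints [4]
```

Note that NULL is tested with `IS NULL`, not with `=`, because NULL stands for an unknown value rather than a blank string.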
Types of keys in Relational Model :-
• Keys play an important role in a relational database.
• A key is used to uniquely identify any record or row of data in a table. Keys are also used to establish and identify
relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON table,
passport_number, license_number, and SSN are keys since they are unique for each person.
1. Primary key
• The primary key is the key chosen to identify one and only one instance of an entity uniquely. An entity can have multiple
keys, as we saw in the PERSON table. The most suitable of those keys becomes the primary key.
• In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. License_Number or
Passport_Number could be selected as the primary key instead, since they are also unique.
• For each entity, the choice of primary key depends on the requirements and on the developers.
2. Candidate key
• A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
• The primary key is chosen from the candidate keys. The candidate keys that are not chosen identify tuples just as strongly
as the primary key.
For example: In the EMPLOYEE table, ID is best suited for the primary key. The other unique attributes, such as SSN,
Passport_Number, and License_Number, are also candidate keys.
3. Super Key
A super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.
For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of two employees can be the
same, but their EMPLOYEE_ID cannot be the same. Hence, this combination is also a super key.
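The difference between a super key and a candidate key can be tested on sample data. The sketch below is a rough check under a stated limitation: keys are properties of the schema, so a single instance can only show that an attribute set is not a key, never prove that it always will be one. The rows and function names are hypothetical.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """attrs is a super key of this instance if no two rows agree on all of attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: attribute sets with no smaller key inside them."""
    attributes = list(rows[0])
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            contains_smaller_key = any(set(k).issubset(combo) for k in keys)
            if is_super_key(rows, combo) and not contains_smaller_key:
                keys.append(combo)
    return keys


# Hypothetical EMPLOYEE instance: two employees share a name.
employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Amit", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Amit", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Neha", "SSN": "333"},
]

print(is_super_key(employees, ("EMPLOYEE_NAME",)))                # prints False
print(is_super_key(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))  # prints True
print(candidate_keys(employees))  # prints [('EMPLOYEE_ID',), ('SSN',)]
```

(EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies each row.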
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also known as a
concatenated key.
For example, in the EMPLOYEE relation, we assume that an employee may be assigned multiple roles and may
work on multiple projects simultaneously. The primary key is then composed of all three attributes, namely Emp_ID,
Emp_role, and Proj_ID, in combination. These attributes act as a composite key since the primary key comprises more
than one attribute.
7. Artificial key
Keys created from arbitrarily assigned data are known as artificial keys. They are used when the primary key is
large and complex and is not related to many other relations. The values of an artificial key are usually
numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the EMPLOYEE relation. It
would be better to add a new artificial attribute that identifies each tuple in the relation uniquely.
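The composite key and the artificial key can be compared side by side. The sketch below uses SQLite through Python's `sqlite3` module. The table names, the `Row_ID` column, and the sample rows are assumptions for illustration, and `AUTOINCREMENT` is SQLite's syntax for serial numbering (other systems spell this differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite key: the three attributes together form the primary key.
conn.execute("""
    CREATE TABLE EMPLOYEE_ASSIGNMENT (
        Emp_ID   INTEGER,
        Emp_role TEXT,
        Proj_ID  INTEGER,
        PRIMARY KEY (Emp_ID, Emp_role, Proj_ID)
    )
""")
conn.execute("INSERT INTO EMPLOYEE_ASSIGNMENT VALUES (1, 'Tester', 10)")
conn.execute("INSERT INTO EMPLOYEE_ASSIGNMENT VALUES (1, 'Developer', 10)")  # same employee, another role
try:
    conn.execute("INSERT INTO EMPLOYEE_ASSIGNMENT VALUES (1, 'Tester', 10)")  # exact duplicate
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Artificial key: a serially numbered column replaces the large composite key.
conn.execute("""
    CREATE TABLE EMPLOYEE_ASSIGNMENT_V2 (
        Row_ID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Emp_ID   INTEGER,
        Emp_role TEXT,
        Proj_ID  INTEGER,
        UNIQUE (Emp_ID, Emp_role, Proj_ID)
    )
""")
conn.execute("INSERT INTO EMPLOYEE_ASSIGNMENT_V2 (Emp_ID, Emp_role, Proj_ID) VALUES (1, 'Tester', 10)")
conn.execute("INSERT INTO EMPLOYEE_ASSIGNMENT_V2 (Emp_ID, Emp_role, Proj_ID) VALUES (1, 'Developer', 10)")
row_ids = [row[0] for row in conn.execute("SELECT Row_ID FROM EMPLOYEE_ASSIGNMENT_V2 ORDER BY Row_ID")]

print(duplicate_rejected)  # prints True
print(row_ids)             # prints [1, 2]
```

The `UNIQUE` constraint in the second table is a design choice: it keeps the original three-attribute combination unique even though it is no longer the primary key.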
Referential Integrity Constraints -
A referential integrity constraint is also known as a foreign key constraint. A foreign key is a key whose values are derived
from the primary key of another table.
The table from which the values are derived is known as the Master or Referenced table, and the table into which the values
are inserted accordingly is known as the Child or Referencing table. In other words, the table containing the
foreign key is called the child table, and the table containing the primary key/candidate key is called the referenced or
parent table. In the relational model, a candidate key can be defined as a set of zero or more attributes.
In the master table STUDENT, column Roll acts as the primary key, which supplies the values of the foreign key in the child
table.
The syntax of the Child or Referencing table is:
CREATE TABLE Subject (Roll int references Student, SubCode int, SubName varchar(10));
In the above table, column Roll acts as the foreign key, whose values are derived from the Roll values of the primary key of the
master table.
Foreign Key Constraint OR Referential Integrity constraint-
There are two referential integrity constraints:
Insert Constraint: A value cannot be inserted in the CHILD table if the value is not present in the MASTER table.
Delete Constraint: A value cannot be deleted from the MASTER table if the value is present in the CHILD table.
Suppose you want to insert Roll = 5 with other column values in the SUBJECT table. You will immediately see an
error "Foreign key Constraint Violated". That is, an insertion command such as
Insert into SUBJECT values(5, 786, 'OS');
is rejected by SQL because of the Insert Constraint: since Roll = 5 is not present in the master table, it
cannot be entered in the child table.
Similarly, if you want to delete Roll = 4 from the STUDENT table, you will immediately see an error "Foreign key Constraint
Violated". That is, a deletion command such as
Delete from STUDENT where Roll = 4;
is rejected by SQL because of the Delete Constraint: since Roll = 4 is present in the child table, it
cannot be deleted from the master table. If we somehow managed to delete Roll = 4 from the master table, Roll = 4 would
still be present in the child table without a matching master row, which would violate the Insert Constraint.
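Both constraints can be tried out with SQLite through Python's `sqlite3` module. This is a sketch under a few assumptions: the Name column and all sample rows are made up, the exact error text differs between database systems, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless this is set
conn.execute("CREATE TABLE Student (Roll INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Subject (Roll INTEGER REFERENCES Student, SubCode INTEGER, SubName VARCHAR(10))"
)
# Hypothetical rows: master rolls 1 to 4, child rows for rolls 1 and 4.
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
conn.executemany("INSERT INTO Subject VALUES (?, ?, ?)", [(1, 101, "DBMS"), (4, 104, "CN")])


def try_sql(sql):
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return "rejected: " + str(exc)


# Insert Constraint: Roll = 5 does not exist in the master table.
insert_result = try_sql("INSERT INTO Subject VALUES (5, 786, 'OS')")
# Delete Constraint: Roll = 4 is still referenced by the child table.
delete_result = try_sql("DELETE FROM Student WHERE Roll = 4")
# Roll = 2 is not referenced, so it can be deleted.
unreferenced_delete = try_sql("DELETE FROM Student WHERE Roll = 2")

print(insert_result)
print(delete_result)
print(unreferenced_delete)
```

The first two statements are rejected and the third succeeds, which shows that the Delete Constraint only blocks rows that the child table actually references.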
ON DELETE CASCADE
As per the Delete Constraint, a value cannot be deleted from the MASTER table if the value is present in the CHILD table. The
next question is whether we can delete a value from the master table while it is present in the child table without violating
the Delete Constraint, i.e., so that the moment we delete the value from the master table, the corresponding rows are also
deleted from the child table.
The answer is YES. We only need a slight modification when creating the child table: adding on
delete cascade.
TABLE SYNTAX
CREATE TABLE Subject (Roll int references Student on delete cascade, SubCode int, SubName varchar(10));
In the above syntax, on delete cascade is added just after the references clause (used for creating the foreign key).
With it, a value can be deleted from the master table even if it is present in the child table. For example, if you
delete Roll = 4 from the master table while Roll = 4 is present in the child table, the rows with Roll = 4 in the child
table are deleted as well.
Suppose the two tables STUDENT and SUBJECT have four rows each, and you delete Roll
= 4 from the STUDENT (master) table with the SQL command: delete from STUDENT where Roll = 4;
The moment SQL executes the command, the row having Roll = 4 in the SUBJECT (child) table is also deleted.
If the foreign key is instead declared with on delete set null, the row with Roll = 4 is deleted from
STUDENT while the value Roll = 4 in the SUBJECT table is
replaced by NULL. This shows that a foreign key can have NULL values. If
column Roll in the SUBJECT table is part of the primary key as well as being a foreign key,
it cannot hold NULL values.
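The effect of the delete rule on the child table can be compared directly. The sketch below runs the same delete twice in SQLite, once with `ON DELETE CASCADE` and once with `ON DELETE SET NULL`. The subject codes and names are hypothetical sample data, and the helper function name is an assumption.

```python
import sqlite3


def subject_rolls_after_delete(on_delete_action):
    """Delete Roll = 4 from Student and return the Roll values left in Subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
    conn.execute("CREATE TABLE Student (Roll INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Subject (Roll INTEGER REFERENCES Student ON DELETE "
        + on_delete_action
        + ", SubCode INTEGER, SubName VARCHAR(10))"
    )
    conn.executemany("INSERT INTO Student VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany(
        "INSERT INTO Subject VALUES (?, ?, ?)",
        [(1, 101, "DBMS"), (2, 102, "OS"), (3, 103, "CN"), (4, 104, "TOC")],
    )
    conn.execute("DELETE FROM Student WHERE Roll = 4")
    rows = conn.execute("SELECT Roll FROM Subject ORDER BY SubCode").fetchall()
    conn.close()
    return [row[0] for row in rows]


cascade_rolls = subject_rolls_after_delete("CASCADE")
set_null_rolls = subject_rolls_after_delete("SET NULL")

print(cascade_rolls)   # prints [1, 2, 3]  (the child row is deleted too)
print(set_null_rolls)  # prints [1, 2, 3, None]  (the child row stays, Roll becomes NULL)
```

CASCADE suits child rows that are meaningless without their master row, while SET NULL keeps the child row for cases where it is still useful on its own.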
2.3 Codd’s rules
Not every database that has tables and constraints can be called a relational database system. A database that merely
uses the relational data model is not necessarily a Relational Database Management System (RDBMS). Certain rules define
what a database must satisfy to be a true RDBMS. These rules were proposed in 1985 by Dr. Edgar F. Codd (E.F. Codd), who
carried out extensive research on the relational model of database systems. Codd presented 13 rules to test a DBMS
against his relational model; a database that follows the rules is called a true relational database (RDBMS).
Because they are numbered from 0 to 12, these 13 rules are popularly known as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form, so that the system can handle the database through its relational capabilities.
Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system should support high-level relational operations such as insert, update, and delete on sets of rows, not
only on a single row. It should also support the union, intersection, and minus operations.
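Set-level operations of the kind Rule 7 asks for can be seen in a few SQL statements. The sketch below uses SQLite through Python's `sqlite3` module. Tables A and B and their values are hypothetical, and SQLite spells the minus operation as `EXCEPT` (some other systems use `MINUS`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (x INTEGER)")
conn.execute("CREATE TABLE B (x INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B VALUES (?)", [(2,), (3,), (4,)])


def column(sql):
    """Return the first column of a query result as a list."""
    return [row[0] for row in conn.execute(sql)]


union = column("SELECT x FROM A UNION SELECT x FROM B ORDER BY x")
intersection = column("SELECT x FROM A INTERSECT SELECT x FROM B ORDER BY x")
minus = column("SELECT x FROM A EXCEPT SELECT x FROM B ORDER BY x")

# One UPDATE statement changes every matching row, not one row at a time.
rows_updated = conn.execute("UPDATE A SET x = x + 10 WHERE x >= 2").rowcount

print(union)         # prints [1, 2, 3, 4]
print(intersection)  # prints [2, 3]
print(minus)         # prints [1]
print(rows_updated)  # prints 2
```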
Rule 8: Physical Data Independence Rule
Applications that access the database must be independent of how the data is physically stored.
If the data is moved or the physical structure of the database is changed, there should be no
effect on the external applications that access the data.
Rule 11: Distribution Independence Rule
The distribution independence rule states that a database must work properly even if it is stored in different locations
and used by different end-users. A user who accesses the database through an application should not
need to know that other users are using the data or that it is distributed; to the user, the data should always appear to be
located at one site. End users should be able to run their SQL queries in the same way regardless of where the data is stored.
Rule 12: Non-Subversion Rule
The non-subversion rule concerns systems that use SQL to store and manipulate the data in the database. If such a system
also has a low-level or separate language other than SQL to access the database, that language must not be able to subvert
or bypass the integrity rules when changing data.
2.4 Database Design using E-R, E-R to Relational :-
b. Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute. The composite attribute is
represented by an ellipse, and its component attributes are drawn as ellipses connected to it.
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to
represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attributes is known as a derived attribute. It is represented by a dashed
ellipse.
For example, a person's age changes over time and can be derived from another attribute such as date of birth.
3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is used to represent a relationship.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the right are associated with
the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on the right are associated
with the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
ER Model to Relational Model
The ER model, when conceptualized into diagrams, gives a good overview of entities and their relationships, which is easy to
understand. ER diagrams can be mapped to a relational schema; that is, it is possible to create a relational schema from an
ER diagram. Not all ER constraints can be carried over into the relational model, but an approximate schema can be generated.
There are several processes and algorithms available to convert ER diagrams into relational schemas. Some of them are
automated and some are manual. Here we focus on mapping the contents of the diagram to relational basics.
ER diagrams mainly comprise −
• Entities and their attributes
• Relationships, which are associations among entities
Mapping Entity
An entity is a real-world object with some attributes.
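A common way to map an entity is to create one table per entity, one column per attribute, and to declare the key attribute as the primary key. The sketch below generates such a table definition and runs it in SQLite. The helper name `entity_to_create_table`, the chosen column types, and the Student attributes used are assumptions for illustration only.

```python
import sqlite3


def entity_to_create_table(name, attributes, key):
    """Build a CREATE TABLE statement for one entity.

    attributes maps attribute name to SQL type; key lists the key attribute(s).
    """
    columns = [attr + " " + sql_type for attr, sql_type in attributes.items()]
    columns.append("PRIMARY KEY (" + ", ".join(key) + ")")
    return "CREATE TABLE " + name + " (" + ", ".join(columns) + ")"


ddl = entity_to_create_table(
    "Student",
    {"Roll_No": "INTEGER", "Name": "TEXT", "DOB": "TEXT"},
    key=["Roll_No"],
)
print(ddl)

# Check that the generated statement is accepted and inspect the resulting columns.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[5]) for row in conn.execute("PRAGMA table_info(Student)")]
print(columns)  # prints [('Roll_No', 1), ('Name', 0), ('DOB', 0)]
```

Multivalued and composite attributes need extra handling (for example a separate table or several columns), which this sketch leaves out.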
2.5 Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate
undesirable characteristics such as insertion, update, and deletion anomalies.
Normalization divides a larger table into smaller tables and links them using relationships.
Normal forms are used to reduce redundancy in database tables.
Why do we need Normalization?
The main reason for normalizing relations is to remove insertion, deletion, and update anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization
consists of a series of guidelines that help you create a good database structure.
• Deletion Anomaly: The deletion anomaly refers to the situation where the deletion of data results in the unintended loss of
some other important data.
• Update Anomaly: The update anomaly occurs when an update of a single data value requires multiple rows of data to be
updated.
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A
relation is said to be in a particular normal form if it satisfies the constraints of that form.
Following are the various types of normal forms:
Normal Form    Description
2NF            A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
4NF            A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF            A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is lossless.
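The 2NF condition of full functional dependency can be checked on sample rows. In the sketch below, the ENROLLMENT relation, its attributes, its assumed primary key (student_id, course_id), and the helper `fd_holds` are all hypothetical. The check only tests whether a dependency holds in this one instance, which is a useful hint rather than a proof about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The same left-hand value must always map to the same right-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Assumed primary key: (student_id, course_id).
enrollment = [
    {"student_id": 1, "course_id": "C1", "course_name": "DBMS", "grade": "A"},
    {"student_id": 2, "course_id": "C1", "course_name": "DBMS", "grade": "B"},
    {"student_id": 1, "course_id": "C2", "course_name": "OS", "grade": "B"},
]

# course_name depends on only part of the key: a partial dependency, so not 2NF.
partial_dependency = fd_holds(enrollment, ["course_id"], ["course_name"])
# grade needs the whole key: neither part of the key determines it alone.
grade_needs_full_key = (
    fd_holds(enrollment, ["student_id", "course_id"], ["grade"])
    and not fd_holds(enrollment, ["course_id"], ["grade"])
    and not fd_holds(enrollment, ["student_id"], ["grade"])
)

# Decomposing into 2NF: move course_name into its own COURSE relation.
course = {row["course_id"]: row["course_name"] for row in enrollment}

print(partial_dependency)    # prints True
print(grade_needs_full_key)  # prints True
print(course)                # prints {'C1': 'DBMS', 'C2': 'OS'}
```

After the decomposition, "DBMS" is stored once per course instead of once per enrolment, which removes the update anomaly for course names.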
Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• Performance degrades when relations are normalized to higher normal forms, i.e., 4NF and 5NF.
• It is very time-consuming and difficult to normalize relations of a higher degree.