
Database Design Fundamentals Explained

The document provides an overview of advanced database management systems, focusing on database design concepts, including entities, attributes, relationships, and constraints. It discusses the Entity-Relationship Model, normalization forms, and the significance of weak entities and UML class diagrams in database design. Key topics include key constraints, participation constraints, and the representation of entities and relationships in ER diagrams.

Uploaded by

Khuyaish Sharma

ADVANCED DATABASE MANAGEMENT SYSTEM

UNIT I – INTRODUCTION TO DATABASE DESIGN

• Entities, Attributes, Entity Sets, Relationships


• Key Constraints, Participation Constraints
• Weak Entities
• UML Class Diagrams: Subclasses, Superclasses, Inheritance
• Specialization, Generalization
• Constraints and Characteristics of Specialization and Generalization Hierarchies
• Modeling of UNION Types Using Categories
• Representing Specialization and Generalization in UML Class Diagrams
• Data Abstraction, Knowledge Representation, and Ontology Concepts

UNIT II – DATABASE DESIGN THEORY

• Problems Caused by Redundancy


• Decompositions, Problems Related to Decomposition
• Reasoning About Functional Dependencies (FDs)
• Normal Forms:
o First Normal Form (1NF)
o Second Normal Form (2NF)
o Third Normal Form (3NF)
o Boyce-Codd Normal Form (BCNF)
o Fourth Normal Form (4NF)
• Lossless Join Decomposition
• Dependency Preserving Decomposition
• Schema Refinement in Database Design
• Multi-Valued Dependencies

UNIT I

Topic 1: Entities, Attributes, Entity Sets, Relationships:

Introduction of the ER Model


The Entity-Relationship Model (ER Model) is a conceptual model for designing databases.
It represents the logical structure of a database, including entities, their attributes,
and the relationships between them.
• Entity: An object about which data is stored, such as Student, Course, or Company.
• Attribute: A property that describes an entity, such as StudentID, CourseName,
or EmployeeEmail.
• Relationship: A connection between entities, such as "a Student enrolls in
a Course".
What is an Entity?
An Entity represents a real-world object, concept, or thing about which data is stored in a
database. It acts as a building block of a database. Tables in a relational database represent
these entities.
Example of entities:
• Real-World Objects: Person, Car, Employee etc.
• Concepts: Course, Event, Reservation etc.
• Things: Product, Document, Device etc.
The entity type defines the structure of an entity, while individual instances of that type
represent specific entities.
What is an Entity Set?
An entity refers to an individual object of an entity type, and the collection of all entities of
a particular type is called an entity set. For example, E1 is an entity that belongs to the entity
type "Student," and the group of all students forms the entity set.
In the ER diagram below, the entity type is represented as:

Entity Set

We can represent entity sets in an ER diagram, but not individual entities, because an
entity is like a row in a table: an ER diagram shows the structure and relationships of the
data, not specific data entries (rows and columns). An ER diagram is a visual representation
of the data model, not of the actual data itself.
Types of Entity
There are two main types of entities:
1. Strong Entity
A Strong Entity is an entity type that has a key attribute which can uniquely identify each
instance of the entity. A strong entity does not depend on any other entity in the schema for
its identification. It has a primary key that ensures its uniqueness and is represented by a
rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It depends on a
strong entity for its identification. A weak entity is associated with an identifying entity (a
strong entity), which helps in its identification, and is represented by a double rectangle.
The participation of a weak entity type is always total. The relationship between the weak
entity type and its identifying strong entity type is called the identifying relationship, and it is
represented by a double diamond.
Example:
A company may store information about the dependents (parents, children, spouse) of an
employee, but the dependents cannot exist without the employee. So Dependent will be a
weak entity type, and Employee will be the identifying entity type for Dependent, which
makes it a strong entity type.

Strong Entity and Weak Entity

Attributes in ER Model
Attributes are the properties that define the entity type. For example, for a Student entity
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that define entity
type Student. In ER diagram, the attribute is represented by an oval.

Attribute
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In ER diagram, the key attribute is
represented by an oval with an underline.

Key Attribute
2. Composite Attribute
An attribute composed of several other attributes is called a composite attribute. For example,
the Address attribute of the Student entity type consists of Street, City, State, and Country.
In an ER diagram, a composite attribute is represented by an oval containing other ovals.

Composite Attribute

3. Multivalued Attribute
An attribute that can hold more than one value for a given entity. For example, Phone_No
(a student can have more than one phone number). In an ER diagram, a multivalued attribute
is represented by a double oval.

Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.

Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:

Entity and Attributes


Relationship Type and Relationship Set
A Relationship Type represents an association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between the entity types Student and Course. In an ER
diagram, a relationship type is represented by a diamond connected to the participating
entities with lines.

Entity-Relationship Set

A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as enrolled in C3.

Relationship Set

Degree of a Relationship Set


The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary/Recursive Relationship: When there is only ONE entity set participating in a
relation, the relationship is called a unary relationship. For example, one person is married
to only one person.

Unary Relationship

2. Binary Relationship: When there are two entity sets participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.

Binary Relationship
3. Ternary Relationship: When there are three entity sets participating in a relationship, the
relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entity sets participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality in ER Model
The maximum number of times an entity of an entity set participates in a relationship set is
known as cardinality.
Cardinality can be of different types:
1. One-to-One
When each entity in each entity set can take part only once in the relationship, the cardinality
is one-to-one. Let us assume that a male can marry one female and a female can marry one
male. So the relationship will be one-to-one.

One to One Cardinality

Using Sets, it can be represented as:

Set Representation of One-to-One

2. One-to-Many
In a one-to-many mapping, one entity in the first entity set can be related to more than one
entity in the second set, while each entity in the second set relates to at most one in the first.
Let us assume that one surgeon department can accommodate many doctors. The cardinality
will then be 1 to M: one department has many doctors.

one to many cardinality


Using sets, one-to-many cardinality can be represented as:

Set Representation of One-to-Many

3. Many-to-One
When entities in one entity set can take part only once in the relationship set and entities in
other entity sets can take part more than once in the relationship set, cardinality is many to
one.
Let us assume that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there can be n students
but for one student, there will be only one course.

many to one cardinality

Using Sets, it can be represented as:

Set Representation of Many-to-One

In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many
When entities in all entity sets can take part more than once in the relationship cardinality is
many to many. Let us assume that a student can take more than one course and one course
can be taken by many students. So the relationship will be many to many.

many to many cardinality

Using Sets, it can be represented as:

Many-to-Many Set Representation

In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3,
and S4. So it is a many-to-many relationship.
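The cardinalities above map directly onto relational tables; a many-to-many relationship set, in particular, becomes a separate junction table. The following sqlite3 sketch is illustrative only (all table and column names are assumptions, not from the text):

```python
import sqlite3

# In-memory database; names (student, course, enrolled) are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A many-to-many relationship (Student enrolled-in Course) needs a junction
# table whose composite primary key allows each (student, course) pair once.
cur.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrolled (
    roll_no   INTEGER REFERENCES student(roll_no),
    course_id TEXT    REFERENCES course(course_id),
    PRIMARY KEY (roll_no, course_id)   -- one row per (student, course) pair
);
""")

cur.execute("INSERT INTO student VALUES (1, 'S1'), (2, 'S2')")
cur.execute("INSERT INTO course VALUES ('C1', 'DBMS'), ('C3', 'Networks')")
cur.execute("INSERT INTO enrolled VALUES (1, 'C1'), (1, 'C3'), (2, 'C3')")

# S1 takes two courses and C3 is taken by two students: many-to-many.
cur.execute("SELECT COUNT(*) FROM enrolled WHERE roll_no = 1")
print(cur.fetchone()[0])  # 2
```

Here the junction table plays the role of the relationship set: each relationship instance is one row, and neither side is restricted to participating only once.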

Topic 2: Key Constraints, Participation Constraints

1. Key Constraints

• Definition:
A key constraint restricts how many relationship instances an entity can
participate in within a relationship set.
• Main Idea:
o A key constraint on one side of a relationship means that each entity on that
side is associated with at most one entity on the other side.
• Example:
o Student – Enrolled – Course
▪ A student can enroll in many courses → No key constraint.
▪ But if we say each student has exactly one ID card, then → 1:1 key
constraint.
• Types:
o 1:1 (One-to-One) → One entity of A maps to one entity of B.
Example: Person ↔ Passport.
o 1:N (One-to-Many) → One entity of A maps to many entities of B.
Example: Department ↔ Employees.
o M:N (Many-to-Many) → Many entities of A map to many entities of B.
Example: Student ↔ Course.
• ER Diagram Notation:
o Arrow (→) from entity to relationship shows key constraint.
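A 1:1 key constraint such as Person ↔ Passport or Student ↔ ID card can be approximated in a relational schema with a UNIQUE foreign key. This is a minimal sketch with assumed table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled explicitly
cur = conn.cursor()

# 1:1 — each student has at most one ID card: UNIQUE on the referencing side.
cur.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY);
CREATE TABLE id_card (
    card_no INTEGER PRIMARY KEY,
    roll_no INTEGER UNIQUE NOT NULL REFERENCES student(roll_no)
);
""")
cur.execute("INSERT INTO student VALUES (1)")
cur.execute("INSERT INTO id_card VALUES (100, 1)")

# A second card for the same student would violate the 1:1 key constraint.
try:
    cur.execute("INSERT INTO id_card VALUES (101, 1)")
    print("second card accepted")
except sqlite3.IntegrityError:
    print("second card rejected")
```

Dropping the UNIQUE keyword turns the same schema into a 1:N mapping, and an M:N mapping needs a junction table instead.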

2. What are Participation Constraints?


Participation constraints in database management are rules that determine the minimum
and maximum participation of entities in a given relationship set. Partial participation
permits optional involvement, while total participation requires every entity in one set to
take part in a relationship with another set. By maintaining consistency and enforcing
business rules, these constraints guarantee data integrity.
For example, in a college database, partial participation would permit courses with no
enrolled students, while total participation might require every student to be enrolled in at
least one course. To design database schemas that accurately model real-world situations
and enable efficient data management, it is essential to understand and apply participation
constraints.
There are two types of participation constraints in database management systems:
• Total Participation
• Partial Participation
Total Participation
Total participation, sometimes known as mandatory participation, is the requirement that
every entity in one entity set participate in a relationship with another entity set. In a
university database, for instance, total participation of students in the enrollment
relationship means that each student must be registered in at least one course; no student
can be left out of every course. It guarantees that every member of one set is connected to
something in the other set, so that nothing is overlooked or left disconnected.
In the diagram below, the participation of an entity set E in a relationship set R is said to be
total if every entity in E participates in at least one relationship in R.
The participation of entity set A in the relationship set is total because every entity of A
participates in it, and the participation of entity set B is also total because every entity of B
participates in the relationship set.

Total Participation
Partial Participation
In database design, partial participation, also known as optional participation, allows
membership in a relationship to be optional: the database does not require every entity to be
linked to an entity in the other set. In a university database, for instance, partial
participation can mean that some students are enrolled in classes while others are not. This
flexibility is important because not everything in real life is connected to everything else;
some objects are related to one another while others stand alone. It permits scenarios in
which certain database entities are not connected to any other entity.
In the diagram below, the participation of an entity set E in a relationship set R is said to be
partial if only some entities in E participate in relationships in R.
The participation of entity set A in the relationship set is partial because only some entities
of A participate in it, while the participation of entity set B is total because every entity of B
participates in the relationship set.

Partial Participation

Example:
Suppose an entity set Student is related to an entity set Course through the Enrolled
relationship set.
The participation of the entity set Course in the Enrolled relationship set is partial because a
course may or may not have students enrolled in it; it is possible that only some course
entities are related to the Student entity set through Enrolled.
The participation of the entity set Student in the Enrolled relationship set is total because
every student is expected to be related to at least one course through Enrolled.
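Total participation is not directly expressible as a simple column constraint in most SQL engines, but it can be checked with a query. The sketch below (table and column names are assumptions) finds students violating total participation in Enrolled; courses with no students remain legal because course participation is partial:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY);
CREATE TABLE course  (course_id TEXT PRIMARY KEY);
CREATE TABLE enrolled (
    roll_no   INTEGER REFERENCES student(roll_no),
    course_id TEXT    REFERENCES course(course_id)
);
""")
cur.executemany("INSERT INTO student VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO course VALUES (?)", [("C1",), ("C2",)])
cur.executemany("INSERT INTO enrolled VALUES (?, ?)", [(1, "C1"), (2, "C1")])

# Students violating total participation (not enrolled in any course):
cur.execute("""
    SELECT roll_no FROM student
    WHERE roll_no NOT IN (SELECT roll_no FROM enrolled)
""")
print([r[0] for r in cur.fetchall()])  # [3]

# Course C2 has no enrolled students; that is allowed, because the
# participation of Course in Enrolled is only partial.
```

In practice, such a check would typically run as an integrity audit or be enforced at the application level, since standard foreign keys cannot require a student row to appear in Enrolled.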
Conclusion
In conclusion, participation constraints, including total and partial participation, are used to
design a robust and efficient database schema. Partial participation permits optional
involvement, while total participation requires each entity in one set to take part in a
relationship with another set. These constraints protect the integrity of the data, uphold
business rules, and faithfully model real-world situations in the database. By specifying the
minimum and maximum participation of entities in relationships, database designers can
avoid errors and inconsistencies, resulting in a dependable and efficient database
management system.

TOPIC 3: Weak Entities

A weak entity in a database management system (DBMS) is an entity type that
cannot be uniquely identified by its own attributes alone. Unlike strong entities,
which have a primary key, weak entities rely on a strong (owner) entity for their
identification and existence. They are characterized by their dependence on a strong
entity and the use of a partial key (also called a discriminator) to distinguish
between instances within the context of their relationship.
Key Characteristics of Weak Entities
• No Primary Key: Weak entities lack a primary key of their own and cannot
exist independently.
• Dependence on Strong Entity: They rely on a strong entity for their
identification, forming a parent-child relationship.
• Partial Key: A weak entity has a partial key, which, when combined with the
primary key of the strong entity, uniquely identifies its instances.
• Total Participation: Weak entities always participate fully in their identifying
relationship with the strong entity.
Representation in ER Diagrams
• Weak entities are represented by double rectangles.
• The identifying relationship between a weak and strong entity is shown using
a double diamond.
• Partial keys are underlined with a dashed line, while the primary key of the
strong entity is underlined with a solid line.
Examples of Weak Entities
Example-1:
In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is the
identifying relationship and ‘Payment Number’ is the partial key. Primary Key of
the Loan along with the partial key would be used to identify the records.
Example-2:
The existence of rooms is entirely dependent on the existence of a hotel. So room
can be seen as the weak entity of the hotel.

Example-3:
The bank account of a particular bank has no existence if the bank doesn't exist
anymore.

Example-4:
A company may store information about the dependents (parents, children, spouse)
of an employee, but the dependents cannot exist without the employee.
So Dependent will be a weak entity type and Employee will be the identifying
entity type for Dependent.
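One common relational mapping of a weak entity, sketched here with assumed table and column names, uses the owner's primary key plus the partial key as a composite primary key, and a cascading foreign key to capture existence dependence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
cur = conn.cursor()
cur.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dependent (
    emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
    dep_name TEXT NOT NULL,            -- partial key (discriminator)
    relation TEXT,
    PRIMARY KEY (emp_id, dep_name)     -- owner's key + partial key
);
""")
cur.execute("INSERT INTO employee VALUES (1, 'Asha')")
cur.executemany("INSERT INTO dependent VALUES (?, ?, ?)",
                [(1, 'Ravi', 'Spouse'), (1, 'Mina', 'Child')])

# Deleting the owner removes its dependents: existence dependence.
cur.execute("DELETE FROM employee WHERE emp_id = 1")
cur.execute("SELECT COUNT(*) FROM dependent")
print(cur.fetchone()[0])  # 0
```

The composite primary key mirrors the ER notation: the partial key (dashed underline) identifies a dependent only within the context of one employee, never globally.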

Importance of Weak Entities


Weak entities are essential for modeling real-world scenarios where certain data
cannot exist independently. They ensure data integrity by linking dependent entities
to their parent entities, preventing orphan records and maintaining consistency.

TOPIC 4: UML Class Diagrams: Subclasses, Superclasses, Inheritance

UML Class Diagram


A UML class diagram visually represents the structure of a system by showing its classes,
attributes, methods, and the relationships between them.
1. Helps everyone involved in a project—like developers and designers—understand
how the system is organized and how its components interact.
2. Helps to communicate and document the structure of the software.
UML Class Notation
Classes are depicted as boxes, each containing three compartments for the class name,
attributes, and methods.

1. Class Name:
• The name of the class is typically written in the top compartment of the
class box and is centered and bold.
2. Attributes:
• Attributes, also known as properties or fields, represent the data
members of the class. They are listed in the second compartment of the
class box and often include the visibility (e.g., public, private) and the
data type of each attribute.
3. Methods:
• Methods, also known as functions or operations, represent the behavior
or functionality of the class. They are listed in the third compartment
of the class box and include the visibility (e.g., public, private), return
type, and parameters of each method.
4. Visibility Notation:
• Visibility notations indicate the access level of attributes and methods.
Common visibility notations include:
o + for public (visible to all classes)
o - for private (visible only within the class)
o # for protected (visible to subclasses)
o ~ for package or default visibility (visible to classes in the
same package)
Parameter Directionality
• In class diagrams, parameter directionality refers to the indication of the flow of
information between classes through method parameters.
• It helps to specify whether a parameter is an input, an output, or both. This
information is crucial for understanding how data is passed between objects
during method calls.

There are three main parameter directionality notations used in class diagrams:
• In (Input):
o An input parameter is a parameter passed from the calling object
(client) to the called object (server) during a method invocation.
o It is represented by an arrow pointing towards the receiving class (the
class that owns the method).
• Out (Output):
o An output parameter is a parameter passed from the called object
(server) back to the calling object (client) after the method execution.
o It is represented by an arrow pointing away from the receiving class.
• InOut (Input and Output):
o An InOut parameter serves as both input and output. It carries
information from the calling object to the called object and vice
versa.
o It is represented by an arrow pointing towards and away from the
receiving class.
Relationships between classes
In class diagrams, relationships between classes describe how classes are connected or
interact with each other within a system. Here are some common types of relationships in
class diagrams:
1. Association
An association represents a bi-directional relationship between two classes. It indicates that
instances of one class are connected to instances of another class. Associations are typically
depicted as a solid line connecting the classes, with optional arrows indicating the direction
of the relationship.
2. Directed Association
A directed association in a UML class diagram represents a relationship between two classes
where the association has a direction, indicating that one class is associated with another in a
specific way.
3. Aggregation
Aggregation is a specialized form of association that represents a "whole-part" relationship. It
denotes a stronger relationship where one class (the whole) contains or is composed of
another class (the part). Aggregation is represented by a diamond shape on the side of the
whole class. In this kind of relationship, the child class can exist independently of its parent
class.
4. Composition
Composition is a stronger form of aggregation, indicating a more significant ownership or
dependency relationship. In composition, the part class cannot exist independently of the
whole class. Composition is represented by a filled diamond shape on the side of the whole
class.
5. Generalization(Inheritance)
Inheritance represents an "is-a" relationship between classes, where one class (the subclass or
child) inherits the properties and behaviors of another class (the superclass or parent).
Inheritance is depicted by a solid line with a closed, hollow arrowhead pointing from the
subclass to the superclass.
6. Realization (Interface Implementation)
Realization indicates that a class implements the features of an interface. It is often used in
cases where a class realizes the operations defined by an interface. Realization is depicted by
a dashed line with an open arrowhead pointing from the implementing class to the interface.
7. Dependency Relationship
A dependency exists between two classes when one class relies on another, but the
relationship is not as strong as association or inheritance. It represents a more loosely coupled
connection between classes.
8. Usage(Dependency) Relationship
A usage dependency relationship in a UML class diagram indicates that one class (the client)
utilizes or depends on another class (the supplier) to perform certain tasks or access certain
functionality. The client class relies on the services provided by the supplier class but does
not own or create instances of it.
• In UML class diagrams, usage dependencies are typically represented by a dashed
arrowed line pointing from the client class to the supplier class.
• The arrow indicates the direction of the dependency, showing that the client class
depends on the services provided by the supplier class.

Purpose of Class Diagrams


The main purposes of using class diagrams are:
• It is the only UML diagram that can appropriately depict the various aspects of the
OOP concept.
• It makes the design and analysis of applications faster and more efficient.
• It is the basis for deployment and component diagrams.
• It supports forward and reverse engineering.

Benefits of Class Diagrams


Below are the benefits of class diagrams:
• Class diagrams represent the system's classes, attributes, methods, and
relationships, providing a clear view of its architecture.
• They show various relationships between classes, such as associations and
inheritance, helping stakeholders understand component connectivity.
• Class diagrams serve as a visual tool for communication among team members
and stakeholders, bridging gaps between technical and non-technical audiences.
• They guide developers in coding by illustrating the design, ensuring consistency
between the design and actual implementation.
• Many development tools allow for code generation from class diagrams, reducing
manual errors and saving time.

Subclasses
A subclass is a class derived from a superclass. It inherits the properties of the superclass
and also contains attributes of its own. An example:
Car, Truck, and Motorcycle are all subclasses of the superclass Vehicle. They all inherit
common attributes from Vehicle, such as speed and colour, while they also have attributes
of their own, e.g., the number of wheels is 4 for a Car but 2 for a Motorcycle.

Superclasses
A superclass is the class from which many subclasses can be created. The subclasses inherit
the characteristics of a superclass. The superclass is also known as the parent class or base
class.
In the above example, Vehicle is the Superclass and its subclasses are Car, Truck and
Motorcycle.

Inheritance
Inheritance is the process of basing a class on another class, i.e., building a class on an
existing class. The new class contains all the features and functionality of the old class in
addition to its own.
The class which is newly created is known as the subclass or child class and the original class
is the parent class or the superclass.
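The Vehicle example above can be sketched in Python, where a subclass inherits the superclass's attributes and behaviour and adds its own (the attribute and method names are illustrative):

```python
class Vehicle:                      # superclass (parent / base class)
    def __init__(self, speed, colour):
        self.speed = speed
        self.colour = colour

    def describe(self):
        return f"{self.colour} vehicle at {self.speed} km/h"


class Car(Vehicle):                 # subclass: inherits speed, colour, describe()
    wheels = 4                      # ...and adds an attribute of its own


class Motorcycle(Vehicle):
    wheels = 2


c = Car(120, "red")
print(c.describe())                   # inherited behaviour from Vehicle
print(Car.wheels, Motorcycle.wheels)  # subclass-specific attributes: 4 2
```

In UML this corresponds to a solid line with a hollow arrowhead from Car and Motorcycle up to Vehicle, the generalization (inheritance) notation described earlier.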

TOPIC 5: Specialization, Generalization


What is Generalization?
In EER diagrams, generalization is a bottom-up method used to combine lower-level entities
into a higher-level object. This approach creates a more generic entity, known as a superclass,
by combining entities with similar features. By removing duplication and arranging the data
in a more organized manner, generalization streamlines the data model.
Advantages of Generalization
• Cuts Down on Redundancy: Cuts down on data duplication by combining
related entities into a single entity.
• Simplifies Schema: Combines many things into a single, clearer schema.
• Enhances Data Organization: By cohesively presenting related entities, it
makes better organization possible.
Disadvantages of Generalization
• Loss of Specificity: The generic entity may take center stage over the distinctive
qualities of lower-level entities.
• Complexity of Querying: As data becomes more abstracted, queries may get
more complicated.
Example of Generalization
Consider two entities Student and Patient. These two entities will have some characteristics
of their own. For example, the Student entity will have Roll_No, Name, and Mob_No while
the patient will have PId, Name, and Mob_No characteristics. Now in this example Name
and Mob_No of both Student and Patient can be combined as a Person to form one higher-
level entity and this process is called as Generalization Process.
What is Specialization?
In EER diagrams, specialization is a top-down method where a higher-level entity is split
into two or more lower-level entities according to their distinct qualities. This technique,
which includes splitting a single entity set into subgroups, is often connected to inheritance,
in which attributes from the higher-level entity are passed down to the lower-level entities.
Advantages of Specialization
• Enhances Specificity: By forming specialized subgroups, it is possible to depict
things in more depth.
• Encourages Inheritance: Relationships and characteristics from higher-level
entities are passed down to lower-level entities.
• Enhances Data Integrity: Ensures that every entity has distinct qualities
relevant to its specialization.
Disadvantages of Specialization
• Expands Schema Size: Adding additional entities may lead to an increase in the
schema's complexity and size.
• Can Cause Redundancy: There might be certain characteristics that are
duplicated across specialized entities.
Example of Specialization
Consider an entity Account with attributes Acc_No and Balance. Account can be
specialized into lower-level entities such as Current_Acc and Savings_Acc. Now
Current_Acc may have Acc_No, Balance, and Transactions, while Savings_Acc may
have Acc_No, Balance, and Interest_Rate; hence we can say that specialized entities
inherit the characteristics of the higher-level entity.
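One common way to map this specialization to tables, sketched here with assumed names, gives the higher-level entity its own table and each specialized entity a table sharing the same primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cur = conn.cursor()

# Superclass table plus one table per specialized entity, all keyed on acc_no.
cur.executescript("""
CREATE TABLE account (
    acc_no  INTEGER PRIMARY KEY,
    balance REAL
);
CREATE TABLE savings_acc (
    acc_no        INTEGER PRIMARY KEY REFERENCES account(acc_no),
    interest_rate REAL
);
CREATE TABLE current_acc (
    acc_no       INTEGER PRIMARY KEY REFERENCES account(acc_no),
    transactions INTEGER
);
""")
cur.execute("INSERT INTO account VALUES (10, 5000.0)")
cur.execute("INSERT INTO savings_acc VALUES (10, 3.5)")

# A savings account "inherits" acc_no and balance by joining to account.
cur.execute("""
    SELECT a.acc_no, a.balance, s.interest_rate
    FROM account a JOIN savings_acc s ON a.acc_no = s.acc_no
""")
print(cur.fetchone())  # (10, 5000.0, 3.5)
```

The join reconstructs the full specialized entity: the inherited attributes come from the superclass table and the specific ones from the subclass table.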
Generalization vs. Specialization

• Generalization works in a bottom-up approach; Specialization works in a top-down
approach.
• In Generalization, the size of the schema gets reduced; in Specialization, the size of the
schema gets increased.
• Generalization is normally applied to a group of entities; Specialization can be applied
to a single entity.
• Generalization can be defined as the process of creating groupings from various entity
sets; Specialization can be defined as the process of creating subgroupings within an
entity set.
• In the Generalization process, the union of two or more lower-level entity sets is taken
to produce a higher-level entity set; Specialization is the reverse of Generalization: it
takes a subset of a higher-level entity set to form a lower-level entity set.
• The Generalization process starts with a number of entity sets and creates a high-level
entity from their common features; the Specialization process starts from a single entity
set and creates different entity sets by using distinguishing features.
• In Generalization, the differences and similarities between lower entities are ignored to
form a higher entity; in Specialization, a higher entity is split to form lower entities.
TOPIC 6: Constraints & Characteristics of Specialization & Generalization Hierarchies

First, we discuss constraints that apply to a single specialization or a single generalization. For
brevity, our discussion refers only to specialization even though it applies
to both specialization and generalization. Then, we discuss differences between
specialization/generalization lattices (multiple inheritance) and hierarchies (single
inheritance), and elaborate on the differences between the specialization and generalization
processes during conceptual database schema design.

1. Constraints on Specialization and Generalization


▪ In general, we may have several specializations defined on the same entity type (or
superclass), as shown in Figure 8.1. In such a case, entities may belong to subclasses in
each of the specializations. However, a specialization may also consist of
a single subclass only, such as the {MANAGER} specialization in Figure 8.1; in such
a case, we do not use the circle notation.

▪ In some specializations we can determine exactly the entities that will become members
of each subclass by placing a condition on the value of some attribute of the superclass.
Such subclasses are called predicate-defined (or condition-defined) subclasses. For
example, if the EMPLOYEE entity type has an attribute Job_type, as shown in Figure
8.4, we can specify the condition of membership in the SECRETARY subclass by the
condition (Job_type = ‘Secretary’), which we call the defining predicate of the
subclass. This condition is a constraint specifying that exactly those entities of
the EMPLOYEE entity type whose attribute value for Job_type is ‘Secretary’ belong
to the subclass. We display a predicate-defined subclass by writing the predicate
condition next to the line that connects the subclass to the specialization circle.

▪ If all subclasses in a specialization have their membership condition on


the same attribute of the superclass, the specialization itself is called an attribute-
defined specialization, and the attribute is called the defining attribute of the
specialization. In this case, all the entities with the same value for the attribute belong
to the same sub-class. We display an attribute-defined specialization by placing the
defining attribute name next to the arc from the circle to the superclass, as shown in
Figure 8.4.

▪ When we do not have a condition for determining membership in a subclass, the


subclass is called user-defined. Membership in such a subclass is determined by the
database users when they apply the operation to add an entity to the subclass; hence,
membership is specified individually for each entity by the user, not by any condition
that may be evaluated automatically.
▪ Two other constraints may apply to a specialization. The first is the disjointness
(or disjointedness) constraint, which specifies that the subclasses of the
specialization must be disjoint. This means that an entity can be a member of at
most one of the subclasses of the specialization. A specialization that is attribute-
defined implies the disjointness constraint (if the attribute used to define the
membership predicate is single-valued). Figure 8.4 illustrates this case, where the d in
the circle stands for disjoint. The d notation also applies to user-defined subclasses of
a specialization that must be disjoint, as illustrated by the specialization
{HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure 8.1. If the subclasses
are not constrained to be disjoint, their sets of entities may be overlapping; that is, the
same (real-world) entity may be a member of more than one subclass of the
specialization. This case, which is the default, is displayed by placing an o in the circle,
as shown in Figure 8.5.

▪ The second constraint on specialization is called


the completeness (or totalness) constraint, which may be total or partial. A total
specialization constraint specifies that every entity in the superclass must be a member
of at least one subclass in the specialization. For example, if every EMPLOYEE must
be either an

▪ HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the


specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure 8.1 is
a total specialization of EMPLOYEE. This is shown in EER diagrams by using a double
line to connect the superclass to the circle. A single line is used to display a partial
specialization, which allows an entity not to belong to any of the subclasses. For
example, if some EMPLOYEE entities do not belong to any of the subclasses
{SECRETARY, ENGINEER, TECHNICIAN} in Figures 8.1 and 8.4, then that
specialization is partial.
Notice that the disjointness and completeness constraints are independent. Hence, we have the
following four possible constraints on specialization:

✓ Disjoint, total

✓ Disjoint, partial

✓ Overlapping, total

✓ Overlapping, partial

➢ Of course, the correct constraint is determined from the real-world meaning that applies
to each specialization. In general, a superclass that was identified through
the generalization process usually is total, because the superclass is derived from the
subclasses and hence contains only the entities that are in the subclasses.

➢ Certain insertion and deletion rules apply to specialization (and generalization) as a


consequence of the constraints specified earlier. Some of these rules are as follows:

➢ Deleting an entity from a superclass implies that it is automatically deleted from all the
subclasses to which it belongs.
➢ Inserting an entity in a superclass implies that the entity is mandatorily inserted in
all predicate-defined (or attribute-defined) subclasses for which the entity satisfies the
defining predicate.

➢ Inserting an entity in a superclass of a total specialization implies that the entity is


mandatorily inserted in at least one of the subclasses of the specialization.

➢ The reader is encouraged to make a complete list of rules for insertions and deletions
for the various types of specializations.
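The insertion and deletion rules above can be sketched in code. The following Python sketch is illustrative only (the EMPLOYEE/SECRETARY names and the Job_type predicate follow the Figure 8.4 example; the data structures are assumptions, not any specific system's API): inserting into the superclass automatically inserts into every predicate-defined subclass whose defining predicate the entity satisfies, and deleting from the superclass removes the entity from all subclasses.

```python
# Sketch: enforcing specialization insertion/deletion rules (illustrative).
employees = {}  # superclass extension: ssn -> attribute dict
subclasses = {"SECRETARY": set(), "ENGINEER": set(), "TECHNICIAN": set()}

# Defining predicate of each subclass (attribute-defined specialization).
defining_predicate = {
    "SECRETARY": lambda e: e["Job_type"] == "Secretary",
    "ENGINEER": lambda e: e["Job_type"] == "Engineer",
    "TECHNICIAN": lambda e: e["Job_type"] == "Technician",
}

def insert_employee(ssn, attrs):
    """Rule: an entity inserted in the superclass is mandatorily inserted
    in all predicate-defined subclasses whose defining predicate it satisfies."""
    employees[ssn] = attrs
    for name, pred in defining_predicate.items():
        if pred(attrs):
            subclasses[name].add(ssn)

def delete_employee(ssn):
    """Rule: deleting from the superclass deletes from all subclasses."""
    employees.pop(ssn, None)
    for members in subclasses.values():
        members.discard(ssn)

insert_employee("123", {"Job_type": "Secretary"})
insert_employee("456", {"Job_type": "Engineer"})
delete_employee("123")
```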

Characteristics of Specialization & Generalization Hierarchies

1. Inheritance: Subclasses inherit attributes and relationships of the superclass.


2. Attribute Specificity: Subclasses can add their own unique attributes.
o Example: Employee (Name, Salary) → Manager (Department).
3. Constraint Rules: Disjoint/Overlapping and Total/Partial determine membership.
4. Reusability: Common features are stored once in the superclass.
5. Hierarchy: Maintains IS-A relationship between superclass and subclass.
o Example: Manager IS-A Employee.
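These characteristics map directly onto class inheritance in object-oriented languages. A minimal Python sketch of the Employee/Manager example above (the attribute values are made up for illustration):

```python
class Employee:                      # superclass: common features stored once
    def __init__(self, name, salary):
        self.name = name             # inherited by every subclass
        self.salary = salary

class Manager(Employee):             # IS-A relationship: Manager IS-A Employee
    def __init__(self, name, salary, department):
        super().__init__(name, salary)
        self.department = department # attribute specific to the subclass

m = Manager("Asha", 90000, "Sales")
```

Here `Manager` inherits `name` and `salary` (Inheritance, Reusability) and adds its own `department` (Attribute Specificity), while `isinstance(m, Employee)` reflects the IS-A hierarchy.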

TOPIC 7: Modeling of UNION Types Using Categories

1. What is a Category (Union Type)?

• A Category (Union Type) is a special kind of subclass that is a subset of the


UNION of multiple superclasses.
• Used when a subclass must represent entities that come from different and unrelated
entity types.
• Normal specialization/generalization has one superclass, but a category can have two
or more superclasses.

2. Notation in EER Diagram

• Superclasses are connected to a circle with ∪ symbol (union).


• Circle connects to subclass (category) with a subset (⊆) symbol.
• Double line = Total Category, Single line = Partial Category.

3. Example: Vehicle Ownership

• Superclasses: PERSON, COMPANY, BANK.


• Requirement: An OWNER of a vehicle may be a person, a company, or a bank.
• Solution:
o Create category OWNER as subclass of the union of PERSON, COMPANY,
BANK.
o OWNER inherits attributes depending on which superclass the entity belongs
to.

ER/EER Conceptual View (textual):

PERSON COMPANY BANK


\ | /
\ | /
\ | /
∪ (Union Circle)

[OWNER]

4. Attribute Inheritance in Categories

• Entities of a category inherit only the attributes of the superclass they belong to.
o Example: If OWNER is a PERSON → inherits Person attributes (Name,
DOB).
o If OWNER is a COMPANY → inherits Company attributes (RegNo,
Address).

5. Total vs Partial Categories

• Total Category → Must include all entities of the union.


o Example: Every PERSON, COMPANY, and BANK must be an OWNER.
• Partial Category → Includes only some entities of the union.
o Example: Only some PERSONS, COMPANIES, or BANKS may be
OWNERS.

6. Difference from Shared Subclass

• Shared Subclass = Subclass of the intersection of multiple superclasses (must


belong to all).
• Category (Union Type) = Subclass of the union of multiple superclasses (must
belong to at least one).
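The set-theoretic difference can be stated directly: if we model each superclass extension as a set of entity identifiers, a category draws its members from the union, while a shared subclass draws them from the intersection. A small illustrative sketch (the identifiers are made up):

```python
# Superclass extensions as sets of entity identifiers (illustrative data).
person = {"p1", "p2"}
company = {"c1"}
bank = {"b1"}

# Category (union type): an OWNER must belong to AT LEAST ONE superclass.
owner_domain = person | company | bank          # union

# Shared subclass: a member must belong to ALL of its superclasses.
employee = {"p1", "x9"}
alumnus = {"p1", "p2"}
shared_domain = employee & alumnus              # intersection

def can_be_owner(entity_id):
    """True if entity_id is a legal member of the OWNER category."""
    return entity_id in owner_domain
```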

TOPIC 8: Representing Specialization and Generalization in UML Class


Diagrams

Modeling UML Generalization and Specialization

Generalization groups the common properties of multiple classes into a single, generalized
class. This makes models cleaner and easier to understand. For example, imagine we’re
designing a system for managing clients. Both companies and individuals are clients. Instead
of repeating shared attributes, we create a general “Client” class. This class connects to
“Company” and “Person” classes with lines and arrowheads pointing to the generalized
“Client” class.
Example:
• Abstract Generalization: The system must allow creating clients.
• Corresponding Specializations:
1. The system must allow creating companies.
2. The system must allow creating persons.
If “Client” is abstract (displayed in italics), it cannot have direct instances. This means users
can only create “Company” or “Person” objects. However, if “Client” is not abstract, users
can create generic client objects.
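The abstract-class rule can be sketched with Python's `abc` module. This is a hedged illustration of the Client/Company/Person example (the `display_name` method is an assumption added for demonstration, not part of the original model):

```python
from abc import ABC, abstractmethod

class Client(ABC):                   # abstract generalization: no direct instances
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def display_name(self):
        ...

class Company(Client):               # specialization
    def display_name(self):
        return f"Company: {self.name}"

class Person(Client):                # specialization
    def display_name(self):
        return f"Person: {self.name}"

# Instantiating the abstract generalization fails, exactly as the UML
# italics notation prescribes; only Company or Person objects can exist.
try:
    Client("generic")
    created_generic = True
except TypeError:
    created_generic = False
```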

Modeling of a UML
Generalization
Generalization Sets and Constraints

UML introduces generalization sets to group subtypes logically. These sets often include
constraints that define relationships between subtypes. Let’s break them down:
• Incomplete: Not all subtypes are listed. For instance, new roles like “Manufacturer”
can be added later.
• Complete: All possible subtypes are covered. No new ones can exist.
• Disjoint: An instance belongs to only one subtype. For example, a contact is either a
“Person” or a “Company.”
• Overlapping: An instance can belong to multiple subtypes. For example, a contact
may be both a “Client” and a “Supplier.”
Example: Let’s expand the client example:
1. Generalization Set: “Contact Type” – {Complete, Disjoint}: “Person” and
“Company.”
2. Generalization Set: “Contact Kind” – {Incomplete, Overlapping}: “Client,”
“Supplier,” and “Interested Party.”
With these constraints, the system ensures accurate categorization. While every contact must
be either a person or a company, they can simultaneously serve as clients, suppliers, or both.

Modeling UML Generalization Sets and Constraints


Business Case: E-commerce CRM

Consider an e-commerce company implementing a CRM. They use UML to model their
contact management system. All contacts share basic attributes like “Name” and “Email.”
Subtypes such as “Person” and “Company” add specialized fields. For example, a
“Company” contact might include “Tax ID,” while a “Person” contact has a “Date of Birth.”
• Constraint Application:
• “Contact Type” (Complete, Disjoint): A contact must be either a person or
a company.
• “Contact Kind” (Incomplete, Overlapping): A contact can be a client, a
supplier, or both.
This approach avoids duplication and ensures consistency. By grouping shared attributes into
a generalized “Contact” class, the company simplifies its database design.
Heuristics for Identifying Generalizations

Identifying generalizations can be tricky. Here are two practical methods:


1. Linguistic Formulations:
• Examples: “A dog is a kind of animal.” “A boss is a special type of
employee.”
2. Uniformity:
• When several classes share attributes or relationships, create a generalized
class. Ensure the name reflects its broader purpose.
Final Thoughts

The concepts of UML generalization and specialization are invaluable for organizing
complex systems. These concepts promote clarity and prevent redundancy. By applying
constraints wisely, you can create models that are both flexible and precise. Whether you’re
building a CRM or another application, UML ensures your design aligns with business goals.
TOPIC 9: Data Abstraction, Knowledge Representation, and Ontology
Concepts

What is Data Abstraction?


Data abstraction involves simplifying complex details and focussing on the essential aspects of
data. The concept is useful in both database modeling and AI-based knowledge systems. At its
core, data abstraction is a deliberate process of identifying shared properties within a "domain of
discourse," while suppressing irrelevant details.

Key elements of data abstraction include −

• Classification and Instantiation − Grouping similar objects into classes for better
management.
• Identification − Creating unique identifiers. It is needed for distinguishing and linking
objects.
• Specialization and Generalization − Refining or unifying concepts for better
representation of data.
• Aggregation and Association − Combining related entities to form higher-level concepts.

Each abstraction method plays a critical role in managing and interpreting complex data
effectively. Let us now understand these four concepts of data abstraction with examples.
Classification and Instantiation: Grouping Similar Objects
Classification organizes entities into groups based on shared attributes. For instance, in
a Company database −

• Job applicants share attributes like Name, Ssn, and Phone.


• Companies have attributes like Company Name (Cname) and Company Address
(Caddress).

Take a look at the following ER diagram

By grouping applicants and companies into separate classes, it becomes easier to describe and
analyze the data. Instantiation, on the other hand, focuses on individual members: it refers to
the creation of specific instances of these classes, such as a job applicant named "John Doe"
or a company called "TechCorp".

Example − ER diagrams often illustrate this structure. Classification allows class-level properties
like "Company Type," while Instances might include a "Startup" or "Multinational".

Identification: Creating Unique Identifiers


Identification ensures that each entity is uniquely distinguishable, which is critical for linking
and cross-referencing data. It involves creating names for schema constructs to distinguish
objects using attributes. For example −

• A person in a PERSON entity might be identified by their Name, Ssn, and Address.
• The same person could also appear in a STUDENT entity, identified by a Student
ID and Course.

Without clear identifiers, we cannot link or cross-reference related instances across entities.
Database designers and administrators must implement effective identification mechanisms to
maintain consistency.
Specialization and Generalization
Specialization refines a broader class into specific subclasses. Generalization, on the other
hand, unifies subclasses into a broader superclass. These processes help capture hierarchical
relationships. For example −

• A database may define an Employee superclass, with subclasses such


as Manager and Technician.
• Specializing further, Manager might have unique attributes like Department,
while Technician might have attributes like Skill Set.

Such classifications allow databases to handle both shared and unique attributes effectively.

Aggregation and Association: Combining Related Entities


Aggregation and Association are the concepts used to combine related objects into higher-level
entities.

• Aggregation combines related objects into a composite entity.


• Association links independent entities based on their interactions.

Example − In an Employment database, we may want to represent Interviews between


applicants and companies. Aggregation simplifies the process. Take a look at the following
representation −

An Interview can be modeled as a composite of Company, Applicant, and attributes like Date
and Contact Person. Associating Interview with Job Offer must be done carefully to avoid
incorrect assumptions (e.g., assuming every interview results in a job offer).
What is Knowledge Representation?
Building on data abstraction, Knowledge Representation (KR) is about capturing the structure
and relationships within a knowledge domain. It goes beyond data modeling by
supporting reasoning and inference.

Knowledge Representation models use −

• Rules for decision-making.


• Incomplete and temporal knowledge to manage uncertainty.

Unlike traditional databases, KR systems mix schemas with data instances, enabling intelligent
reasoning over the stored information.

Ontologies and the Semantic Web


Ontology is a concept rooted in philosophy that provides a shared vocabulary for describing a
domain. Ontologies are now critical for creating shared understanding in AI and the Semantic
Web.

An ontology defines −

• Concepts − Entities, attributes, and relationships


• Relationships − How those concepts connect or interact

For example, suppose a company is hiring. In this context, an ontology might define terms like
"Applicant", "Interview", and "Job Offer" and their interconnections.

Role in the Semantic Web


Ontologies enable data exchange and search across diverse systems. By defining shared
meanings, they allow diverse applications to exchange and interpret data meaningfully.

Example − A semantic job portal might use ontologies to link job requirements with applicant
profiles, even when the data is in different formats and structures.

Challenges and Opportunities in Data Abstraction and Knowledge


Representation
Data abstraction and knowledge representation provide powerful tools, but they also have some
limitations −

• Efficiency − Representing exceptions or composite entities can be resource-intensive.


• Flexibility − Balancing schema-level definitions with instance-level data requires careful
planning and design.
UNIT II

TOPIC 1: Problems Caused by Redundancy

1. Wastage of Storage Space

• Duplicate values consume extra storage.


• Example: If a student’s address is stored with every course registration, the same
address is repeated many times.

2. Update Anomalies

• If data is stored in multiple places, updating one copy but not the others leads to
inconsistency.
• Example: If a customer’s phone number is updated in one table but not in all
occurrences, the database becomes inconsistent.

3. Insertion Anomalies

• Sometimes new data cannot be inserted without duplicating other values.


• Example: If course details are stored with student records, we cannot add a new
course unless a student registers for it.

4. Deletion Anomalies

• Deleting some data may lead to unintended loss of valuable information.


• Example: If the last student enrolled in a course is deleted, the course information also
gets deleted.

5. Data Inconsistency

• Different copies of the same data may not agree.


• Example: One record shows employee salary = ₹50,000, another shows ₹55,000.

6. Poor Data Integrity

• Redundancy makes it harder to enforce integrity constraints (like primary key, foreign
key).
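The update anomaly is easy to demonstrate in code: when the same fact is stored in several rows, updating only one copy leaves the data inconsistent. A small Python sketch with made-up illustrative data:

```python
# Denormalized table: the customer's phone number is repeated on every order row.
orders = [
    {"order_id": 1, "customer": "Ravi", "phone": "99990"},
    {"order_id": 2, "customer": "Ravi", "phone": "99990"},
]

# Careless update: only the first occurrence is changed (update anomaly).
orders[0]["phone"] = "88880"

# Consistency check: all stored copies of a customer's phone should agree.
phones_for_ravi = {row["phone"] for row in orders if row["customer"] == "Ravi"}
inconsistent = len(phones_for_ravi) > 1   # True -> the database disagrees with itself
```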
TOPIC 2: Decompositions

What is Decomposition in DBMS?


When we divide a table into multiple tables or divide a relation into multiple relations, then
this process is termed Decomposition in DBMS. We perform decomposition in DBMS when
we want to process a particular data set. It is performed in a database management system
when we need to ensure consistency and remove anomalies and duplicate data present in the
database. When we perform decomposition in DBMS, we must try to ensure that no
information or data is lost.

Decomposition in DBMS
Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition

Lossless Decomposition
The process in which we can regain the original relation R with the help of joins on the
multiple relations formed after decomposition is termed lossless decomposition. It is used
to remove redundant data from the database while retaining the useful information. A
lossless decomposition ensures the following:
• While regaining the original relation, no information should be lost.
• If we perform join operation on the sub-divided relations, we must get the
original relation.
Example:
There is a relation called R(A, B, C)
A B C

55 16 27

48 52 89

Now we decompose this relation into two sub relations R1 and R2


R1(A, B)
A B

55 16

48 52

R2(B, C)
B C

16 27

52 89

After performing the join operation on the common attribute B, we get back the same original
relation (the join is lossless here because each value of B identifies a single tuple):
A B C

55 16 27

48 52 89

Lossy Decomposition
As the name suggests, lossy decomposition means that when we perform a join operation on
the sub-relations, it does not result in the same relation that was decomposed. After the join
operation, we find some extraneous (spurious) tuples. These extra tuples make it difficult
for the user to identify the original tuples.
Example:
We have a relation R(A, B, C). Note that the value 2 of attribute B repeats across tuples, so
B cannot identify a single tuple in either sub-relation.
A B C

1 2 1

2 2 3

3 3 3

Now, we decompose it into sub-relations R1 and R2


R1(A, B)
A B

1 2

2 2

3 3

R2(B, C)
B C

2 1

2 3

3 3

Now, after performing the join operation on B, we get two spurious tuples, (1, 2, 3) and
(2, 2, 1), that were not in the original relation:


A B C

1 2 1

1 2 3

2 2 1

2 2 3

3 3 3
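Whether a decomposition is lossless on given data can be checked mechanically: project, join, and compare. The following self-contained Python sketch uses its own small illustrative instance (a relation R(A, B, C) in which B values repeat, so splitting into R1(A, B) and R2(B, C) is lossy on this data):

```python
def project(rel, cols, out_cols):
    """Project a relation (set of tuples with column names cols) onto out_cols."""
    idx = [cols.index(c) for c in out_cols]
    return {tuple(t[i] for i in idx) for t in rel}

def natural_join(r1, cols1, r2, cols2):
    """Natural join of two relations given as sets of tuples with named columns."""
    common = [c for c in cols1 if c in cols2]
    out_cols = cols1 + [c for c in cols2 if c not in cols1]
    out = set()
    for t1 in r1:
        d1 = dict(zip(cols1, t1))
        for t2 in r2:
            d2 = dict(zip(cols2, t2))
            if all(d1[c] == d2[c] for c in common):
                merged = {**d1, **d2}
                out.add(tuple(merged[c] for c in out_cols))
    return out

# Illustrative instance: B values repeat, so this decomposition is lossy here.
R = {(1, 2, 1), (2, 2, 3), (3, 3, 3)}
R1 = project(R, ["A", "B", "C"], ["A", "B"])
R2 = project(R, ["A", "B", "C"], ["B", "C"])
joined = natural_join(R1, ["A", "B"], R2, ["B", "C"])
spurious = joined - R   # extraneous tuples introduced by the join
```

The join of the projections always contains every original tuple; the decomposition is lossless exactly when no spurious tuples appear.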
Properties of Decomposition

• Lossless: Every decomposition that we perform in a database management


system should be lossless. No information should be lost while
performing the join on the sub-relations to get back the original relation. This
helps to remove redundant data from the database.
• Dependency Preservation: Dependency preservation is an important property
in database management systems. It ensures that the functional dependencies
between the attributes are maintained while performing decomposition. It helps to
improve database efficiency and maintain consistency and integrity.
• Lack of Data Redundancy: Data redundancy is generally defined as duplicate
or repeated data. This property states that the decomposition performed
should not suffer from redundant data. It helps us to get rid of unwanted data and
focus only on the useful data or information.

Problems related to decomposition

1. Loss of Information

• Non-loss decomposition: When a relation is decomposed into two or more smaller


relations, and the original relation can be perfectly reconstructed by taking the natural
join of the decomposed relations, then it is termed as lossless decomposition. If not, it
is termed "lossy decomposition."
• Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you
decompose it into `R1(A, B)` and `R2(B, C)`, the decomposition may be lossy: the common
attribute `B` is not a key of either sub-relation, so the natural join is not guaranteed to
recreate the original table.

Example: Consider a relation R(A,B,C) with the following data:

|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).
R1(A, B):

|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |

R2(A, C):

|A |C |
|----|----|
|1 |P |
|1 |P |
|2 |Q |

Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation
R. Therefore, this is a lossless decomposition.

2. Loss of Functional Dependency

• Once tables are decomposed, certain functional dependencies might not be preserved,
which can lead to the inability to enforce specific integrity constraints.
• Example: If you have the functional dependency `A → B` in the original table, but in
the decomposed tables, there is no table with both `A` and `B`, this functional
dependency can't be preserved.
Example: Let's consider a relation R with attributes A,B, and C and the following functional
dependencies:
A→B
B→C
Now, suppose we decompose R into two relations:
R1(A,B) with FD A → B
R2(B,C) with FD B → C
In this case, the decomposition is dependency-preserving because all the functional
dependencies of the original relation R can be found in the decomposed relations R1 and R2.
We do not need to join R1 and R2 to enforce or check any of the functional dependencies.
However, if we had a functional dependency in R, say A → C, which cannot be determined
from either R1 or R2 without joining them, then the decomposition would not be dependency-
preserving for that specific FD.

3. Increased Complexity

• Decomposition leads to an increase in the number of tables, which can complicate


queries and maintenance tasks. While tools and ORM (Object-Relational Mapping)
libraries can mitigate this to some extent, it still adds complexity.

4. Redundancy

• Incorrect decomposition might not eliminate redundancy, and in some cases, can even
introduce new redundancies.

5. Performance Overhead

• An increased number of tables, while aiding normalization, can also lead to more
complex SQL queries involving multiple joins, which can introduce performance
overheads.

TOPIC 3: Reasoning About Functional Dependencies (FDs)

A functional dependency occurs when the value of one attribute (or a set of attributes)
uniquely determines the value of another attribute. This relationship is denoted as:

X→Y

Here, X is the determinant, and Y is the dependent attribute. This means that for each
unique value of X, there is precisely one corresponding value of Y.

Example:

Consider a table named Students with the following attributes:


• StudentID
• StudentName
• StudentAge

If each student has a unique StudentID, and this ID determines the student's name, we can
express this functional dependency as:

StudentID → StudentName

This indicates that knowing the StudentID allows us to determine the StudentName.
StudentID StudentName StudentAge

101 Rahul 23

102 Ankit 22

103 Aditya 22

104 Sahil 24

Functional Dependency
How to represent functional dependency in DBMS?
• Functional dependency is expressed using arrow notation. For example, if we
have an employee record with fields "EmployeeID", "FirstName" and
"LastName", we can specify the dependency as follows:
EmployeeID -> FirstName, LastName
• A functional dependency has two parts: the determinant on the left-hand side (LHS)
and the dependent on the right-hand side (RHS) of the arrow (->).
• For example, if we have a table with attributes "X", "Y" and "Z", and the
attribute "X" can determine the values of the attributes "Y" and "Z":
X -> Y, Z
• This notation indicates that the value of attribute "X" determines the values of
attributes "Y" and "Z". So if you know the value of "X", you can also determine
the values of "Y" and "Z".
Types of Functional Dependency in DBMS
The following are some important types of FDs in DBMS:
Trivial Functional Dependency
The dependency of an attribute on a set of attributes is known as trivial functional
dependency if the set of attributes includes that attribute.

Non-trivial Functional Dependency


If a functional dependency X→Y holds true where Y is not a subset of X then this
dependency is called non trivial Functional dependency.

Multivalued Dependency
A multivalued dependency happens when there are at least three attributes (let us say X, Y
and Z), and for a value of X there is a well defined set of values of Y and a well defined set
of values of Z. However, the set of values of Y is independent of set Z and vice versa.

Semi Non Trivial Functional Dependencies


X -> Y is called semi non-trivial when X intersect Y is not NULL.

Transitive Functional Dependency


Transitive functional dependency in DBMS is a relationship between attributes (columns)
of a database table. It occurs when the value of one attribute determines the value of another
attribute through an intermediate (third) attribute.

Armstrong’s Axioms in Functional Dependency

Reflexivity: If A is a set of attributes and B is a subset of A, then the dependency A -> B holds.
Augmentation: If the dependency A -> B holds, then adding the same set of attributes to both
sides preserves it: AC -> BC holds for any set of attributes C.
Transitivity: If the dependencies X → Y and Y → Z are both valid, then X → Z is also valid
according to the transitivity rule.

Reasoning Steps
Step 1: Closure of a Set of Attributes

To check what a set of attributes can determine.

Notation: X⁺ = { all attributes functionally determined by X }


Example:

R(A, B, C, D)
FDs: A → B, B → C, C → D
Find A⁺

Solution:
A⁺ = {A}
A → B ⇒ {A, B}
B → C ⇒ {A, B, C}
C → D ⇒ {A, B, C, D}

So A⁺ = {A, B, C, D}
A is a key.
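The closure algorithm in Steps 1 and 2 can be implemented in a few lines. This Python sketch uses the same FDs as the example (A → B, B → C, C → D) and also shows the Step 2 test that X → Y holds iff Y ⊆ X⁺:

```python
def closure(attrs, fds):
    """Compute X+, the set of all attributes functionally determined by attrs.
    fds is a list of (lhs, rhs) pairs, each a set of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already determined, add the RHS attributes.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd_holds(lhs, rhs, fds):
    """X -> Y holds iff Y is a subset of X+."""
    return set(rhs) <= closure(set(lhs), fds)

# FDs of the example: A -> B, B -> C, C -> D
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
a_plus = closure({"A"}, fds)   # {A, B, C, D}, so A is a key of R(A, B, C, D)
```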

Step 2: Testing if an FD Holds

We can test if X → Y holds by checking if Y ⊆ X⁺.

Example:
FDs: A → B, B → C
Check if A → C holds?
A⁺ = {A, B, C}
Since C ⊆ A⁺ → A → C holds.

Step 3: Finding Candidate Keys

A candidate key is a minimal set of attributes that can determine all other attributes.

Example:

R(A, B, C)
FDs: A → B, B → C

Find A⁺ = {A, B, C}
So A is a candidate key.

Step 4: Equivalence of FD Sets

Two FD sets F and G are equivalent if:

• F⁺ = G⁺ (they imply the same FDs)


Step 5: Minimal Cover (Canonical Cover)

A minimal set of FDs equivalent to the original set — used in normalization.

Steps to find minimal cover:

1. Make RHS single attribute.


2. Remove redundant attributes from LHS.
3. Remove redundant dependencies.

Example:

FDs: A → BC, B → C
Step 1: Split → A → B, A → C, B → C
Step 2: No extraneous attributes on any LHS.
Step 3: A → C is redundant, because it is implied by A → B and B → C (transitivity).
Minimal cover = {A → B, B → C}
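The redundancy test in Step 3 can be mechanized with attribute closures: an FD X → Y is redundant if Y ⊆ X⁺ computed from the remaining FDs. A sketch for the split FDs A → B, A → C, B → C (note that in general redundant FDs should be removed one at a time, re-checking after each removal; a single pass happens to suffice here):

```python
def closure(attrs, fds):
    """Compute X+ under the given list of (lhs, rhs) FDs (sets of names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(fd, fds):
    """X -> Y is redundant if Y is still in X+ after removing it from fds."""
    rest = [g for g in fds if g is not fd]
    lhs, rhs = fd
    return rhs <= closure(lhs, rest)

# Split FDs from the example: A -> B, A -> C, B -> C
split = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"B"}, {"C"})]
minimal = [fd for fd in split if not is_redundant(fd, split)]
# A -> C drops out: it follows from A -> B and B -> C by transitivity.
```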

🧮 Practice Problems
Problem 1: Compute Closure
R(A, B, C, D)
FDs: A → B, B → CD
Find A⁺

Solution:
A⁺ = {A}
A → B ⇒ {A, B}
B → CD ⇒ {A, B, C, D}
A⁺ = {A, B, C, D}

Problem 2: Check if FD Holds


FDs: A → B, B → C
Does A → C hold?

A⁺ = {A, B, C}
So A → C holds by transitivity.

Problem 3: Find Candidate Key


R(A, B, C, D)
FDs: A → B, B → C, C → D

A⁺ = {A, B, C, D}
So A is a candidate key.
Problem 4: Find Minimal Cover
FDs: A → BC, B → C, A → B

Step 1: Split RHS


→ A → B, A → C, B → C (the duplicate A → B appears only once)
Step 2 & 3: A → C is redundant, since A → B and B → C imply A → C by transitivity.
Minimal Cover = {A → B, B → C}

Problem 5: Use Armstrong’s Axioms

Given:

X → Y and Y → Z

By Transitivity,
⇒X→Z

By Augmentation,
⇒ XW → YW

By Union,
⇒ X → YZ

TOPIC 4: Normal Forms:

Normal forms are a set of progressive rules (or design checkpoints) for relational schemas
that reduce redundancy and prevent data anomalies. Each normal form - 1NF, 2NF, 3NF,
BCNF, 4NF, 5NF - is stricter than the previous one: meeting a higher normal form implies
the lower ones are satisfied. Think of them as layers of cleanliness for your tables: the
deeper you go, the fewer redundancy and integrity problems you’ll have.
Benefits of using Normal Forms:
• Reduce duplicate data and wasted storage.
• Prevent insert, update, and delete anomalies.
• Improve data consistency and integrity.
• Make the schema easier to maintain and evolve.
The Diagram below shows the hierarchy of database normal forms. Each inner circle
represents a stricter level of normalization, starting from 1NF (basic structure) to 5NF
(most refined). As you move inward, data redundancy reduces and data integrity improves.
Each level builds upon the previous one to ensure a cleaner and more efficient database
design.
❖ First Normal Form (1NF)

First Normal Form (1NF) ensures that the structure of a database table is organized in a way
that makes it easier to manage and query.
• A relation is in first normal form if every attribute in that relation is a single-
valued attribute, i.e., it does not contain any composite or multi-valued attribute.
• It is the first and essential step to reduce redundancy, improve data integrity
and reduce anomalies in relational database design.
A relation (table) is said to be in First Normal Form (1NF) if:
• All the attributes (columns) contain only atomic (indivisible) values.
• Each column contains values of a single type.
• Each record (row) is unique, meaning it can be identified by a primary key.
• There are no repeating groups or arrays in any row.
Rules for First Normal Form (1NF) in DBMS
To follow the First Normal Form (1NF) in a database, these simple rules must be followed:
Every Column Should Have Single Values
Each column in a table must contain only one value in a cell. No cell should hold multiple
values. If a cell contains more than one value, the table does not follow 1NF.
• Example: A table with columns like [Writer 1], [Writer 2], and [Writer 3] for
the same book ID is not in 1NF because it repeats the same type of information
(writers). Instead, all writers should be listed in separate rows.
All Values in a Column Should Be of the Same Type
Each column must store the same type of data. You cannot mix different types of
information in the same column.
• Example: If a column is meant for dates of birth (DOB), you cannot use it to
store names. Each type of information should have its own column.
Every Column Must Have a Unique Name
Each column in the table must have a unique name. This avoids confusion when retrieving,
updating, or adding data.
• Example: If two columns have the same name, the database system may not
know which one to use.
The Order of Data Doesn’t Matter
In 1NF, the order in which data is stored in a table doesn’t affect how the table works. You
can organize the rows in any way without breaking the rules.
Example:
Consider the below COURSES Relation :

In the above table, Courses has a multi-valued attribute, so it is not in 1NF. To make the
table in 1NF we have to remove the multivalued attributes from the table as given below:

Now the table is in 1NF as there is no multi-valued attribute present in the table.
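Since the COURSES tables appear only as figures here, the same transformation can be sketched with illustrative data: a multi-valued Courses attribute is flattened into one row per (student, course) pair, so every cell holds a single atomic value.

```python
# Unnormalized: the Courses attribute is multi-valued (violates 1NF).
unnormalized = [
    {"StudentID": 1, "Courses": ["DBMS", "OS"]},
    {"StudentID": 2, "Courses": ["Networks"]},
]

# 1NF: one row per (student, course) pair; every attribute is atomic.
first_nf = [
    {"StudentID": row["StudentID"], "Course": course}
    for row in unnormalized
    for course in row["Courses"]
]
```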

❖ Second Normal Form (2NF)

Second Normal Form (2NF) is based on the concept of fully functional dependency. It is a
way to organize a database table so that it reduces redundancy and ensures data consistency.
Fully Functional Dependency means a non-key attribute depends on the entire primary key,
not just part of it.
For a table to be in 2NF, it must first meet the following requirements
1. Meet 1NF Requirements: The table must first satisfy First Normal Form (1NF),
meaning:
• All columns contain single, indivisible values.
• No repeating groups of columns.
2. Eliminate Partial Dependencies: A partial dependency occurs when a non-prime
attribute (not part of the candidate key) depends only on a part of a composite primary key,
rather than the entire key.
By ensuring these steps, a table in 2NF is more efficient and less prone to errors during
updates, inserts, and deletes.
What is Partial Dependency?
The FD (functional dependency) A -> B is a partial dependency if B is functionally
dependent on A and B can also be determined by a proper subset of A.
In other words, if you have a composite key (a primary key made up of more than one
attribute), and an attribute depends on only a subset of that composite key, rather than the
entire key, that is considered a partial dependency.
A partial dependency would occur whenever a non-prime attribute depends functionally on
a part of the given candidate key.
Example: Consider a StaffBranch relation with attributes (staffNo, sName, branchNo).
In the given relation StaffBranch, we have the functional dependency:
• staffNo, sName → branchNo.
This means that the combination of staffNo and sName determines branchNo.
BranchNo is also functionally dependent on a subset of the composite key, specifically
staffNo. This means that branchNo can be determined by just staffNo.
• staffNo → branchNo
This is a partial dependency because branchNo depends on only a part of the composite key
(staffNo, sName), not the entire key.
Example of Second Normal Form (2NF)
Consider a table storing information about students, courses, and course fees, with columns STUD_NO, COURSE_NO, and COURSE_FEE:
• There are many courses having the same course fee. Here, COURSE_FEE
cannot alone decide the value of COURSE_NO or STUD_NO.
• COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO.
• COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
• The candidate key for this table is {STUD_NO, COURSE_NO} because the
combination of these two columns uniquely identifies each row in the table.
• COURSE_FEE is a non-prime attribute because it is not part of the candidate
key {STUD_NO, COURSE_NO}.
• But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
• Therefore, Non-prime attribute COURSE_FEE is dependent on a proper subset
of the candidate key, which is a partial dependency and so this relation is not in
2NF.
In 2NF, we eliminate such dependencies by breaking the table into two separate
tables:
1. A Student-Course table that links students and courses: (STUD_NO, COURSE_NO)
2. A Course table that stores course fees: (COURSE_NO, COURSE_FEE)
Now, each table is in 2NF:
• The Course Table ensures that COURSE_FEE depends only on COURSE_NO.
• The Student-Course Table ensures there are no partial dependencies because it
only relates students to courses.
Now, the COURSE_FEE is no longer repeated in every row, and each table is free from
partial dependencies. This makes the database more efficient and easier to maintain.
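The split described above can be sketched in Python. The column order (STUD_NO, COURSE_NO, COURSE_FEE) and the sample fee values are assumptions made for illustration:

```python
def decompose_2nf(rows):
    """Split (STUD_NO, COURSE_NO, COURSE_FEE) rows into the two 2NF tables."""
    student_course = sorted({(s, c) for s, c, _ in rows})   # enrollment only
    course_fee = sorted({(c, fee) for _, c, fee in rows})   # fee depends on course alone
    return student_course, course_fee

rows = [(1, "C1", 1000), (2, "C1", 1000), (1, "C2", 1500)]
enroll, fees = decompose_2nf(rows)
# The fee for C1 is now stored once, not once per enrolled student.
assert fees == [("C1", 1000), ("C2", 1500)]
assert enroll == [(1, "C1"), (1, "C2"), (2, "C1")]
```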
Limitations of Second Normal Form (2NF)
While Second Normal Form (2NF) addresses partial dependencies and helps reduce
redundancy, it has some limitations:

1. Doesn't Handle Transitive Dependencies: 2NF ensures that non-prime attributes are
fully dependent on the entire primary key, but it doesn't address transitive dependencies. In
a transitive dependency, an attribute depends on another non-key attribute.
For example, if A → B and B → C, then A indirectly determines C. This can lead to further
redundancy and anomalies.

2. Doesn't Ensure Optimization: Although 2NF eliminates partial dependencies, it may
still leave some redundancy in the data, particularly in larger and more complex
datasets. It doesn't guarantee the most efficient or optimized structure for a
database.

3. Complexity in Handling Multi-Attribute Keys: When dealing with composite primary
keys (keys made of multiple attributes), ensuring full dependency can still lead to a
complex design. A further step of normalization (Third Normal Form, 3NF) is required to
resolve transitive dependencies and achieve better data organization.

4. Not Sufficient for Some Use Cases: While 2NF is useful for reducing redundancy in
some situations, in real-world applications where data integrity and efficiency are crucial,
additional normalization (like 3NF) might be needed to address more complex
dependencies and optimize data storage and retrieval.
❖ Third Normal Form (3NF)

The Third Normal Form (3NF) builds on the First (1NF) and Second (2NF) Normal Forms.
Achieving 3NF ensures that the database structure is free of transitive dependencies, reducing
the chances of data anomalies. Even though tables in 2NF have reduced redundancy
compared to 1NF, they may still encounter issues like update anomalies.
A relation is in Third Normal Form (3NF) if it satisfies the following two conditions:
1. It is in Second Normal Form (2NF): This means the table has no partial
dependencies (i.e., no non-prime attribute is dependent on a part of a candidate
key).
2. There is no transitive dependency for non-prime attributes: In simpler terms,
no non-key attribute should depend on another non-key attribute. Instead, all
non-key attributes should depend directly on the primary key.
Understanding Transitive Dependency
To fully grasp 3NF, it’s essential to understand transitive dependency. A transitive
dependency occurs when one non-prime attribute depends on another non-prime attribute
rather than depending directly on the primary key. This can create redundancy and
inconsistencies in the database.
For example, if we have the following relationship between attributes:
• A -> B (A determines B)
• B -> C (B determines C)
This means that A indirectly determines C through B, creating a transitive dependency.
3NF eliminates these transitive dependencies to ensure that non-key attributes are directly
dependent only on the primary key.
Conditions for a Table to be in 3NF
A table is in Third Normal Form (3NF) if, for every non-trivial functional dependency
X→Y, at least one of the following holds:
• X is a superkey: This means that the attribute(s) on the left-hand side of the
functional dependency (X) must be a superkey (a key that uniquely identifies a
tuple in the table).
• Y is a prime attribute: This means that every element of the attribute set Y
must be part of a candidate key (i.e., a prime attribute).

Example 1: Third Normal Form (3NF)


Consider the following relation for a Candidate table with the following attributes and
functional dependencies:
1. Functional dependency Set:
The set of functional dependencies is as follows:
• CAND_NO → CAND_NAME
• CAND_NO → CAND_STATE
• CAND_STATE → CAND_COUNTRY
• CAND_NO → CAND_AGE

2. Determining the Candidate Key:


The candidate key for this relation is {CAND_NO}, since CAND_NO uniquely identifies
all other attributes in the table.

3. Identifying Transitive Dependency:


The issue here arises from the transitive
dependency between CAND_NO and CAND_COUNTRY:
• CAND_NO → CAND_STATE
• CAND_STATE → CAND_COUNTRY
This means that CAND_COUNTRY is transitively dependent
on CAND_NO via CAND_STATE, which violates the Third Normal Form (3NF) rule
that states that no non-prime attribute (non-key attribute) should be transitively dependent
on the primary key.

Converting the Relation into 3NF

To remove the transitive dependency and ensure the relation is in 3NF, we decompose the
original CANDIDATE relation into two separate relations:
1. CANDIDATE: This relation stores information about the candidates:
CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)

2. STATE_COUNTRY: This relation stores each state and its respective country:
STATE_COUNTRY (CAND_STATE, CAND_COUNTRY)

Final Decomposed Relations:


1. CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)

2. STATE_COUNTRY (CAND_STATE, CAND_COUNTRY)

Why This Decomposition Works:


• The CANDIDATE relation now no longer has a transitive
dependency. CAND_STATE no longer determines CAND_COUNTRY within
this relation.

• The STATE_COUNTRY relation handles the CAND_STATE →


CAND_COUNTRY dependency separately, ensuring that all data is now
organized in a way that satisfies 3NF.
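A quick sketch of this decomposition in Python, with made-up candidate rows, shows that after the split the state-to-country fact is stored only once:

```python
def decompose_3nf(rows):
    """Split (no, name, state, country, age) rows into the two relations above."""
    candidate = {(no, name, state, age) for no, name, state, _, age in rows}
    state_country = {(state, country) for _, _, state, country, _ in rows}
    return candidate, state_country

# Sample rows are invented for illustration.
rows = {(1, "Asha", "Kerala", "India", 22),
        (2, "Ravi", "Kerala", "India", 24)}
cand, sc = decompose_3nf(rows)
assert sc == {("Kerala", "India")}   # the CAND_STATE -> CAND_COUNTRY fact appears once
assert len(cand) == 2
```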
Example 2: Relation R(A, B, C, D, E)
Consider the relation R(A, B, C, D, E) with the following functional dependencies:
A → BC
CD → E
B→D
E→A

Step 1: Identify Candidate Keys


A candidate key is a minimal set of attributes that can uniquely identify a tuple (row) in the
relation. In this case, the candidate keys of the relation are A, E, CD, and BC: each of these
attribute sets determines all other attributes in the relation, and no proper subset of any of
them does.

Step 2: Check Functional Dependencies


Let's analyze the given functional dependencies:
1. A → BC: This means that knowing A allows us to determine both B and C.
2. CD → E: Knowing CD allows us to determine E.
3. B → D: Knowing B allows us to determine D.
4. E → A: Knowing E allows us to determine A.
We observe that all attributes on the right-hand side of the functional dependencies
are prime attributes (i.e., they are part of some candidate key). This means no non-prime
attribute is dependent on another non-prime attribute (which would be a transitive
dependency).

Step 3: Check for Transitive Dependencies


In 3NF, a relation must be free of transitive dependencies, where a non-prime attribute
depends on another non-prime attribute indirectly via the primary key.
• Here, A → BC and B → D, so B is a non-prime attribute that determines D,
and A determines B. However, since B is part of a candidate key, this does not
introduce a transitive dependency.
• E → A and A → BC, meaning E determines A, and then A determines B and C.
Again, no transitive dependency is formed because A is part of a candidate key.
Since there are no transitive dependencies, the relation R satisfies the condition of 3NF.
Step 4: Conclusion
Relation R(A, B, C, D, E) is already in Third Normal Form (3NF) because:
• There are no transitive dependencies.
• All non-prime attributes are functionally dependent only on candidate keys.
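The candidate-key claims in Example 2 can be verified mechanically with a small attribute-closure routine. The helper below is a generic sketch, not a specific library API; the FD set is the one given in the text:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
# Each of A, E, CD and BC determines all five attributes, so each is a candidate key.
for key in ("A", "E", "CD", "BC"):
    assert closure(key, fds) == set("ABCDE")
```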

Why is 3NF Important?


1. Eliminates Redundancy: 3NF helps to remove unnecessary duplication of data by
ensuring that non-prime attributes (attributes not part of any candidate key) depend directly
on the primary key, not on other non-prime attributes.
2. Prevents Anomalies: A table in 3NF is free from common anomalies such as:
• Insertion Anomaly: The inability to insert data without having to insert
unwanted or redundant data.
• Update Anomaly: The need to update multiple rows of data when a change
occurs in one place.
• Deletion Anomaly: The unintended loss of data when a record is deleted.
3. Preserves Functional Dependencies: 3NF ensures that all functional dependencies are
preserved, meaning that the relationships between attributes are maintained.
4. Lossless Decomposition: When decomposing a relation to achieve 3NF, the
decomposition should be lossless, meaning no information is lost in the process of
normalization.

❖ Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is an advanced version of 3NF used to reduce redundancy
in databases. It ensures that for every functional dependency, the left side must be a superkey.
This helps create cleaner and more consistent database designs, especially when there are
multiple candidate keys.
Rules for BCNF
• Rule 1: The table should be in the 3rd Normal Form.
• Rule 2: X should be a super-key for every functional dependency (FD) X−>Y in
a given relation.
Key Notes:
1. To verify BCNF, identify all determinants (left side of FDs) and check whether each is a
candidate key.
2. If a relation is in BCNF, it is automatically in 3NF, 2NF, and 1NF as well.
The normal forms become stricter as we move from 1NF → 2NF → 3NF → BCNF:
• 1NF: Each field must hold atomic (indivisible) values.
• 2NF: No partial dependency on a primary key.
• 3NF: No transitive dependency on a primary key.
• BCNF: Every determinant must be a candidate key.
This progression ensures better structure and removes redundancy at each level.
Why Do We Need BCNF?
• 2NF and 3NF may allow anomalies if a functional dependency exists where the
determinant is not a superkey.
• BCNF handles edge cases where 3NF fails to remove all redundancy, especially
in tables with multiple candidate keys.
• Prevents update, insert, and delete anomalies by ensuring every determinant is a
superkey.
• Makes database design more robust and easier to maintain over time.
• Improves data consistency and clarity by removing hidden or indirect
dependencies.
We are going to discuss some basic examples which let you understand the properties of
BCNF. We will discuss multiple examples here.
Example 1
Consider a relation R with attributes (student, teacher, subject).

FD: { (student, Teacher) -> subject, (student, subject) -> Teacher, (Teacher) -> subject}
• Candidate keys are (student, teacher) and (student, subject).
• The above relation is in 3NF (since there is no transitive dependency). A relation
R is in BCNF if for every non-trivial FD X->Y, X must be a key.
• The above relation is not in BCNF, because in the FD (teacher->subject),
teacher is not a key. This relation suffers with anomalies −
• For example, if we delete the student Tahira, we also lose the information
that her teacher teaches C. This issue occurs because teacher is a determinant
but not a candidate key.
R is divided into two relations R1(Teacher, Subject) and R2(Student, Teacher).
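The determinant test for Example 1 can be sketched as follows. Single letters S, T, J stand in for the student, teacher, and subject attributes, and the helper is a generic sketch:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(all_attrs, fds):
    """Every non-trivial FD's left side must determine the whole relation."""
    return all(set(rhs) <= set(lhs)                  # trivial FDs are allowed
               or closure(lhs, fds) == set(all_attrs)
               for lhs, rhs in fds)

# S = student, T = teacher, J = subject.
fds = [("ST", "J"), ("SJ", "T"), ("T", "J")]
assert not is_bcnf("STJ", fds)   # T -> J fails: T is not a superkey
assert is_bcnf("STJ", [("ST", "J"), ("SJ", "T")])
```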
How to Satisfy BCNF?
To satisfy BCNF, we decompose the offending table into further tables. Suppose a single
table records students, their branches, their courses, and the corresponding branch and
course numbers; we can split it into the three tables below: Stu_Branch, Stu_Course,
and Stu_Enroll.
Stu_Branch Table
Stu_ID Stu_Branch

101 Computer Science & Engineering

102 Electronics & Communication Engineering

Candidate Key for this table: Stu_ID.


Stu_Course Table
Stu_Course Branch_Number Stu_Course_No

DBMS B_001 201

Computer Networks B_001 202

VLSI Technology B_003 401

Mobile Communication B_003 402

Candidate Key for this table: Stu_Course.


Stu_Enroll Table
Stu_ID Stu_Course_No

101 201

101 202

102 401

102 402
Candidate Key for this table: {Stu_ID, Stu_Course_No}.
After this decomposition, the design is in BCNF, since in every functional dependency
X -> Y, X is a super key.
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
• Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can
determine all attributes of the relation, So AC will be the candidate key. A or C
can’t be derived from any other attribute of the relation, so there will be only 1
candidate key {AC}.
• Step-2: Prime attributes are those attributes that are part of candidate key {A,
C} in this example and others will be non-prime {B, D, E} in this example.
• Step-3: The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper
subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and
B->E is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because for BC->D neither BC is a super key nor is D
a prime attribute, and for B->E neither B is a super key nor is E a prime attribute; to
satisfy 3rd normal form, either the LHS of an FD must be a super key or its RHS must consist
of prime attributes. So the highest normal form of this relation is the 2nd normal form.
Note: A prime attribute cannot be transitively dependent on a key in BCNF relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
From the given functional dependencies, the candidate keys of relation R are AB and AC.
On close observation, we see that B depends transitively on AB through C, making it a
transitive dependency.
• The first and third dependencies are in BCNF as their left sides are candidate
keys.
• The second dependency is not in BCNF, but it's in 3NF since the right side is a
prime attribute.
So, the highest normal form of relation R is 3NF.
Example 3
For example consider relation R(A, B, C)
A -> BC,
B -> A
A and B both are super keys so the above relation is in BCNF.
Note: A BCNF decomposition is not always dependency preserving, although it always
satisfies the lossless-join condition. For example, consider relation R(V, W, X, Y, Z)
with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
No BCNF decomposition of this relation preserves all of these dependencies.

Note: Redundancies are sometimes still present in a BCNF relation as it is not always
possible to eliminate them completely.
❖ Fourth Normal Form (4NF)

As databases grow in complexity, proper normalization becomes important to reduce data


redundancy and maintain data integrity. Fourth Normal Form (4NF) is a higher level of
normalization in relational database design, which deals with multivalued dependencies
(MVDs).

Multivalued Dependency
A multivalued dependency occurs in a relation when one attribute determines multiple
independent values of another attribute, independently of the remaining attributes. A
multivalued dependency always involves at least three attributes, because it consists of
at least two attributes that depend on a third.

For a multivalued dependency A ->> B, a single value of A is associated with multiple
values of B, the table has at least three attributes, and B is independent of the
remaining attributes C.
Example: A course can have multiple instructors, a course can also have multiple textbook
authors but instructors and authors are independent of each other. This creates two
independent multivalued dependencies:

Course ->-> Instructor


Course ->-> TextBook_Author

If stored in the same table, this creates redundant combinations and data anomalies. A
multivalued dependency is a generalization of a functional dependency, but they are not the
same.

Fourth Normal Form (4NF)


The Fourth Normal Form (4NF) is a level of database normalization in which every non-trivial
multivalued dependency has a superkey as its determinant. It builds on the first three normal
forms (1NF, 2NF and 3NF) and on Boyce-Codd Normal Form (BCNF): in addition to meeting the
requirements of BCNF, a relation must not contain two or more independent multivalued
dependencies. 4NF thus extends BCNF to ensure that a relation does not hold multiple
independent one-to-many relationships within a single table.

Properties
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multi-valued dependency.
Key Idea: 4NF eliminates redundancy caused by multivalued dependencies by separating
independent one-to-many relationships into different tables.

A table with a multivalued dependency violates the normalization standard of the Fourth
Normal Form (4NF) because it creates unnecessary redundancies and can contribute to
inconsistent data. To bring this up to 4NF, it is necessary to break this information into two
tables.

Example: Consider a relation R that records, for each course, the instructors who teach it
and the authors of its textbooks:

Table R:

Course       Instructor   TextBook_Author
Management   X            Churchill
Management   Y            Peters
Management   Z            Peters
Finance      A            Weston
Finance      A            Gilbert
Problem:
• Each Course has multiple Instructors.
• Each Course has multiple TextBook_Author.
• But Instructor and TextBook_Author are not related to each other.
• This causes repetition of combinations, violating 4NF.

Solution: To remove the MVDs and bring the relation to Fourth Normal Form, we split the
original table into two separate tables, each handling one multivalued dependency. This
improves data integrity and removes redundancy.
Table R1:

Course       Instructor
Management   X
Management   Y
Management   Z
Finance      A

This table shows which instructors teach which course.

Table R2:

Course       TextBook_Author
Management   Churchill
Management   Peters
Finance      Weston
Finance      Gilbert

Result: 4NF is now achieved.

Benefits of Decomposition:
1. No repetition of unrelated attribute combinations.
• In a non 4NF table, if two attributes are independently related to a third, their
combinations get repeated unnecessarily.
• This leads to a cartesian product effect, lots of rows just to represent all
combinations.
Example: If a course has 3 instructors and 2 textbook authors, we get 3 × 2 = 6 rows, even
though there's no link between instructors and authors.

After 4NF decomposition, instructors and authors are stored in separate tables, so:
• Instructors: 3 rows; Authors: 2 rows; no redundant pairings between them.
• Each table contains data with a single multivalued dependency.
• Both tables are now in BCNF and 4NF.
• This ensures a cleaner design, efficient storage, and no anomalies.
2. Each Table Contains Data with a Single Multivalued Dependency
• Every decomposed table focuses on only one multivalued relationship.
• There is one clear dependency per table (e.g., Course ->-> Instructor OR
Course ->-> Textbook_Author), not both.
Why it's important:
• It simplifies understanding, querying and maintaining the data.
• Each relation represents one fact, reducing logical complexity.
• This aligns with the principle of separation of concerns: one table, one purpose.
3. Both Tables Are Now in BCNF and 4NF
After decomposition:
• There are no partial, transitive or multivalued dependencies.
• All attributes are functionally dependent only on the whole key.
Result: The structure now meets
• Boyce-Codd Normal Form (BCNF) as Every determinant is a candidate key.
• Fourth Normal Form (4NF) as No non-trivial MVDs exist.
• Tables are well-structured, normalized and reliable.
4. Ensures Cleaner Design, Efficient Storage and No Anomalies
• Each table is focused and easier to read.
• Developers and DBAs can understand the schema without confusion.
Efficient Storage:
• Redundant rows are eliminated.
• Fewer rows mean less storage space and faster performance.
No Anomalies:
• Insertion anomaly: You can insert a new instructor without needing a textbook.
• Deletion anomaly: Deleting a textbook doesn't remove the instructor.
• Update anomaly: Update happens in one place only and no risk of mismatched
data.
Note: Decomposing tables to eliminate multivalued dependencies isn't just about following
rules; it makes your data model more logical, efficient, and future-proof.
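Under the assumption that the original table holds every instructor/author combination for each course (as the two MVDs require), the 4NF split and its losslessness can be sketched as:

```python
from itertools import product

def decompose_4nf(rows):
    """Split (Course, Instructor, Author) rows into two single-MVD tables."""
    course_instr = {(c, i) for c, i, _ in rows}
    course_auth = {(c, a) for c, _, a in rows}
    return course_instr, course_auth

def rejoin(course_instr, course_auth):
    """Natural join on Course: every instructor pairs with every author."""
    return {(c, i, a)
            for c, i in course_instr
            for c2, a in course_auth if c == c2}

# Rows in which both MVDs hold: each course lists all instructor/author pairs.
rows = set()
for course, instrs, auths in [("Management", "XYZ", ["Churchill", "Peters"]),
                              ("Finance", "A", ["Weston", "Gilbert"])]:
    for i, a in product(instrs, auths):   # "XYZ" iterates as X, Y, Z
        rows.add((course, i, a))

ci, ca = decompose_4nf(rows)
assert rejoin(ci, ca) == rows   # lossless: the join rebuilds R exactly
```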

TOPIC 5: Lossless Join Decomposition

Lossless join decomposition is a critical concept in database management systems (DBMS).


It ensures that when a relation is decomposed into smaller sub-relations, the original relation
can be perfectly reconstructed by performing a natural join on the decomposed relations. This
guarantees no loss of information during the decomposition process.
Key Principles of Lossless Join
1. Preservation of Data Integrity: The decomposition ensures that no data is lost,
maintaining the accuracy and consistency of the database.
2. Reconstruction of Original Relation: The natural join of the decomposed
relations must exactly match the original relation, with no extraneous or missing
tuples.
3. Superkey Condition: The common attribute(s) between the decomposed relations
must form a superkey for at least one of the sub-relations.
Example of Lossless Join
Consider a relation Student(Roll_No, S_Name, S_Dept) decomposed into:
• StudentDetails(Roll_No, S_Name)
• Dept(Roll_No, S_Dept)
Performing a natural join ( StudentDetails ⨝ Dept ) reconstructs the original Student relation
without any loss of data. This makes the decomposition lossless.
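For the Student example above, the check can be sketched as follows; the sample names and departments are invented for illustration:

```python
def natural_join(r1, r2):
    """Join two binary relations on their shared first attribute."""
    return {(k, v1, v2) for k, v1 in r1 for k2, v2 in r2 if k == k2}

student = {(1, "Amit", "CSE"), (2, "Bela", "ECE")}
details = {(r, n) for r, n, _ in student}   # StudentDetails(Roll_No, S_Name)
dept = {(r, d) for r, _, d in student}      # Dept(Roll_No, S_Dept)

# Roll_No is a key of both projections, so the join is lossless:
# it yields exactly the original tuples, none missing and none extra.
assert natural_join(details, dept) == student
```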
Applications of Lossless Join
1. Normalization: Lossless join decomposition is essential for achieving higher
normal forms (e.g., 3NF, BCNF) while ensuring data integrity. It helps reduce
redundancy and anomalies in the database.
2. Data Integrity: It is crucial in systems where data accuracy is paramount, such as
financial systems or healthcare databases.
3. Efficient Querying: By decomposing relations into smaller, focused tables, query
performance can be optimized in certain scenarios.
4. Database Maintenance: Smaller, decomposed relations are easier to manage,
update, and modify without affecting the overall schema.
Considerations and Challenges
While lossless join decomposition ensures data integrity, it can introduce complexities:
• Increased Storage Overhead: Additional tables and attributes may be required.
• Complex Queries: Reconstructing the original relation may involve computationally
expensive join operations.
• Dependency Preservation: Ensuring that functional dependencies are preserved
alongside lossless join decomposition can be challenging.
In summary, lossless join decomposition is a foundational technique in DBMS, ensuring that
data remains consistent and intact during normalization. It is widely applied in designing
robust and efficient database systems, particularly in domains where data accuracy and
integrity are critical.

TOPIC 6: Dependency Preserving Decomposition

• Decomposition of a relation in the relational model is done to convert it into an
appropriate normal form.
• A relation R should be decomposed into two or more relations only if the decomposition
is both lossless-join and dependency preserving.
Dependency Preserving Decomposition

• If we decompose a relation R into relations R1 and R2, every dependency of R must be
part of either R1 or R2, or must be derivable from the combination of the functional
dependencies (FDs) of R1 and R2.
• For example, suppose a relation R(A, B, C, D) with FD set {A -> BC} is decomposed into
R1(ABC) and R2(AD). This decomposition is dependency preserving because the FD A -> BC
is part of R1(ABC).

Theory
Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D which
is decomposed into R1(AB) and R2(CD)

This decomposition is dependency preserving because

• A->B can be ensured in R1(AB)


• C->D can be ensured in R2(CD)

Example
Let a relation R(A, B, C, D) and a set of FDs F = { A -> B, A -> C, C -> D } be given.
The relation R is decomposed into –

R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and


R2 = (C, D) with FDs F2 = {C -> D}.
F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
so, F' = F.
And so, F'+ = F+.
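The check that F' covers F can be sketched with the attribute-closure idea: each FD of F must follow from the union of the decomposed FD sets. The helper below is a generic sketch using the FD sets from this example:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(f, f_prime):
    """True if every FD in f is derivable from f_prime (via attribute closure)."""
    return all(set(rhs) <= closure(lhs, f_prime) for lhs, rhs in f)

f = [("A", "B"), ("A", "C"), ("C", "D")]
f1 = [("A", "B"), ("A", "C")]   # FDs of R1(A, B, C)
f2 = [("C", "D")]               # FDs of R2(C, D)
assert preserves(f, f1 + f2)    # the decomposition preserves all of F
```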

TOPIC 7: Schema Refinement in Database Design

Schema refinement is the process of improving a database schema to ensure it is efficient,


consistent, and scalable. It involves organizing data into well-structured tables, reducing
redundancy, and ensuring data integrity through normalization and constraints. This process
is crucial for creating a robust database that supports efficient queries and minimizes
anomalies.

Key Strategies for Schema Refinement

One common approach is normalization, which divides large tables into smaller, related
tables to eliminate redundancy and ensure consistency. This process reduces anomalies such
as update, insertion, and deletion issues. However, it may increase the complexity of queries
due to the need for joins.
Another approach is denormalization, which adds redundant data to improve query
performance by reducing the number of joins. While it simplifies data access and speeds up
queries, it can lead to data inconsistency and increased storage requirements if not managed
carefully.
Vertical partitioning splits a table into smaller tables based on columns, improving query
performance by reducing I/O operations. This approach is useful when queries frequently
access specific columns. However, it can complicate the schema if queries require data from
multiple tables.
Horizontal partitioning divides a table into smaller tables based on rows, enhancing
scalability and query performance by reducing the amount of data scanned. This is
particularly effective for large datasets but may complicate queries that span multiple
partitions.

Constraints for Data Integrity

Schema refinement also involves applying constraints to enforce data integrity. Examples
include:
• Primary Key Constraint: Ensures each record in a table is unique.
• Foreign Key Constraint: Maintains consistency between related tables.
• Unique Constraint: Ensures all values in a column are distinct.
• Not Null Constraint: Prevents null values in specific columns.
• Check Constraint: Enforces specific conditions on column values.
• Default Constraint: Assigns default values to columns when none are provided.
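A minimal sketch of several of these constraints, using Python's built-in sqlite3 module; the table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,        -- each record must be unique
        name    TEXT NOT NULL,              -- null values are rejected
        age     INTEGER CHECK (age >= 0)    -- condition on column values
    )""")
conn.execute("INSERT INTO student VALUES (1, 'Amit', 20)")

# A duplicate primary key is rejected by the database engine.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Bela', 22)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
assert duplicate_rejected
```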

Optimization and Performance

Schema refinement should also focus on performance optimization. This includes choosing
appropriate data types, creating indexes, and partitioning data effectively. These steps help
improve query performance and reduce database overhead.

Conclusion

Effective schema refinement is essential for building a reliable and efficient database. By
normalizing data, applying constraints, and optimizing for performance, you can create a
schema that minimizes redundancy, ensures data integrity, and supports scalable operations.
This process lays the foundation for a robust database that meets both current and future
application needs.

TOPIC 8: Multi-Valued Dependencies (MVDs)


A multivalued dependency (MVD) exists when one attribute in a relation determines multiple
independent values of another attribute. Recognizing MVDs helps maintain data accuracy and
manage intricate data relationships.
Multi Valued Dependency (MVD)
We can say that a multivalued dependency exists if the following conditions are met.
Conditions for MVD
An attribute set a multidetermines another attribute set b if, in every legal relation r(R),
for all pairs of tuples t1 and t2 in r such that
t1[a] = t2[a]
there exist tuples t3 and t4 in r such that:
t1[a] = t2[a] = t3[a] = t4[a]
t1[b] = t3[b]; t2[b] = t4[b]
t1[c] = t4[c]; t2[c] = t3[c]
where c denotes the remaining attributes, c = R - (a ∪ b). If these conditions hold, the
multivalued dependency (MVD) a ->-> b exists. To check for an MVD in a given table, we apply
the conditions above to the values in the table.

Example
Consider the following relation with attributes a = name, b = project, c = hobby:

name    project   hobby
Geeks   MS        Reading     (t1)
Geeks   Oracle    Music       (t2)
Geeks   MS        Music       (t3)
Geeks   Oracle    Reading     (t4)

Condition-1 for MVD
t1[a] = t2[a] = t3[a] = t4[a]
Finding from the table,
t1[a] = t2[a] = t3[a] = t4[a] = Geeks
So, condition 1 is satisfied.
Condition-2 for MVD
t1[b] = t3[b]
And
t2[b] = t4[b]
Finding from table,
t1[b] = t3[b] = MS
And
t2[b] = t4[b] = Oracle
So, condition 2 is Satisfied.
Condition-3 for MVD
∃c ∈ R-(a ∪ b) where R is the set of attributes in the relational table.
t1[c] = t4[c]
And
t2[c]=t3[c]
Finding from table,
t1[c] = t4[c] = Reading
And
t2[c] = t3[c] = Music
So, condition 3 is satisfied. All conditions are satisfied; therefore,
a ->-> b
From the table, this gives
name ->-> project
and for
a ->-> c
we get
name ->-> hobby
Hence, MVD exists in the above table, and it can be stated as:
name ->-> project
name ->-> hobby
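The conditions above can be checked by brute force over the table. The helper below is a generic sketch; the four tuples are the ones used in the worked example:

```python
def holds_mvd(rows, a, b, c):
    """Check a ->-> b in a 3-attribute relation, where a, b, c are column
    positions. For every pair of tuples agreeing on a, the tuple taking b
    from the first and c from the second must also be present. Checking t3
    for both orderings of each pair covers the t4 condition as well."""
    for t1 in rows:
        for t2 in rows:
            if t1[a] == t2[a]:
                t3 = [None, None, None]
                t3[a], t3[b], t3[c] = t1[a], t1[b], t2[c]
                if tuple(t3) not in rows:
                    return False
    return True

# The four tuples t1..t4 of the worked example: (name, project, hobby).
rows = {("Geeks", "MS", "Reading"), ("Geeks", "Oracle", "Music"),
        ("Geeks", "MS", "Music"), ("Geeks", "Oracle", "Reading")}

assert holds_mvd(rows, 0, 1, 2)   # name ->-> project
assert holds_mvd(rows, 0, 2, 1)   # name ->-> hobby
```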
Conclusion
• Multivalued Dependency (MVD) is a form of data dependency in which two or
more attributes, other than the key attribute, are independent of each other,
while each of them depends on the key.
• Multivalued dependency may result in data errors and redundancies.
• We can normalize the database to 4NF in order to get rid of multivalued
dependency.

(OR)

Multivalued dependencies (MVDs) are a type of data dependency in relational databases where
one attribute determines multiple independent values of another attribute. However, there are
some misconceptions or incorrect statements about MVDs that need clarification.
1. MVDs Do Not Imply Functional Dependency: It is incorrect to assume that
multivalued dependencies imply functional dependency. While functional dependency
ensures a unique mapping between attributes, MVDs allow multiple independent
values for a single attribute without violating data consistency.
2. MVDs Do Not Always Lead to Data Redundancy: While MVDs can cause redundancy
in certain cases, they do not inherently lead to redundancy unless the database is not
normalized to the Fourth Normal Form (4NF). Proper normalization can eliminate
redundancy caused by MVDs.
3. MVDs Are Not Limited to Two Attributes: It is a misconception that MVDs only
involve two attributes. They can involve multiple attributes, as long as the conditions
for MVD are satisfied.
4. MVDs Are Not Errors in Design: MVDs are not inherently errors or flaws in database
design. They are natural occurrences in certain data relationships and can be
managed effectively through normalization.
5. MVDs Do Not Violate Database Integrity: When properly handled, MVDs do not
compromise data integrity. They are a logical representation of certain attribute
relationships and can be normalized to maintain consistency.
Understanding these points ensures a clearer perspective on MVDs and their role in database
design and normalization.
