Database System Part I

This document provides an overview of database systems and the evolution of data management approaches. It discusses how databases have developed from manual, file-based systems to modern database management systems (DBMS). A DBMS allows for efficient, secure storage and retrieval of large amounts of shared data over long periods of time. It also describes the benefits of databases, including data sharing, improved accessibility, reduced redundancy, and maintained data quality.

Database Systems Lecture Note

Database System
Database systems are designed to manage large data sets in an organization.
Data management involves both the definition and the manipulation of data,
ranging from the simple representation of data to the design of structures for
storing information. It also covers the provision of mechanisms for
manipulating that information.

Today, databases are essential to every business. They are used to maintain
internal records, to present data to customers and clients on the World Wide
Web, and to support many other commercial processes. Databases are likewise
found at the core of many modern organizations.

The power of databases comes from a body of knowledge and technology that
has developed over several decades and is embodied in specialized software
called a database management system, or DBMS. A DBMS is a powerful tool for
creating and managing large amounts of data efficiently and allowing it to
persist over long periods of time, safely. These systems are among the most
complex types of software available.

So, to answer our question: what is a database? In essence, a database is
nothing more than a collection of shared information that exists over a long
period of time, often many years. In common usage, the term database refers to
a collection of data that is managed by a DBMS.

Thus the database course is about:

• How to organize data
• Supporting multiple users
• Efficient and effective data retrieval
• Secure and reliable storage of data
• Maintaining consistent data
• Making information useful for decision making

Data management has passed through different levels of development, in step
with developments in technology and services. These are best described as
three levels. Although each new level brought an advantage and overcame a
problem of the previous one, all three methods of data handling are still in
use to some extent. The three major levels are:

1. Manual Approach
2. Traditional File Based Approach
3. Database Approach

By: Wondwossen Mulugeta, Faculty of Informatics, AAU 1


1. Manual Approach
In the manual approach, data storage and retrieval follow the traditional way
of handling information, using cards and paper. Data is stored and retrieved
by human labour.

• Files are kept for as many events and objects as the organization has.
• Each file, containing a particular kind of information, is labelled and
  stored in one or more cabinets.
• The cabinets may be kept in safe places, depending on the sensitivity of
  the information they contain.
• Insertion and retrieval are done by searching first for the right cabinet,
  then for the right file, then for the information.
• An indexing system can be used to facilitate access to the data.

Limitations of the Manual approach


• Prone to error
• Difficult to update, retrieve and integrate data
• The data is available, but it is difficult to compile it into information
• Limited to small volumes of information
• Cross-referencing is difficult

An alternative approach to data handling is a computerized way of dealing with
the information. The computerized approach can be either decentralized or
centralized, based on where the data resides in the system.

2. Traditional File Based Approach


After computers were introduced to the business community for data
processing, the need to use them for data storage and processing increased.
There were, and still are, several computer applications that use file-based
processing for data handling. Even though the approach has evolved over time,
the basic structure is still similar, if not identical.
• File-based systems were an early attempt to computerize the manual filing
  system.
• This approach is the decentralized computerized data handling method.
• A collection of application programs performs services for the end users.
  In such systems, every application program that provides a service to end
  users defines and manages its own data.
• Such systems have a number of programs for each of the different
  applications in the organization.
• Since every application defines and manages its own data, the system is
  subject to serious data duplication problems.
• A file, in the traditional file-based approach, is a collection of records
  that contain logically related data.

Limitations of the Traditional File Based approach


As business applications became more complex and demanded more flexible and
reliable data handling methods, the shortcomings of the file-based system
became evident. These shortcomings include, but are not limited to:
• Separation or isolation of data: information available in one application
  may not be known to the others.
• Limited data sharing
• Lengthy development and maintenance time
• Duplication or redundancy of data
• Data dependency on the application
• Incompatible file formats between different applications and programs,
  creating inconsistency
• Fixed query processing, which is defined during application development
The limitations of the traditional file-based data handling approach arise
from two basic causes:
1. The definition of the data is embedded in the application program, which
   makes it difficult to modify the data definition.
2. There is no control over the access and manipulation of the data beyond
   that imposed by the application programs.
The most significant problem of the traditional file-based approach to data
handling is “update anomalies”. There are three types of update anomalies:
1. Modification Anomalies: a problem experienced when one or more data values
   are modified in one application program but not in others containing the
   same data set.
2. Deletion Anomalies: a problem encountered when a record is deleted from one
   application but remains untouched in other application programs.
3. Insertion Anomalies: a problem experienced whenever a new data item has to
   be recorded and the recording is not made in all the applications. When
   the same data item is inserted in different applications, encoding errors
   can also cause the new item to be treated as a totally different object.
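
The three anomalies can be illustrated with a small Python sketch. It assumes two hypothetical application programs, billing and shipping, that each keep their own copy of the same customer records; the names `billing_file`, `shipping_file` and `find_inconsistencies`, and the sample customers, are invented for illustration only.

```python
import copy

# Hypothetical starting data; each application program keeps its OWN copy.
initial_customers = {
    "C001": {"name": "Abebe", "phone": "0911-000000"},
    "C003": {"name": "Tigist", "phone": "0944-333333"},
}
billing_file = copy.deepcopy(initial_customers)
shipping_file = copy.deepcopy(initial_customers)


def find_inconsistencies(file_a, file_b):
    """Return the keys whose records are missing from one file or differ."""
    problems = {}
    for key in sorted(set(file_a) | set(file_b)):
        if key not in file_a or key not in file_b:
            problems[key] = "missing in one file"
        elif file_a[key] != file_b[key]:
            problems[key] = "values differ"
    return problems


# Modification anomaly: the phone number is changed in billing only.
billing_file["C001"]["phone"] = "0922-111111"

# Insertion anomaly: a new customer is recorded in shipping only.
shipping_file["C002"] = {"name": "Sara", "phone": "0933-222222"}

# Deletion anomaly: a customer is deleted from billing only.
del billing_file["C003"]

anomalies = find_inconsistencies(billing_file, shipping_file)
print(anomalies)
```

Nothing in either program forces the two copies to agree, which is why the inconsistencies only show up when the files are compared after the fact.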

3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems
changed significantly. Codd proposed that database systems should present the
user with a view of data organized as tables called relations. Behind the scenes,
there might be a complex data structure that allowed rapid response to a variety
of queries. But, unlike the user of earlier database systems, the user of a relational
system would not be concerned with the storage structure. Queries could be
expressed in a very high-level language, which greatly increased the efficiency of
database programmers. The database approach emphasizes the integration and
sharing of data throughout the organization.

Thus, in the database approach:

• A database is just a computerized record-keeping system, or a kind of
  electronic filing cabinet.
• A database is a repository for a collection of computerized data files.
• A database is a shared collection of logically related data designed to
  meet the information needs of an organization. Since it is a shared
  corporate resource, the database is integrated, with minimal or no
  duplication.
• A database is a collection of logically related data, where the logically
  related data comprises the entities, attributes, relationships, and
  business rules of an organization's information.
• In addition to containing the data required by an organization, a database
  also contains a description of the data, which is called “Metadata”,
  “Data Dictionary”, “Systems Catalogue” or “Data about Data”.
• Since a database contains information about the data (metadata), it is
  called a self-describing collection of integrated records.
• The purpose of a database is to store information and to allow users to
  retrieve and update that information on demand.
• A database is designed once and used simultaneously by many users.
• Unlike the traditional file-based approach, the database approach provides
  program-data independence, that is, the separation of the data definition
  from the application. Thus the application is not affected by changes
  made to the data structure and file organization.
• Each database application performs some combination of creating, reading,
  updating and deleting data.
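
As a minimal sketch of two of the points above (the create, read, update and delete operations, and the database describing itself through metadata), the following uses Python's built-in `sqlite3` module with an in-memory database. The `employee` table and its columns are assumptions made for the example; `sqlite_master` and `PRAGMA table_info` are SQLite's own catalogue facilities, and other DBMSs expose their catalogues differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# Create, read, update and delete operations on the stored data.
conn.execute("INSERT INTO employee (emp_id, name, salary) VALUES (1, 'Almaz', 5000.0)")
conn.execute("UPDATE employee SET salary = 5500.0 WHERE emp_id = 1")
rows = conn.execute("SELECT name, salary FROM employee").fetchall()
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Metadata: the database can be asked to describe its own contents.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

print(rows, remaining, catalog, columns)
```

The application never parses a file format of its own: both the data and the description of the data come back through the same query interface.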

Benefits of the database approach


• Data can be shared: two or more users can access and use the same data
  instead of storing it redundantly for each user.
• Improved accessibility of data: by using structured query languages, users
  can easily access data without programming experience.
• Redundancy can be reduced: isolated data is integrated in the database to
  decrease the redundant data stored by different applications.
• Quality data can be maintained: the integrity constraints of the database
  approach maintain data quality, leading to better decision making.
• Inconsistency can be avoided: controlled data redundancy avoids
  inconsistency of the data in the database to some extent.
• Transaction support can be provided: the basic demands of any transaction
  support system are implemented in a full-scale DBMS.
• Integrity can be maintained: data from different applications is
  integrated, with additional constraints, to provide a shared data
  resource.
• Security measures can be enforced: the shared data can be secured by
  having different levels of clearance and other data security mechanisms.
• Improved decision support: the database provides information useful for
  decision making.
• Standards can be enforced: the different ways in which different units of
  an organization use and deal with data can be balanced and standardized
  by using the database approach.
• Compactness: since it is an electronic data handling method, the data is
  stored compactly (no voluminous papers).
• Speed: data storage and retrieval is fast, as it uses modern, fast
  computer systems.
• Less labour: unlike the other data handling methods, data maintenance
  does not demand much resource.
• Centralized information control: since relevant data in the organization
  is stored in one repository, it can be controlled and managed centrally.

Limitations and risk of Database Approach


• Need for new professional and specialized personnel
• Complexity in designing and managing data
• The cost and risk during conversion from the old to the new system
• High cost incurred to develop and maintain the system
• Complex backup and recovery services from the user's perspective
• Reduced performance due to centralization and data independence
• High impact on the system when the central system fails

Database Management System (DBMS)


A Database Management System (DBMS) is a software package that provides
EFFICIENT, CONVENIENT and SAFE MULTI-USER (many people or programs accessing
the same database, or even the same data, simultaneously) storage of and access
to MASSIVE amounts of PERSISTENT (data outlives the programs that operate on
it) data. A DBMS also provides a systematic method for creating, updating,
storing and retrieving data in a database, along with services for controlling
data access, enforcing data integrity, managing concurrency, and recovery.
With this in mind, a full-scale DBMS should provide at least the following
services to the user:

1. Data storage, retrieval and update in the database
2. A user-accessible catalogue
3. Transaction support service: ALL or NONE transactions, which minimize
   data inconsistency.
4. Concurrency control services: simultaneous access to and update of the
   database by different users should be handled correctly.
5. Recovery services: a mechanism for recovering the database after a
   failure must be available.
6. Authorization services (security): must support the implementation of
   access and authorization services for the database administrator and
   users.
7. Support for data communication: should provide the facility to integrate
   with data transfer software or data communication managers.
8. Integrity services: rules about the data and the changes made to it, to
   ensure the correctness and consistency of stored data and the quality of
   data based on business constraints.
9. Services to promote data independence between the data and the
   application
10. Utility services: sets of utility facilities such as
    • Importing data
    • Statistical analysis support
    • Index reorganization
    • Garbage collection
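
The ALL or NONE behaviour of the transaction support service can be sketched with Python's built-in `sqlite3` module. The `account` table, the `transfer` function and the balances are hypothetical; the sketch relies on the fact that a `sqlite3` connection used as a context manager commits when the block succeeds and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are committed
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK: both undone
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failing transfer the credit to account 2 has already been executed when the debit is rejected, yet the rollback removes it, so the database never shows money created out of nothing.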

DBMS and Components of DBMS Environment


A DBMS is a software package used to design, manage, and maintain databases.
Each DBMS should have facilities to define the database, manipulate the
content of the database and control the database. These facilities help the
designer, the user and the database administrator to discharge their
responsibilities in designing, using and managing the database. A DBMS
provides the following facilities:

• Data Definition Language (DDL):
  o A language used to define each data element required by the
    organization.
  o Commands for setting up the schema, or the intension, of the database.
  o These commands are used to set up a database and to create, delete and
    alter tables, including the handling of constraints.

• Data Manipulation Language (DML):
  o The core commands used by end users and programmers to store, retrieve,
    and access the data in the database, e.g. SQL.
  o Since the data required by the user (the query) is extracted using this
    type of language, it is also called a "Query Language".

• Data Dictionary:
  o Because a database is a self-describing system, this tool, the Data
    Dictionary, is used to store and organize information about the data
    stored in the database.

• Data Control Language:
  o A database is a shared resource that demands control of data access and
    usage. The database administrator should have the facility to control
    the overall operation of the system.
  o Data Control Language commands help the database administrator to
    control the database.
  o The commands include granting or revoking privileges to access the
    database, or particular objects within the database, and storing or
    removing (committing or rolling back) database transactions.
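
A minimal sketch of the difference between DDL and DML statements, again using Python's built-in `sqlite3` module. The `department` and `employee` tables are hypothetical. Data control commands such as GRANT and REVOKE are not shown, because SQLite has no user accounts and does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# DDL: define the schema, including constraints, and later alter it.
conn.execute(
    "CREATE TABLE department ("
    "dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("ALTER TABLE employee ADD COLUMN email TEXT")

# DML: store and retrieve data within that schema.
conn.execute("INSERT INTO department VALUES (10, 'Finance')")
conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Hana', 10)")
result = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id"
).fetchall()

# A constraint declared with DDL is enforced on every later DML statement.
try:
    conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (2, 'Dawit', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # department 99 does not exist

print(result, rejected)
```

The constraint is stated once, in the schema, and the DBMS applies it to every program that issues DML against the table.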

The DBMS is a software package that helps to design, manage, and use data
using the database approach. Taking a DBMS as a system, one can describe it
with respect to its environment, that is, the other systems interacting with
the DBMS. The DBMS environment has five components. To design and use a
database, there is interaction or integration of Hardware, Software, Data,
Procedure and People.

1. Hardware: the components that one can touch and feel. These include
   various types of personal computers, mainframes or server computers used
   in multi-user systems, network infrastructure, and other peripherals
   required in the system.

2. Software: the collection of commands and programs used to make the
   hardware perform a function. This includes the DBMS software, application
   programs, operating systems, network software, language software and
   other relevant software.

3. Data: since the goal of any database system is to have better control of
   the data and to make data useful, data is the most important component
   for the user of the database. There are two categories of data in any
   database system: operational data and metadata. Operational data is the
   data actually stored in the system to be used by the user. Metadata is
   the data that stores information about the database itself.
   The structure of the data in the database is called the schema, which is
   composed of the entities, the properties of entities, and the
   relationships between entities.

4. Procedure: the rules and regulations on how to design and use a
   database. This includes procedures such as how to log on to the DBMS, how
   to use its facilities, how to start and stop transactions, how to make
   backups, how to treat hardware and software failures, and how to change
   the structure of the database.

5. People: the people in the organization who are responsible for, or play
   a role in, designing, implementing, managing, administering and using the
   resources in the database. This component ranges from people with a high
   level of knowledge about the database and the design technology to
   others with no knowledge of the system beyond using the data in the
   database.

Database Development Life Cycle

Database development is one component of most information system development
tasks, and designing a database system involves several steps. Here more
emphasis is given to the design phases of the system development life cycle.
The major steps in database design are:

1. Planning: identifying the information gap in an organization and proposing
   a database solution to solve the problem.

2. Analysis: concentrates on fact finding about the problem or the
   opportunity. Feasibility analysis, requirement determination and
   structuring, and selection of the best design method are also performed
   in this phase.

3. Design: in database design, more emphasis is given to this phase. The
   phase is further divided into three sub-phases.
   a. Conceptual Design: a concise description of the data, data types,
      relationships between data, and constraints on the data.
      • There is no consideration of implementation or physical detail.
      • Used to elicit and structure all information requirements.
   b. Logical Design: a refinement of the conceptual design expressed in a
      selected, specific data model used to implement the data structure.
      • It is independent of any particular DBMS and of other physical
        considerations.
   c. Physical Design: the physical implementation of the higher-level
      design of the database with respect to the internal storage and file
      structure of the database for the selected DBMS.
      • Used to develop all technology and organizational specifications.

4. Implementation: the testing and deployment of the designed database for
   use.

5. Operation and Support: administering and maintaining the operation of the
   database system and providing support to users.

Roles in Database Design and Use


As people are one of the components of the DBMS environment, different
stakeholders play different roles in the design and operation of a database
system.

1. DataBase Administrator (DBA)
• Responsible for overseeing, controlling and managing the database
  resources (the database itself, the DBMS and other related software)
• Authorizes access to the database
• Coordinates and monitors the use of the database
• Responsible for determining and acquiring hardware and software resources
• Accountable for problems such as poor security or poor system performance
• Involved in all steps of database development
In big organizations with huge amounts of data and many user requirements,
this role can be divided further:
1. Data Administrator (DA): responsible for the management of data
   resources. Involved in database planning, development, and the
   maintenance of standards, policies and procedures at the conceptual and
   logical design phases.

2. DataBase Administrator (DBA): a more technically oriented role.
   Responsible for the physical realization of the database. Involved in
   physical design, implementation, and security and integrity control of
   the database.

2. DataBase Designer (DBD)
• Identifies the data to be stored and chooses the appropriate structures to
  represent and store the data.
• Should understand the user requirements and should choose how the user
  views the database.
• Is involved in the design phase, before the implementation of the database
  system.
There are two kinds of database designer: one involved in the conceptual and
logical design, and another involved in the physical design.

1. Logical and Conceptual DBD
• Identifies data (entities, attributes and relationships) relevant to the
  organization
• Identifies constraints on each data item
• Understands the data and business rules of the organization
• Sees the database independently of any data model at the conceptual level
  and considers one specific data model at the logical design phase

2. Physical DBD
• Takes the logical design specification as input and decides how it should
  be physically realized
• Maps the logical data model onto the specified DBMS with respect to tables
  and integrity constraints (DBMS-dependent design)
• Selects specific storage structures and access paths to the database
• Designs the security measures required on the database

3. Application Programmer and Systems Analyst
• The systems analyst determines the user requirements and how the user wants
  to view the database.
• The application programmer implements these specifications as programs:
  codes, tests, debugs, documents and maintains the application program.
• The application programmer also determines the interface for retrieving,
  inserting, updating and deleting data in the database.
• The application can be written in any high-level programming language,
  depending on availability, facilities and the required service.

4. End Users
Workers whose jobs require accessing the database frequently for various
purposes. There are different groups of users in this category.
1. Naïve Users
• A sizable proportion of users
• Unaware of the DBMS
• Only access the database based on their access level and demand
• Use standard and pre-specified types of queries
2. Sophisticated Users
• Are familiar with the structure of the database and the facilities of the
  DBMS
• Have complex requirements
• Have higher-level queries
• Are most of the time engineers, scientists, business analysts, etc.

3. Casual Users
• Access the database occasionally
• Need different information from the database each time
• Use sophisticated database queries to satisfy their needs
• Are most of the time middle- to high-level managers

The people involved can again be classified as “Actors on the Scene” and
“Workers Behind the Scene”.

Actors on the Scene:
• Data Administrator
• Database Administrator
• Database Designer
• End Users

Workers Behind the Scene:
• DBMS designers and implementers: those who design and implement the
  different DBMS software packages.
• Tool developers: experts who develop software packages that facilitate
  database system design and use. Developers of prototyping, simulation and
  code generation tools are examples. Independent software vendors can also
  be placed in this group.
• Operators and maintenance personnel: system administrators who are
  responsible for actually running and maintaining the hardware and
  software of the database system and the information technology
  facilities.

ANSI-SPARC Architecture
The purpose and origin of the Three-Level database architecture
• All users should be able to access the same data. This is important because
  the database is a shared resource: all the data is stored in one location
  and each user has a customized way of interacting with it.
• A user's view is unaffected by, or immune to, changes made in other views.
  Since the requirements of one user are independent of those of another, a
  change made in one user’s view should not affect other users.
• Users should not need to know physical database storage details. As there
  are naïve users of the system, hardware-level or physical details should
  be a black box for such users.
• The DBA should be able to change database storage structures without
  affecting the users' views. A change in file organization or access method
  should not affect the structure of the data, which in turn means it has no
  effect on the users.
• The internal structure of the database should be unaffected by changes to
  physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database
  without affecting all users. In any database system, the DBA has the
  privilege to change the structure of the database, for example by adding
  tables, adding or deleting attributes, or changing the specification of
  objects in the database.
All of the above, and many other functionalities, are possible because of
the three-level ANSI-SPARC architecture.

Three-level ANSI-SPARC Architecture of a Database

ANSI-SPARC Architecture and Database Design Phases

External Level: the users' view of the database. Describes the part of the
database that is relevant to a particular user. Different users have their
own customized view of the database, independent of other users.

Conceptual Level: the community view of the database. Describes what data is
stored in the database and the relationships among the data.

Internal Level: the physical representation of the database on the computer.
Describes how the data is stored in the database.
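
One way to picture the three levels is the rough sketch below, which uses Python's built-in `sqlite3` module. It is only an analogy under stated assumptions: a base table stands in for the conceptual level, two views stand in for external-level user views, and an index stands in for an internal-level storage decision. The `staff` table, the view names and the index name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the community view, i.e. what data is stored.
conn.execute(
    "CREATE TABLE staff ("
    "staff_id INTEGER PRIMARY KEY, name TEXT, salary REAL, branch TEXT)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "Meron", 8000.0, "Addis Ababa"), (2, "Yonas", 6500.0, "Adama")],
)

# External level: each group of users sees only the part relevant to it.
conn.execute("CREATE VIEW payroll_view AS SELECT staff_id, salary FROM staff")
conn.execute("CREATE VIEW directory_view AS SELECT name, branch FROM staff")

# Internal level: a storage and access-path decision hidden from both views.
conn.execute("CREATE INDEX idx_staff_branch ON staff(branch)")

payroll = conn.execute("SELECT * FROM payroll_view ORDER BY staff_id").fetchall()
directory = conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall()
print(payroll)
print(directory)
```

The payroll users never see names or branches, the directory users never see salaries, and neither group knows that an index exists.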

The following example illustrates the difference between the three levels of
the ANSI-SPARC database architecture, where:
• The first level is concerned with the groups of users and their respective
  data requirements, each independent of the others.
• The second level describes the whole content of the database, where each
  piece of information is represented once.
• The third level describes how the data is physically stored on the
  computer.

Differences between Three Levels of ANSI-SPARC Architecture

The architecture defines DBMS schemas at three levels:

Internal schema: at the internal level, describes the physical storage
structures and access paths. Typically uses a physical data model.

Conceptual schema: at the conceptual level, describes the structure and
constraints of the whole database for a community of users. Uses a
conceptual or an implementation data model.

External schema: at the external level, describes the various user views.
Usually uses the same data model as the conceptual level.

Data Independence
Logical Data Independence:
• Refers to the immunity of external schemas to changes in the conceptual
  schema.
• Conceptual schema changes, e.g. addition or removal of entities, should
  not require changes to the external schemas or rewrites of application
  programs.
• It is the capacity to change the conceptual schema without having to
  change the external schemas and their application programs.

Physical Data Independence:
• Refers to the immunity of the conceptual schema to changes in the internal
  schema.
• It is the capacity to change the internal (physical) schema without having
  to change the conceptual (logical) schema.
• Applications depend on the logical schema, not on the physical one.
• Internal schema changes, e.g. using different file organizations or
  storage structures/devices, should not require changes to the conceptual
  or external schemas.
• In general, the interfaces between the various levels and components
  should be well defined, so that changes in some parts do not seriously
  influence others.
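
Both kinds of independence can be sketched with Python's built-in `sqlite3` module. In this hypothetical example an application reads only through the view `student_names`; adding a column to the base table stands in for a conceptual schema change, and creating an index stands in for an internal schema change. The table, view, index and function names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Liya", "CS"), (2, "Bereket", "IS")],
)
# External schema used by the application.
conn.execute("CREATE VIEW student_names AS SELECT student_id, name FROM student")


def app_query(conn):
    """The application's query, written once against the external view."""
    return conn.execute(
        "SELECT student_id, name FROM student_names ORDER BY student_id"
    ).fetchall()


before = app_query(conn)

# Conceptual schema change: a new attribute is added to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical_change = app_query(conn)

# Internal schema change: a new access path is added.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after_physical_change = app_query(conn)

print(before == after_logical_change == after_physical_change)
```

The function `app_query` is never edited, which is the practical meaning of independence: the change is absorbed by the level where it was made.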

Data Independence and the ANSI-SPARC Three-level Architecture

The distinction between a Data Definition Language (DDL) and a Data
Manipulation Language (DML)

Database Languages
Data Definition Language (DDL)
• Allows the DBA or user to describe and name the entities, attributes and
  relationships required for the application.
• A specification notation for defining the database schema.

Data Manipulation Language (DML)
• Provides basic data manipulation operations on data held in the database.
• A language for accessing and manipulating the data organized by the
  appropriate data model.
• The DML is also known as a query language.

Procedural DML: the user specifies what data is required and how to get the
data.

Non-Procedural DML: the user specifies what data is required but not how it
is to be retrieved.

SQL is the most widely used non-procedural query language.
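
The contrast between the two DML styles can be sketched in Python. The first function spells out how to find the rows, step by step, in the procedural style; the second hands a SQL statement to SQLite and leaves the retrieval strategy to the DBMS. The employee data and both function names are invented for illustration.

```python
import sqlite3

# Hypothetical data: (name, department, salary)
employees = [("Hana", "Finance", 7000), ("Dawit", "IT", 9000), ("Selam", "IT", 8500)]


def procedural_it_names(rows):
    """Procedural style: state WHAT is wanted and HOW to scan for it."""
    names = []
    for name, dept, salary in rows:          # visit every record in turn
        if dept == "IT" and salary > 8000:   # test each one explicitly
            names.append(name)
    names.sort()
    return names


def declarative_it_names(rows):
    """Non-procedural style: state only WHAT is wanted, in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT name FROM employee "
        "WHERE dept = 'IT' AND salary > 8000 ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


print(procedural_it_names(employees))
print(declarative_it_names(employees))
```

Both functions return the same names; only the second leaves the DBMS free to choose an index or a different scan order without the program changing.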

Fourth Generation Language (4GL)
• Query Languages
• Forms Generators
• Report Generators
• Graphics Generators
• Application Generators

A Classification of data models

Data Model
A specific DBMS has its own specific Data Definition Language, but this
type of language is too low level to describe the data requirements of an
organization in a way that is readily understandable by a variety of users.
We need a higher-level language.
Such a higher-level language is called a data model.

Data Model: a set of concepts to describe the structure of a database, and
certain constraints that the database should obey.

A data model is a description of the way that data is stored in a database.
A data model helps to understand the relationships between entities and to
create the most effective structure to hold data.

A data model is a collection of tools or concepts for describing:
• Data
• Data relationships
• Data semantics
• Data constraints

The main purpose of a data model is to represent the data in an
understandable way.
Categories of data models include:
• Object-based
• Record-based
• Physical
Record-based Data Models
Consist of a number of fixed-format records.
Each record type defines a fixed number of fields.
Each field is typically of a fixed length.
• Hierarchical Data Model
• Network Data Model
• Relational Data Model
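
What a fixed-format record looks like can be sketched with Python's standard `struct` module. The layout below (a 4-byte integer id, a 20-byte name and an 8-byte floating-point salary) is an assumption chosen for illustration, not a format used by any particular DBMS.

```python
import struct

# Hypothetical layout: "<" = little-endian with no padding,
# "i" = 4-byte integer, "20s" = 20-byte string, "d" = 8-byte float.
RECORD_FORMAT = "<i20sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # every record has the same size


def pack_record(emp_id, name, salary):
    """Encode one record; short names are padded with zero bytes."""
    return struct.pack(RECORD_FORMAT, emp_id, name.encode("utf-8"), salary)


def unpack_record(data):
    """Decode one record and strip the zero-byte padding from the name."""
    emp_id, raw_name, salary = struct.unpack(RECORD_FORMAT, data)
    return emp_id, raw_name.rstrip(b"\x00").decode("utf-8"), salary


def read_nth(data, n):
    """Fixed lengths let record n be located by arithmetic alone."""
    start = n * RECORD_SIZE
    return unpack_record(data[start:start + RECORD_SIZE])


file_bytes = pack_record(1, "Almaz", 5000.0) + pack_record(2, "Kebede", 6200.5)
print(RECORD_SIZE, read_nth(file_bytes, 1))
```

Because every field has a fixed length, any record can be reached directly from its position, at the cost of wasted space for short values and truncation of long ones.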

1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or segment
• The top node is the root node
• Nodes are arranged in a hierarchical structure as a sort of upside-down tree
• A parent node can have more than one child node
• A child node can only have one parent node
• The relationship between parent and child is one-to-many
• Relationships are established by creating physical links between stored records (each is stored with a predefined access path to other records)
• To add a new record type or relationship, the database must be redefined and then stored in a new form.

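[Figure: example hierarchy with Department at the top level, Employee and Job on the second level, and Time Card and Activity on the third level]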

ADVANTAGES of Hierarchical Data Model:
• The Hierarchical Model is simple to construct and operate on
• Corresponds to a number of naturally hierarchical domains, e.g., assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.

DISADVANTAGES of Hierarchical Data Model:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"


2. Network Model
• Allows record types to have more than one parent, unlike the hierarchical model
• A network data model sees records as set members
• Each set has an owner and one or more members
• Does not directly allow many-to-many relationships between entities
• Like the hierarchical model, the network model is a collection of physically linked records.
• Allows member records to have more than one owner

[Figure: example network of the record types Department, Job, Employee, Activity and Time Card]

ADVANTAGES of Network Data Model:
• The Network Model is able to model complex relationships and represents the semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and relationship types.
• Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database.

DISADVANTAGES of Network Data Model:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of records.
• Little scope for automated "query optimization"


3. Relational Data Model


• Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model of Data for Large Shared Data Banks')
• Terminology originates from the branch of mathematics called set theory and relations
• Can define more flexible and complex relationships
• Viewed as a collection of tables called "Relations", equivalent to a collection of record types
• Relation: two-dimensional table
• Stores information or data in the form of tables → rows and columns
• A row of the table is called a tuple → equivalent to a record
• A column of a table is called an attribute → equivalent to a field
• Data value is the value of the attribute
• Records are related by the data stored jointly in the fields of records in two tables or files. The related tables contain information that creates the relation
• The tables seem to be independent but are related somehow.
• No physical consideration of the storage is required by the user
• Many tables are merged together to come up with a new virtual view of the relationship

Alternative terminologies (each row lists equivalent terms)
Relation     Table     File
Tuple        Row       Record
Attribute    Column    Field

• The rows represent records (collections of information about separate items)
• The columns represent fields (particular attributes of a record)
• Conducts searches by using data in specified columns of one table to find additional data in another table
• In conducting searches, a relational database matches information from a field in one table with information in a corresponding field of another table to produce a third table that combines requested data from both tables


Relational Data Model

Properties of Relational Databases


• Each row of a table is uniquely identified by a PRIMARY KEY
composed of one or more columns
• Each tuple in a relation must be unique
• A minimal group of columns that uniquely identifies a row in a table is called a CANDIDATE KEY
• ENTITY INTEGRITY RULE of the model states that no component of
the primary key may contain a NULL value.
• A column or combination of columns that matches the primary key of
another table is called a FOREIGN KEY. Used to cross-reference
tables.
• The REFERENTIAL INTEGRITY RULE of the model states that, for
every foreign key value in a table there must be a corresponding
primary key value in another table in the database or it should be
NULL.
• All tables are LOGICAL ENTITIES
• A table is either a BASE TABLE (Named Relation) or a VIEW (Unnamed Relation)
• Only Base Tables are physically stored
• VIEWS are derived from BASE TABLES with SQL instructions like: [SELECT .. FROM .. WHERE .. ORDER BY]
• A relational database is a collection of tables
o Each entity in one table
o Attributes are fields (columns) in the table
• Order of rows and columns is immaterial
• Entries with repeating groups are said to be un-normalized
• Entries are single-valued
• Each column (field or attribute) has a distinct name
• All values in a column represent the same attribute and have the same data format
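
The primary key and referential integrity rules listed above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the department and employee tables and their rows are hypothetical, and SQLite is only one possible DBMS (it enforces foreign keys only after the PRAGMA shown below).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE employee (
           emp_id  INTEGER PRIMARY KEY,
           name    TEXT NOT NULL,
           dept_id INTEGER REFERENCES department(dept_id)  -- foreign key, may be NULL
       )"""
)
conn.execute("INSERT INTO department VALUES (10, 'Finance')")
conn.execute("INSERT INTO employee VALUES (1, 'Abebe', 10)")   # matches a primary key
conn.execute("INSERT INTO employee VALUES (2, 'Sara', NULL)")  # NULL foreign key is allowed


def try_insert(sql):
    """Return True if the statement is accepted, False if an integrity rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Reusing primary key value 1 violates the uniqueness of the primary key.
duplicate_key_ok = try_insert("INSERT INTO employee VALUES (1, 'Kebede', 10)")
# Department 99 does not exist, so this violates referential integrity.
dangling_fk_ok = try_insert("INSERT INTO employee VALUES (3, 'Kebede', 99)")
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Both rejected statements leave the table unchanged, so only the two valid employees remain.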


Building Blocks of the Relational Data Model


The building blocks of the relational data model are:
• Entities: real-world physical or logical objects
• Attributes: properties used to describe each Entity or real-world object.
• Relationship: the association between Entities
• Constraints: rules that should be obeyed while manipulating the data.

1. The ENTITIES (persons, places, things etc.) which the organization has to deal with. Relations can also describe relationships.

The name given to an entity should always be a singular noun descriptive of each item to be stored in it. E.g.: student NOT students.

Every relation has a schema, which describes its columns, or fields. The relation itself corresponds to our familiar notion of a table: a relation is a collection of tuples, each of which contains values for a fixed number of attributes.
• Existence Dependency: the dependence of an entity on the existence of one or more other entities.
• Weak entity: an entity that cannot exist without the entity with which it has a relationship. It is indicated by a double rectangle.

2. The ATTRIBUTES: the items of information which characterize and describe these entities.

Attributes are pieces of information ABOUT entities. The analysis must of course identify those which are actually relevant to the proposed application. Attributes will give rise to recorded items of data in the database.

At this level we need to know such things as:
• Attribute name (should be explanatory words or phrases)
• The domain from which attribute values are taken (a DOMAIN is a set of values from which attribute values may be taken). Each attribute has values taken from a domain. For example, the domain of Name is string and that of Salary is real


• Whether the attribute is part of the entity identifier (attributes which just describe an entity versus those which help to identify it uniquely)
• Whether it is permanent or time-varying (which attributes may change their values over time)
• Whether it is required or optional for the entity (whose values will sometimes be unknown or irrelevant)

Types of Attributes

(1) Simple (atomic) vs. Composite attributes
• Simple: contains a single value (not divided into sub-parts)
E.g. Age, gender
• Composite: divided into sub-parts (composed of other attributes)
E.g. Name, address

(2) Single-valued vs. Multi-valued attributes
• Single-valued: has only a single value (the value may change but there is only one value at any one time)
E.g. Name, Sex, Id. No., color_of_eyes
• Multi-valued: has more than one value
E.g. Address, dependent-name; a person may have several college degrees

(3) Stored vs. Derived attributes
• Stored: not possible to derive or compute
E.g. Name, Address
• Derived: the value may be derived (computed) from the values of other attributes.
E.g. Age (current year - year of birth)
Length of employment (current date - start date)
Profit (earning - cost)
G.P.A (grade points / credit hours)

(4) Null Values
• NULL applies to attributes which are not applicable or which do not have values.
• You may enter the value NA (meaning not applicable)
• The value of a key attribute cannot be null.

Default value: the value assumed if no explicit value is given
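
The derived attributes under (3) above can be sketched as small functions that compute a value from stored attributes instead of storing it. The function names and sample values are hypothetical; note that this sketch computes age in completed years from a full date of birth, which is slightly more precise than the "current year - year of birth" rule given above.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive Age (completed years) from the stored attribute date_of_birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def gpa(grade_points, credit_hours):
    """Derive G.P.A as total grade points divided by total credit hours."""
    return sum(grade_points) / sum(credit_hours)


# Hypothetical stored values
example_age = age_on(date(2000, 6, 15), today=date(2024, 6, 14))
example_gpa = gpa(grade_points=[12.0, 9.0, 16.0], credit_hours=[3, 3, 4])
```

Because the results depend on the current date or on other attributes, storing them would risk the stored copy going stale.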


Entity versus Attributes


When designing the conceptual specification of the database, one should pay attention to the distinction between an Entity and an Attribute.
• Consider designing a database of employees for an organization:
• Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
o If we have several addresses per employee, address must be an entity (attributes cannot be set-valued/multi-valued)
o If the structure (city, Woreda, Kebele, etc.) is important, e.g. we want to retrieve employees in a given city, address must be modeled as an entity (attribute values are atomic)


4. The RELATIONSHIPS between entities which exist and must be taken into
account when processing information. In any business processing one object
may be associated with another object due to some event. Such kind of
association is what we call a RELATIONSHIP between entity objects.

• One external event or process may affect several related entities.
• Related entities require setting of LINKS from one part of the database to another.
• A relationship should be named by a word or phrase which
explains its function
• Role names are different from the names of entities forming the
relationship: one entity may take on many roles, the same role may
be played by different entities
• For each RELATIONSHIP, one can talk about the Number of
Entities and the Number of Tuples participating in the association.
These two concepts are called DEGREE and CARDINALITY of a
relationship respectively.

Degree of a Relationship
• An important point about a relationship is how many entities
participate in it. The number of entities participating in a
relationship is called the DEGREE of the relationship.

Among the degrees of relationship, the following are the basic ones:
o UNARY/RECURSIVE RELATIONSHIP: tuples/records of a single entity are related with each other.
o BINARY RELATIONSHIP: tuples/records of two entities are associated in a relationship
o TERNARY RELATIONSHIP: tuples/records of three different entities are associated
o And a generalized one: N-ARY RELATIONSHIP: tuples from an arbitrary number of entity sets participate in a relationship.


Cardinality of a Relationship
• Another important concept about relationships is the number of instances/tuples that can be associated with a single instance from one entity in a single relationship. The number of instances participating or associated with a single instance from an entity in a relationship is called the CARDINALITY of the relationship. The major cardinalities of a relationship are:
o ONE-TO-ONE: one tuple is associated with only one other tuple.
E.g. Building - Location → a single building is located in a single location, and a single location accommodates only a single building.
o ONE-TO-MANY: one tuple can be associated with many other tuples, but not the reverse.
E.g. Department - Student → one department can have multiple students.
o MANY-TO-ONE: many tuples are associated with one tuple, but not the reverse.
E.g. Employee - Department → many employees belong to a single department.
o MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side (with a different role name), one tuple is associated with many tuples.
E.g. Student - Course → a student can take many courses and a single course can be attended by many students.
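
One common way to hold the Student - Course many-to-many example in tables is an extra linking table with one row per pair. The sketch below uses Python's sqlite3 module; the table name takes, the column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- One row per (student, course) pair; the composite key prevents duplicates.
    CREATE TABLE takes (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO course  VALUES (100, 'Databases'), (200, 'Networks');
    INSERT INTO takes   VALUES (1, 100), (1, 200), (2, 100);
    """
)

# One student, many courses.
courses_of_abebe = [r[0] for r in conn.execute(
    "SELECT c.title FROM takes t JOIN course c ON c.course_id = t.course_id "
    "WHERE t.student_id = 1 ORDER BY c.title")]

# One course, many students.
students_in_databases = [r[0] for r in conn.execute(
    "SELECT s.name FROM takes t JOIN student s ON s.student_id = t.student_id "
    "WHERE t.course_id = 100 ORDER BY s.name")]
```

A one-to-many relationship would not need the linking table: a foreign key column on the "many" side is enough.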


5. Relational Constraints/Integrity Rules

• Relational Integrity
o Domain Integrity: no value of the attribute should be beyond the allowable limits
o Entity Integrity: in a base relation, no attribute of a Primary Key can assume a value of NULL
o Referential Integrity: if a Foreign Key exists in a relation, either the Foreign Key value must match a Candidate Key value in its home relation or the Foreign Key value must be NULL
o Enterprise Integrity: additional rules specified by the users or database administrators of a database are incorporated

• Key constraints
If tuples need to be unique in the database, then we need to make each tuple distinct. To do this we need relational keys that uniquely identify each tuple in a relation.

Super Key: an attribute or set of attributes that uniquely identifies a tuple within a relation.
Candidate Key: a super key such that no proper subset of it is a super key within the relation.
A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a super key has only one attribute, it is automatically a candidate key.
If a candidate key consists of more than one attribute, it is called a Composite Key.
Primary Key: the candidate key that is selected to identify tuples uniquely within the relation.
In the worst case, the entire set of attributes in a relation can be considered as the primary key.
Foreign Key: an attribute, or set of attributes, within one relation that matches the candidate key of some relation.
A foreign key is a link between different relations, used to create the view or the unnamed relation.
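
The uniqueness and irreducibility properties can be checked mechanically on sample data. The sketch below is an illustration only: the helper names and the STUDENT rows are hypothetical, and a key found this way is only consistent with the given rows, whereas a real key is a property of the schema that must hold for every valid state.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows agree on all attributes in attrs (uniqueness)."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Super keys with no proper subset that is also a super key (irreducibility)."""
    keys = []
    # Smaller attribute sets are tried first, so any reducible super key
    # is a superset of a key that has already been recorded.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(rows, attrs) and not any(set(k) < set(attrs) for k in keys):
                keys.append(attrs)
    return keys


# Hypothetical STUDENT instance
students = [
    {"Id": 1, "FName": "Abebe", "LName": "Kebede", "Dept": "CS"},
    {"Id": 2, "FName": "Abebe", "LName": "Tesfaye", "Dept": "IS"},
    {"Id": 3, "FName": "Sara", "LName": "Kebede", "Dept": "CS"},
]
found = candidate_keys(students, ["Id", "FName", "LName", "Dept"])
```

Here (Id, FName) is a super key but not a candidate key, because Id alone is already unique. The designer would then pick one candidate key, most naturally Id, as the primary key.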


• Relational Views
Relations are perceived as tables from the users' perspective. Actually, there are two kinds of relation in a relational database: Named and Unnamed Relations. The basic difference is in how the relation is created, used and updated:
1. Base Relation
A Named Relation corresponding to an entity in the conceptual
schema, whose tuples are physically stored in the database.
2. View (Unnamed Relation)
A View is the dynamic result of one or more relational operations
operating on the base relations to produce another virtual relation
that does not actually exist as presented. So a view is virtually
derived relation that does not necessarily exist in the database but
can be produced upon request by a particular user at the time of
request. The virtual table or relation can be created from single or
different relations by extracting some attributes and records with or
without conditions.

Purpose of a view
• Hides unnecessary information from users: only part of the base relation (some collection of attributes, not necessarily all) is included in the virtual table.
• Provides flexibility and security: since unnecessary information is hidden from the user, there is some degree of data security.
• Provides a customized view of the database for users: each user is presented with their own preferred data set and format by making use of views.
• A view of one base relation can be updated.
• Update on views derived from several relations is not allowed since it may violate the integrity of the database.
• Update on a view with aggregation and summary is not allowed, since aggregation and summary results are computed from a base relation and do not actually exist.
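
A view that hides a column and is recomputed on request can be sketched with Python's sqlite3 module. The employee table, the view name employee_public and the rows are hypothetical. SQLite is used only for convenience: it treats every view as read-only, so the update rules listed above cannot be demonstrated with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Abebe', 'Finance', 3000),
                                (2, 'Sara', 'Finance', 5200),
                                (3, 'Kebede', 'IT', 4100);
    -- The view hides the salary column and stores no rows of its own.
    CREATE VIEW employee_public AS SELECT emp_id, name, dept FROM employee;
    """
)

view_columns = [d[0] for d in conn.execute("SELECT * FROM employee_public").description]

# A change to the base table is visible through the view on the next query.
conn.execute("INSERT INTO employee VALUES (4, 'Hana', 'IT', 4500)")
names_in_view = [
    r[0] for r in conn.execute("SELECT name FROM employee_public ORDER BY emp_id")
]
```

The new employee appears in the view without the view being touched, which is what "dynamic result" means in the definition above.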


Schemas and Instances and Database State

When a database is designed using a Relational data model, all the data is
represented in a form of a table. In such definitions and representation, there are
two basic components of the database. The two components are the definition of
the Relation or the Table and the actual data stored in each table. The data
definition is what we call the Schema or the skeleton of the database and the
Relations with some information at some point in time is the Instance or the flesh
of the database.

Schemas
• Schema describes how data is to be structured; it is defined at setup/design time (also called "metadata")
• Since it is defined during the database development phase, the schema is rarely changed unless system maintenance demands a change to the definition of a relation.

• Database Schema (Intension): specifies the name of the relation and the collection of its attributes (specifically the names of the attributes).
o refers to a description of the database (or intension)
o specified during database design
o should not be changed except during maintenance

• Schema Diagrams
o convention to display some aspect of a schema visually

• Schema Construct
o refers to each object in the schema (e.g. STUDENT)
E.g.: STUDENT (FName, LName, Id, Year, Dept, Sex)


Instances
• Instance: the collection of data in the database at a particular point in time (snapshot).
o Also called State, Snapshot or Extension of the database
o Refers to the actual data in the database at a specific point in time
o The state of the database changes any time we add, delete or update an item.
o Valid state: a state that satisfies the structure and constraints specified in the schema and is enforced by the DBMS

• Since the Instance is the actual data of the database at some point in time, it changes rapidly
• To define a new database, we specify its database schema to the DBMS (the database is empty)
• The database is initialized when we first load it with data
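
The distinction between schema and instance can be observed directly. The sketch below defines the STUDENT relation from the example above in SQLite (the column types and the sample row are assumptions) and reads the schema and the state before and after loading data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (FName TEXT, LName TEXT, Id INTEGER PRIMARY KEY, "
    "Year INTEGER, Dept TEXT, Sex TEXT)"
)


def schema():
    """Attribute names of STUDENT (the intension)."""
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]


def instance():
    """Rows currently stored in STUDENT (the extension)."""
    return conn.execute("SELECT * FROM student ORDER BY Id").fetchall()


# Schema defined, database still empty.
schema_before, state_empty = schema(), instance()

# Loading data changes the state but not the schema.
conn.execute("INSERT INTO student VALUES ('Abebe', 'Kebede', 1, 2, 'CS', 'M')")
schema_after, state_loaded = schema(), instance()
```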


Database Design
Database design is the process of coming up with different kinds of
specification for the data to be stored in the database. The database design
part is one of the middle phases we have in information systems
development where the system uses a database approach. Design is the
part on which we would be engaged to describe how the data should be
perceived at different levels and finally how it is going to be stored in a
computer system.

An Information System with a Database application consists of several tasks, which include:
• Planning of Information Systems Design
• Requirements Analysis
• Design (Conceptual, Logical and Physical Design)
• Tuning
• Implementation
• Operation and Support

From these different phases, the prime interest of a database system will be the Design part, which is again subdivided into three sub-phases.
These sub-phases are:
1. Conceptual Design
2. Logical Design, and
3. Physical Design

• In general, one has to go back and forth between these tasks to refine a database design, and decisions in one task can influence the choices in another task.
• In developing a good design, one should answer such questions as:
o What are the relevant Entities for the Organization?
o What are the important features of each Entity?
o What are the important Relationships?
o What are the important queries from the user?
o What are the other requirements of the Organization and the Users?


The Three levels of Database Design


Conceptual Design

Logical Design

Physical Design

Conceptual Database Design


• Conceptual design is the process of constructing a model of the information used in an enterprise, independent of any physical considerations.
o It is the source of information for the logical design phase.
o Mostly uses an Entity Relationship Model to describe the data at this level.
• After the completion of Conceptual Design one has to go for refinement of the schema, which is verification of Entities, Attributes, and Relationships

Logical Database Design

• Logical design is the process of constructing a model of the information used in an enterprise based on a specific data model (e.g. relational, hierarchical, network or object), but independent of a particular DBMS and other physical considerations.
• Normalization process
o Collection of rules to be maintained
o Discover new entities in the process
o Revise attributes based on the rules and the discovered Entities

Physical Database Design

• Physical design is the process of producing a description of the implementation of the database on secondary storage. It defines the specific storage or access methods used by the database.
o Describes the storage structures and access methods used to achieve efficient access to the data.
o Tailored to a specific DBMS. Its characteristics are a function of the DBMS and the operating system.
o Includes an estimate of storage space


Conceptual Database Design


• Conceptual design revolves around discovering and analyzing organizational and user data requirements
• The important activities are to identify
o Entities
o Attributes
o Relationships
o Constraints
• And based on these components develop the ER model using
o ER diagrams

The Entity Relationship (E-R) Model

• Entity-Relationship modeling is used to represent the conceptual view of the database
• The main components of ER Modeling are:
o Entities
  - Correspond to an entire table, not a row
  - Represented by a Rectangle
o Attributes
  - Represent the properties used to describe an entity or a relationship
  - Represented by an Oval
o Relationships
  - Represent the associations that exist between entities
  - Represented by a Diamond
o Constraints
  - Represent the constraints on the data

Before working on the conceptual design of the database, one has to know and answer the following basic questions.
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints that hold? Constraints on each data item with respect to update, retrieval and storage.
• Represent this information pictorially in ER diagrams, then map the ER diagram into a relational schema.


Developing an E-R Diagram


• Designing a conceptual model for the database is not a linear process but an iterative activity where the design is refined again and again.
• To identify the entities, attributes, relationships, and constraints on the data, different methods are used during the analysis phase. These include information gathered by:
o Interviewing end users individually and in a group
o Questionnaire survey
o Direct observation
o Examining different documents

• The basic E-R model is graphically depicted and presented for review.
• The process is repeated until the end users and designers agree that the E-R diagram is a fair representation of the organization's activities and functions.
• Checking for redundant relationships in the ER diagram: relationships between entities indicate access from one entity to another. It is therefore possible to access one entity occurrence from another entity occurrence even if there are other entities and relationships that separate them. This is often referred to as 'Navigation' of the ER diagram.
• The last phase in ER modeling is validating the ER model against the requirements of the user.


Graphical Representations in ER Diagramming

• An entity is represented by a RECTANGLE containing the name of the entity.
[Figure: rectangle symbols for a Strong Entity and a Weak Entity]
• Connected entities are called relationship participants.
• Attributes are represented by OVALS and are connected to the entity by a line.
[Figure: oval symbols for an Attribute, a Multi-valued Attribute and a Composite Attribute]
• A derived attribute is indicated by a DOTTED LINE.
• PRIMARY KEYS are underlined.
• Relationships are represented by DIAMOND-shaped symbols.
o A Weak Relationship is a relationship between Weak and Strong Entities
o A Strong Relationship is a relationship between two Strong Entities
[Figure: diamond symbols for a Strong Relationship and a Weak Relationship]


Example 1: Build an ER Diagram for the following information:


• A student record management system will have the following two basic data object categories with their own features or properties: Students will have an Id, Name, Dept, Age, GPA and Course will have an Id, Name, Credit Hours
• Whenever a student enrolls in a course in a specific Academic Year and Semester, the Student will have a grade for the course

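[Figure: ER diagram with entity Students (attributes Id, Name, Dept, DoB, Age, Gpa), entity Course (attributes Id, Name, Credit) and relationship Enrolled_In (attributes Academic Year, Semester, Grade)]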
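
One possible relational mapping of Example 1 is sketched below with Python's sqlite3 module. It is not the only valid mapping: the column types, the choice of key for enrolled_in and the sample rows are assumptions, and Age is left out because it can be derived from the date of birth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, dob TEXT, gpa REAL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, name TEXT, credit_hours INTEGER);
    -- Enrolled_In becomes its own table carrying the relationship attributes.
    CREATE TABLE enrolled_in (
        student_id    INTEGER NOT NULL REFERENCES student(id),
        course_id     INTEGER NOT NULL REFERENCES course(id),
        academic_year INTEGER NOT NULL,
        semester      INTEGER NOT NULL,
        grade         TEXT,
        PRIMARY KEY (student_id, course_id, academic_year, semester)
    );
    INSERT INTO student VALUES (1, 'Abebe', 'CS', '2000-06-15', 3.7);
    INSERT INTO course  VALUES (100, 'Databases', 3);
    INSERT INTO enrolled_in VALUES (1, 100, 2024, 1, 'A');
    """
)

# Follow the relationship from both entities to read the grade.
grades = conn.execute(
    "SELECT s.name, c.name, e.grade FROM enrolled_in e "
    "JOIN student s ON s.id = e.student_id JOIN course c ON c.id = e.course_id"
).fetchall()
```

Including academic_year and semester in the key lets the same student take the same course again in a later semester.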

Example 2: Build an ER Diagram for the following information:


• A Personnel record management system will have the following two basic data object categories with their own features or properties: Employee will have an Id, Name, DoB, Age, Tel and Department will have an Id, Name, Location
• Whenever an Employee is assigned to a Department, the duration of his stay in the respective department should be registered.


Structural Constraints on Relationship


1. Constraints on Relationship / Multiplicity / Cardinality Constraints
• A multiplicity constraint is the number or range of possible occurrences of an entity type/relation that may relate to a single occurrence/tuple of an entity type/relation through a particular relationship.
• Mostly used to ensure appropriate enterprise constraints.

One-to-one relationship:
• A customer is associated with at most one loan via the relationship borrower
• A loan is associated with at most one customer via borrower

E.g.: Relationship Manages between STAFF and BRANCH
The multiplicity of the relationship is:
• One branch can only have one manager
• One employee could manage either one or no branches

[Figure: Employee (1..1) Manages (0..1) Branch]


One-To-Many Relationships
• In the one-to-many relationship a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower

E.g.: Relationship Leads between STAFF and PROJECT
The multiplicity of the relationship:
• One staff member may lead one or more project(s)
• One project is led by one staff member

[Figure: Employee (1..1) Leads (0..*) Project]

Many-To-Many Relationship
• A customer is associated with several (possibly 0) loans via borrower
• A loan is associated with several (possibly 0) customers via borrower

E.g.: Relationship Teaches between INSTRUCTOR and COURSE
The multiplicity of the relationship:
• One Instructor teaches one or more Course(s)
• One Course is taught by zero or more Instructor(s)

[Figure: Instructor (0..*) Teaches (1..*) Course]


Participation of an Entity Set in a Relationship Set


Participation constraint of a relationship is involved in identifying and
setting the mandatory or optional feature of an entity occurrence to take a
role in a relationship. There are two distinct participation constraints with
this respect, namely: Total Participation and Partial Participation

• Total participation: every tuple in the entity or relation participates in at least one relationship by taking a role. This means every tuple in a relation will be attached to at least one other tuple. The entity with total participation in a relationship is connected to the relationship using a double line.
• Partial participation: some tuples in the entity or relation may not participate in the relationship. This means there is at least one tuple from that relation not taking any role in that specific relationship. The entity with partial participation in a relationship is connected to the relationship using a single line.

• E.g. 1: Participation of EMPLOYEE in the "belongs to" relationship with DEPARTMENT is total since every employee should belong to a department.
Participation of DEPARTMENT in the "belongs to" relationship with EMPLOYEE is total since every department should have more than one employee.

Employee Belongs To Department

• E.g. 2: Participation of EMPLOYEE in the "manages" relationship with DEPARTMENT is partial since not all employees are managers.
Participation of DEPARTMENT in the "manages" relationship with EMPLOYEE is total since every department should have a manager.

Employee Manages Department
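
In a relational schema, the participation constraints of E.g. 2 can be approximated with a mandatory foreign key. The sketch below assumes hypothetical table and column names: DEPARTMENT carries a NOT NULL manager_id (total participation), while nothing forces an EMPLOYEE to appear as a manager (partial participation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Total participation of DEPARTMENT in Manages: manager_id is mandatory.
    CREATE TABLE department (
        dept_id    INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER NOT NULL REFERENCES employee(emp_id)
    );
    INSERT INTO employee VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO department VALUES (10, 'Finance', 1);
    """
)

# Partial participation of EMPLOYEE: Sara manages nothing, and that is allowed.
non_managers = [r[0] for r in conn.execute(
    "SELECT name FROM employee "
    "WHERE emp_id NOT IN (SELECT manager_id FROM department)")]

# A department without a manager is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO department VALUES (20, 'IT', NULL)")
    department_without_manager_accepted = True
except sqlite3.IntegrityError:
    department_without_manager_accepted = False
```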


Problem in ER Modeling
The Entity-Relationship Model is a conceptual data model that views the real
world as consisting of entities and relationships. The model visually represents
these concepts by the Entity-Relationship diagram. The basic constructs of the ER
model are entities, relationships, and attributes. Entities are concepts, real or
abstract, about which information is collected. Relationships are associations
between the entities. Attributes are properties which describe the entities.

While designing the ER model one could face a design problem called a connection trap. Connection traps are problems arising from misinterpreting certain relationships.

There are two types of connection traps:

1. Fan trap:
Occurs where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
May exist where two or more one-to-many (1:M) relationships fan out from an entity. The problem can be avoided by restructuring the model so that there are no 1:M relationships fanning out from a single entity and all the semantics of the relationship are preserved.

Example:

[Figure: EMPLOYEE (1..*) Works For (1..1) BRANCH (1..1) IsAssigned (1..*) CAR]

Semantics description of the problem:

[Figure: occurrence diagram linking employees Emp1 to Emp7, branches Bra1 to Bra4 and cars Car1 to Car7]

Problem: Which car (Car1, Car3 or Car5) is used by Employee 6 (Emp6) working in Branch 1 (Bra1)? From this ER model one cannot tell which car is used by which staff member, since a branch can have more than one car and a branch is also populated by more than one employee. Thus we need to restructure the model to avoid the connection trap.

To avoid the Fan Trap problem we can go for restructuring of the E-R Model.
This will result in the following E-R Model.

[Figure: BRANCH (1..1) Has (1..*) CAR (1..*) Used By (1..*) EMPLOYEE]

Semantics description of the restructured model:

[Figure: occurrence diagram linking branches Bra1 to Bra4, cars Car1 to Car7 and employees Emp1 to Emp7]
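
The ambiguity and its removal can be sketched with plain Python data. The occurrence names follow the example, but the individual links (which car belongs to which branch, which car Emp6 uses) are invented for illustration because the lines of the original diagram are not available.

```python
# Hypothetical occurrences; the links are invented for illustration.
works_for = {"Emp6": "Bra1"}                                    # EMPLOYEE -> BRANCH
is_assigned = {"Car1": "Bra1", "Car3": "Bra1", "Car5": "Bra1"}  # CAR -> BRANCH


def cars_via_branch(employee):
    """Fan trap: go EMPLOYEE -> BRANCH -> CAR; every car of the branch qualifies."""
    branch = works_for[employee]
    return sorted(car for car, b in is_assigned.items() if b == branch)


# Restructured model: the car-to-employee link is recorded directly.
used_by = {("Car3", "Emp6")}                                    # CAR Used By EMPLOYEE


def cars_via_used_by(employee):
    """BRANCH -> CAR -> EMPLOYEE: the Used By relationship names the cars exactly."""
    return sorted(car for car, e in used_by if e == employee)


ambiguous = cars_via_branch("Emp6")   # three candidate cars, no way to choose
exact = cars_via_used_by("Emp6")      # the recorded car only
```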


2. Chasm Trap:
Occurs where a model suggests the existence of a relationship between
entity types, but the path way does not exist between certain entity
occurrences.
May exist when there are one or more relationships with a minimum
multiplicity on cardinality of zero forming part of the pathway between
related entities.

Example:

1..1 Has 1..* 0..1 Manages 0..*


BRANCH EMPLOYEE PROJECT

If some projects are not currently active, we cannot assign a project
manager to them. There are therefore projects with no project manager,
which gives the participation a minimum value of zero.

Problem:
How can we identify which BRANCH is responsible for which PROJECT?
We know that, whether a PROJECT is active or not, some BRANCH is
responsible for it. But because the participation between EMPLOYEE and
PROJECT has a minimum of zero, a PROJECT without a manager has no path
to any BRANCH, so we cannot identify the BRANCH responsible for each
PROJECT.
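
The broken pathway can be illustrated with a short Python sketch. All occurrences are assumed for illustration (Proj2 stands for an inactive project with no manager), and the names `has`, `manages` and `responsible_branch` are hypothetical.

```python
# BRANCH Has EMPLOYEE (mandatory), EMPLOYEE Manages PROJECT (optional, 0..1).
# All occurrences are assumptions made for illustration.
has = {"Emp1": "Bra1", "Emp2": "Bra2"}          # employee -> branch
manages = {"Proj1": "Emp1", "Proj2": None}      # project -> manager (None if inactive)


def responsible_branch(project):
    """Try to reach the branch through the project's manager."""
    manager = manages[project]
    if manager is None:
        return None          # chasm: the pathway stops here
    return has[manager]


print(responsible_branch("Proj1"))
print(responsible_branch("Proj2"))
```

For the inactive project the lookup returns `None`, even though a responsible branch exists in reality. That missing path is the chasm.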

The solution to this chasm trap is to add another relationship
directly between the two end entities (BRANCH and PROJECT):

BRANCH --1..1---- Has ----1..*-- EMPLOYEE --0..1---- Manages ----0..*-- PROJECT

BRANCH --1..1---- Responsible for ----1..*-- PROJECT
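
A minimal Python sketch of the repaired model, assuming illustrative occurrences. The direct `responsible_for` mapping (a hypothetical name for the added relationship) is mandatory on the PROJECT side, so the lookup no longer depends on the optional Manages relationship.

```python
# The added relationship BRANCH "Responsible for" PROJECT is stored directly.
# All occurrences are assumptions made for illustration.
manages = {"Proj1": "Emp1", "Proj2": None}            # still optional
responsible_for = {"Proj1": "Bra1", "Proj2": "Bra2"}  # project -> branch, always present


def responsible_branch(project):
    """Read the branch from the direct relationship, manager or not."""
    return responsible_for[project]


print(responsible_branch("Proj2"))
```

Every project, active or not, now resolves to exactly one branch.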


Enhanced E-R (EER) Models


■ Object-oriented extensions to the E-R model
■ EER is useful when there is a relationship between two entities in which
  only some entity occurrences participate (partial participation). In such
  cases EER is used to reduce the complexity of modelling the participation
  and the relationship.
■ ER diagrams consider entity types to be primitive objects
■ EER diagrams allow refinements within the structures of entity types

■ EER Concepts
    ■ Generalization
    ■ Specialization
    ■ Subclasses
    ■ Superclasses
    ■ Attribute Inheritance
    ■ Constraints on specialization and generalization


■ Generalization
  ➢ Generalization occurs when two or more entities represent categories
    of the same real-world object.
  ➢ Generalization is the process of defining a more general entity type
    from a set of more specialized entity types.
  ➢ A generalization hierarchy is a form of abstraction that specifies that
    two or more entities that share common attributes can be generalized
    into a higher-level entity type.
  ➢ Is considered a bottom-up definition of entities.
  ➢ A generalization hierarchy depicts the relationship between a
    higher-level superclass and a lower-level subclass.
  ➢ Generalization hierarchies can be nested. That is, a subtype of one
    hierarchy can be a supertype of another. The level of nesting is limited
    only by the constraint of simplicity.
  ➢ Example: Account is a generalized form of Saving and Current
    Accounts.
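
The bottom-up idea can be sketched in Python: the attributes shared by the specialized entity types move up into the generalized type. The attribute names below (AccountNo, Balance, InterestRate, OverdraftLimit) are assumptions chosen for illustration, not attributes given in these notes, and `generalize` is a hypothetical helper.

```python
# Attribute sets of two specialized entity types (assumed for illustration).
saving_account = {"AccountNo", "Balance", "InterestRate"}
current_account = {"AccountNo", "Balance", "OverdraftLimit"}


def generalize(*entity_types):
    """Return the common attributes and what stays specific to each type."""
    common = set.intersection(*entity_types)
    specific = [attrs - common for attrs in entity_types]
    return common, specific


account, (saving_only, current_only) = generalize(saving_account, current_account)
print(sorted(account))       # attributes of the generalized ACCOUNT entity
print(sorted(saving_only))   # attributes that remain on Saving Account
print(sorted(current_only))  # attributes that remain on Current Account
```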


■ Specialization
  ➢ Is the result of taking a subset of a higher-level entity set to form a
    lower-level entity set.
  ➢ The specialized entities have an additional set of attributes
    (distinguishing characteristics) that distinguish them from the
    generalized entity.
  ➢ Is considered a top-down definition of entities.
  ➢ The specialization process is the inverse of the generalization process:
    identify the distinguishing features of some entity occurrences, and
    specialize them into different subclasses.
  ➢ Reasons for specialization
      o Attributes that apply to only some occurrences of the superclass
      o Relationship types that apply to only some occurrences of the
        superclass
  ➢ In many cases, an entity type has numerous sub-groupings of its
    entities that are meaningful and need to be represented explicitly. This
    requires each subgroup to be represented in the ER model.
  ➢ The generalized entity is a superclass and the set of specialized
    entities are the subclasses of that specific superclass.
      o Example: Saving Accounts and Current Accounts are
        specialized entities of the generalized entity Accounts.
        Manager, Sales and Secretary are specialized employees.
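
As a rough top-down counterpart, the sketch below groups EMPLOYEE occurrences into subclasses by a distinguishing feature. The `JobType` attribute, the `EmpId` values and the `specialize` helper are assumptions introduced for illustration; only the subclass names Manager, Sales and Secretary come from the example above.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE occurrences; "JobType" is an assumed distinguishing feature.
employees = [
    {"EmpId": 1, "JobType": "Manager"},
    {"EmpId": 2, "JobType": "Sales"},
    {"EmpId": 3, "JobType": "Secretary"},
    {"EmpId": 4, "JobType": "Sales"},
]


def specialize(occurrences, feature):
    """Group superclass occurrences into subclasses by a distinguishing feature."""
    subclasses = defaultdict(list)
    for occurrence in occurrences:
        subclasses[occurrence[feature]].append(occurrence["EmpId"])
    return dict(subclasses)


subclasses = specialize(employees, "JobType")
print(subclasses)
```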

■ Subclass/Subtype
  ➢ An entity type whose tuples have attributes that distinguish its
    members from tuples of the generalized (superclass) entity.
  ➢ When a generalized superclass has various subgroups with
    distinguishing features, and these subgroups are represented in
    specialized form, the groups are called subclasses.
  ➢ Subclasses can be either mutually exclusive (disjoint) or overlapping
    (inclusive).
  ➢ A single subclass may inherit attributes from two distinct superclasses.
  ➢ A mutually exclusive category/subclass is one where an entity instance
    can be in only one of the subclasses.
    E.g.: An EMPLOYEE can be either SALARIED or PART-TIMER but
    not both.
  ➢ An overlapping category/subclass is one where an entity instance may
    be in two or more subclasses.
    E.g.: A PERSON who works for a university can be both an
    EMPLOYEE and a STUDENT at the same time.


■ Superclass/Supertype
  ➢ An entity type whose tuples share common attributes. Attributes that
    are shared by all entity occurrences (including the identifier) are
    associated with the supertype.
  ➢ Is the generalized entity.

■ Relationship Between Superclass and Subclass

  ➢ The relationship between a superclass and any of its subclasses
    is called a superclass/subclass or class/subclass relationship.
  ➢ An instance cannot be a member of a subclass only; every
    instance of a subclass is also an instance of the superclass.
  ➢ A member of a subclass is represented as a distinct database
    object, a distinct record that is related via the key attribute to its
    superclass entity.
  ➢ An entity cannot exist in the database merely by being a
    member of a subclass; it must also be a member of the
    superclass.
  ➢ An entity occurrence of a superclass need not belong to any of
    the subclasses unless there is total participation in the
    specialization.
  ➢ The relationship between a subclass and a superclass is of an
    “IS A” or “IS PART OF” type.
      ▪ Subclass IS PART OF Superclass
      ▪ Manager IS AN Employee
  ➢ All subclasses or specialized entity sets should be connected
    to the superclass using a line to a circle, with a subset symbol
    on the line indicating the direction of the subclass/superclass
    relationship.


  ➢ We can also have subclasses of a subclass, forming a hierarchy
    of specialization.
  ➢ Superclass attributes are shared by all subclasses of that
    superclass.
  ➢ Subclass attributes are unique to the subclass.
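
One common way to realize these rules in a relational database is to give the subclass table the same key as the superclass table and make it a foreign key. The sketch below uses Python's built-in `sqlite3` module; the table layout and the `bonus` attribute are assumptions for illustration, not a design prescribed by these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Superclass table, and a subclass table that shares its key.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE manager ("
    " emp_id INTEGER PRIMARY KEY REFERENCES employee(emp_id),"
    " bonus REAL)"  # 'bonus' is a hypothetical subclass-only attribute
)

conn.execute("INSERT INTO employee VALUES (1, 'Emp1')")
conn.execute("INSERT INTO manager VALUES (1, 500.0)")    # Manager IS AN Employee

try:
    # No EMPLOYEE row with key 2 exists, so this subclass-only member is rejected.
    conn.execute("INSERT INTO manager VALUES (2, 300.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The subclass record is related to its superclass record via the key attribute.
rows = conn.execute(
    "SELECT e.emp_id, e.name, m.bonus"
    " FROM employee e JOIN manager m ON m.emp_id = e.emp_id"
).fetchall()
print(orphan_rejected, rows)
```

The foreign key enforces that an entity cannot exist merely as a member of the subclass, while the join recovers the inherited superclass attributes.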

■ Attribute Inheritance
  ➢ An entity that is a member of a subclass inherits all the
    attributes it has as a member of the superclass.
  ➢ The entity also inherits all the relationships in which the
    superclass participates.
  ➢ An entity may belong to more than one subclass category.
  ➢ All entities/subclasses of a generalized entity or superclass
    share a common unique identifier attribute (primary key), i.e.
    the primary keys of the superclass and its subclasses are always
    identical.

• Consider the EMPLOYEE supertype entity shown above. This
  entity can have several different subtype entities (for example:
  HOURLY and SALARIED), each with distinct properties not shared
  by other subtypes. But whether the employee is HOURLY or
  SALARIED, the same attributes (EmployeeId, Name, and DateHired)
  are shared.
• The supertype EMPLOYEE stores all properties that the subclasses
  have in common. HOURLY employees have the unique attribute
  Wage (hourly wage rate), while SALARIED employees have two
  unique attributes, StockOption and Salary.
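
The EMPLOYEE example can be mirrored with Python dataclasses, where subclass fields are appended to the inherited ones. This is only an analogy to attribute inheritance in the EER model; the field types are assumptions, since the notes give attribute names only.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:            # supertype: attributes common to all employees
    EmployeeId: int
    Name: str
    DateHired: str


@dataclass
class Hourly(Employee):    # subtype with one attribute of its own
    Wage: float


@dataclass
class Salaried(Employee):  # subtype with two attributes of its own
    StockOption: int
    Salary: float


hourly_attrs = [f.name for f in fields(Hourly)]
salaried_attrs = [f.name for f in fields(Salaried)]
print(hourly_attrs)
print(salaried_attrs)
```

Both subtypes list EmployeeId, Name and DateHired first, followed by their own attributes, and both are identified by the same key attribute, EmployeeId.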


Constraints on specialization and generalization


■ Completeness Constraint
• The completeness constraint addresses whether or not an occurrence of
  a superclass must also have a corresponding subclass occurrence.
• The completeness constraint requires that all instances of the subtype be
  represented in the supertype.
• The Total Specialization Rule specifies that an entity occurrence must
  be a member of at least one of the subclasses. Total participation of
  superclass instances in subclasses is diagrammed with a double line from
  the supertype to the circle as shown below.

  E.g.: If we have EXTENSION and REGULAR as subclasses of a
  superclass STUDENT, then each student must be either an
  EXTENSION or a REGULAR student. Thus the participation of
  instances of STUDENT in the EXTENSION and REGULAR
  subclasses will be total.

• The Partial Specialization Rule specifies that not every entity
  occurrence in the superclass has to be a member of one of the
  subclasses. Here participation in the specialization is optional.
  Partial participation of superclass instances in subclasses is diagrammed
  with a single line from the supertype to the circle.

  E.g.: If we have MANAGER and SECRETARY as subclasses of a
  superclass EMPLOYEE, then it is not the case that all employees
  are either managers or secretaries. Thus the participation of
  instances of EMPLOYEE in the MANAGER and SECRETARY
  subclasses will be partial.
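
Total versus partial specialization can be checked with plain set operations. The sketch below treats each class as a set of member identifiers; the identifiers and the `is_total` helper are hypothetical.

```python
def is_total(superclass, subclasses):
    """True if every superclass member appears in at least one subclass."""
    covered = set().union(*subclasses)
    return superclass <= covered


# STUDENT example: every student is either a regular or an extension student.
students = {"S1", "S2", "S3"}
regular, extension = {"S1", "S2"}, {"S3"}

# EMPLOYEE example: E3 is neither a manager nor a secretary.
employees = {"E1", "E2", "E3"}
manager, secretary = {"E1"}, {"E2"}

print(is_total(students, [regular, extension]))    # total specialization
print(is_total(employees, [manager, secretary]))   # partial specialization
```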


■ Disjointness Constraints
• Specify whether one entity occurrence can be a member of
  more than one subclass, i.e. this is a type of business rule that deals
  with the situation where an entity occurrence of a superclass may
  also have more than one subclass occurrence.
• The Disjoint Rule restricts an entity occurrence of a superclass to
  being a member of only one of the subclasses. Example: an EMPLOYEE
  can be either SALARIED or PART-TIMER, but not both at the
  same time.
• The Overlap Rule allows one entity occurrence to be a member of
  more than one subclass. Example: a PERSON working at the
  university can be both a STUDENT and an EMPLOYEE at the same
  time.
• This is diagrammed by placing either the letter "d" for disjoint or "o"
  for overlapping inside the circle on the generalization hierarchy
  portion of the E-R diagram.
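
The disjoint and overlap rules can be expressed as a pairwise check on subclass membership. In this minimal sketch each subclass is a set of member identifiers; the identifiers and the `is_disjoint` helper are hypothetical.

```python
from itertools import combinations


def is_disjoint(subclasses):
    """True if no entity occurrence belongs to more than one subclass."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))


# EMPLOYEE example: nobody is both SALARIED and PART-TIMER.
salaried, part_timer = {"E1", "E2"}, {"E3"}

# PERSON example: P2 is both an EMPLOYEE and a STUDENT.
employee, student = {"P1", "P2"}, {"P2", "P3"}

print(is_disjoint([salaried, part_timer]))   # disjoint rule holds ("d")
print(is_disjoint([employee, student]))      # overlapping ("o")
```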

The two types of constraints on generalization and specialization
(disjointness and completeness constraints) are independent of one
another. That is, whether a specialization is disjoint or overlapping
does not determine whether the tuples in the superclass have total or
partial participation in that specific specialization.

From the two types of constraints we can have four possible combinations:

■ Disjoint AND Total
■ Disjoint AND Partial
■ Overlapping AND Total
■ Overlapping AND Partial
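
Because the two constraints are independent, a specialization can be labelled by evaluating each one separately. The sketch below does so for four small assumed membership sets, one per combination; `classify` and the member identifiers are hypothetical.

```python
from itertools import combinations


def classify(superclass, subclasses):
    """Label a specialization with its disjointness and completeness constraints."""
    disjoint = all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))
    total = superclass <= set().union(*subclasses)
    return f"{'Disjoint' if disjoint else 'Overlapping'} AND {'Total' if total else 'Partial'}"


members = {"A", "B", "C"}   # assumed superclass occurrences
labels = [
    classify(members, [{"A", "B"}, {"C"}]),        # no overlap, everyone covered
    classify(members, [{"A"}, {"B"}]),             # no overlap, C not covered
    classify(members, [{"A", "B"}, {"B", "C"}]),   # B overlaps, everyone covered
    classify(members, [{"A", "B"}, {"B"}]),        # B overlaps, C not covered
]
print(labels)
```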
