DB Lecture Note All in ONE

The document provides an introduction to database systems, explaining their importance in managing large datasets within organizations and the evolution from manual and traditional file-based approaches to the database approach. It highlights the benefits of using a Database Management System (DBMS) for efficient data storage, retrieval, and integrity, while also discussing the limitations and complexities involved. Additionally, it outlines the components of a DBMS environment, including hardware, software, data, procedures, and people.

Uploaded by

meyohannes2016

Database Systems Lecture Note

Chapter 1
Introduction to Database System
Database systems are designed to manage large data sets in an
organization. Data management involves both the definition and the
manipulation of data, ranging from simple representation of the data
to the design of structures for storing information. Data management
also provides mechanisms for manipulating that information.

Today, databases are essential to every business. They are used to
maintain internal records, to present data to customers and clients on
the World Wide Web, and to support many other commercial
processes. Databases are likewise found at the core of many modern
organizations.
The power of databases comes from a body of knowledge and
technology that has developed over several decades and is embodied
in specialized software called a database management system, or
DBMS. A DBMS is a powerful tool for creating and managing large
amounts of data efficiently and allowing it to persist over long periods
of time, safely. These systems are among the most complex types of
software available.

Thus, for our question: What is a database? In essence, a database is
nothing more than a collection of shared information that exists over a
long period of time, often many years. In common usage, the term
database refers to a collection of data that is managed by a DBMS.

Thus the DB course is about:

 How to organize data
 Supporting multiple users
 Efficient and effective data retrieval
 Secured and reliable storage of data
 Maintaining consistent data
 Making information useful for decision making

Data management has passed through different levels of development
along with developments in technology and services. These can best be
described as three levels of development. Even though each new level
brings an advantage and overcomes a problem of the previous one, all
methods of data handling are still in use to some extent. The three
major levels are:

1. Manual Approach
2. Traditional File Based Approach


3. Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follow the
primitive and traditional way of information handling, where cards and
paper are used for the purpose. Data storage and retrieval are
performed using human labour.

 Files are used to store information for as many events and objects
as the organization has.
 Each file, containing various kinds of information, is labelled and
stored in one or more cabinets.
 The cabinets can be kept in safe places for security purposes, based
on the sensitivity of the information they contain.
 Insertion and retrieval are done by searching first for the right
cabinet, then for the right file, then for the information.
 One could have an indexing system to facilitate access to the data.

Limitations of the Manual approach

 Prone to error
 Difficult to update, retrieve, integrate
 You have the data but it is difficult to
compile the information
 Limited to small size information
 Cross referencing is difficult

An alternative approach to data handling is a computerized way of
dealing with the information. The computerized approach can be
either decentralized or centralized, depending on where the data
resides in the system.


2. Traditional File Based Approach

After the introduction of computers for data processing to the
business community, the need to use the device for data storage
and processing increased. There were, and still are, several computer
applications with file based processing used for the purpose of data
handling. Even though the approach evolved over time, the basic
structure is still similar if not identical.
 File based systems were an early attempt to computerize the manual
filing system.
 This approach is the decentralized computerized data handling
method.
 A collection of application programs performs services for the
end-users. In such systems, every application program that provides
a service to end users defines and manages its own data.
 Such systems have a number of programs for each of the different
applications in the organization.
 Since every application defines and manages its own data, the
system is subject to a serious data duplication problem.
 A file, in the traditional file based approach, is a collection of
records which contain logically related data.


Limitations of the Traditional File Based approach


As business applications became more complex, demanding more
flexible and reliable data handling methods, the shortcomings of the
file based system became evident. These shortcomings include, but
are not limited to:
 Separation or isolation of data: information available in one
application may not be known to the others. Data synchronisation
is done manually.
 Limited data sharing: every application maintains its own data.
 Lengthy development and maintenance time
 Duplication or redundancy of data (cost in money and time, and
loss of data integrity)
 Data dependency on the application: the data structure is embedded
in the application; hence, a change in the data structure requires a
change in the application as well.
 Incompatible file formats or data structures (e.g. “C” and COBOL)
between different applications and programs, creating inconsistency
and making joint processing difficult.
 Fixed query processing, which is defined during application
development
The limitations of the traditional file based data handling approach
arise from two basic reasons.
1. The definition of the data is embedded in the application
program, which makes it difficult to modify the database
definition.
2. There is no control over the access and manipulation of the data
beyond that imposed by the application programs.
The most significant problem experienced by the traditional file based
approach of data handling can be formalized by what is called
“update anomalies”. There are three types of update anomalies:
1. Modification Anomalies: a problem experienced when one or
more data values are modified in one application program but not
in others containing the same data set.
2. Deletion Anomalies: a problem encountered when a record
set is deleted from one application but remains untouched in
other application programs.
3. Insertion Anomalies: a problem experienced whenever there
is a new data item to be recorded and the recording is not made
in all the applications. When the same data item is inserted in
different applications, there can also be encoding errors which
cause the new data item to be treated as a totally different
object.
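
The three anomalies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries `billing_file` and `shipping_file`, the customer keys and the field values are all hypothetical stand-ins for the private files of two separate application programs.

```python
# Hypothetical stand-ins for two applications that each keep a private
# copy of the same customer data (file-based approach).
billing_file = {"C001": {"name": "Customer A", "phone": "555-0100"}}
shipping_file = {"C001": {"name": "Customer A", "phone": "555-0100"}}

# Modification anomaly: only the billing application records the new phone.
billing_file["C001"]["phone"] = "555-0199"
modification_anomaly = (
    billing_file["C001"]["phone"] != shipping_file["C001"]["phone"]
)

# Deletion anomaly: billing deletes the customer, shipping still holds it.
del billing_file["C001"]
deletion_anomaly = "C001" not in billing_file and "C001" in shipping_file

# Insertion anomaly: the same new customer is encoded differently in the
# second file ("C-002" instead of "C002"), so it looks like another object.
billing_file["C002"] = {"name": "Customer B", "phone": "555-0111"}
shipping_file["C-002"] = {"name": "Customer B", "phone": "555-0111"}
insertion_anomaly = "C002" in billing_file and "C002" not in shipping_file

print(modification_anomaly, deletion_anomaly, insertion_anomaly)
```

In each case the two files disagree after the operation, and nothing in the system detects the disagreement.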


3. Database Approach
Following a famous paper written by Dr. Edgar Frank Codd in 1970,
database systems changed significantly. Codd proposed that database
systems should present the user with a view of data organized as
tables called relations. Behind the scenes, there might be a complex
data structure that allowed rapid response to a variety of queries. But,
unlike the user of earlier database systems, the user of a relational
system would not be concerned with the storage structure. Queries
could be expressed in a very high-level language, which greatly
increased the efficiency of database programmers. The database
approach emphasizes the integration and sharing of data throughout
the organization.

Thus in Database Approach:

 Database is just a computerized record keeping system or
a kind of electronic filing cabinet.
 Database is a repository for collection of computerized
data files.
 Database is a shared collection of logically related data
and description of data designed to meet the information needs of
an organization. Since it is a shared corporate resource, the
database is integrated with minimum amount of or no duplication.
 Database is a collection of logically related data where
these logically related data comprises entities, attributes,
relationships, and business rules of an organization's information.
 In addition to containing the data required by an organization, a
database also contains a description of the data, which is known as
“Metadata”, “Data Dictionary”, “Systems Catalogue”,
“Data about Data” or sometimes “Data Directory”.
 Since a database contains information about the data
(metadata), it is called a self descriptive collection of integrated
records.
 The purpose of a database is to store information and to
allow users to retrieve and update that information on demand.
 A database is designed once and used simultaneously by
many users.
 Unlike the traditional file based approach, in the database
approach there is program-data independence, that is, the
separation of the data definition from the application. Thus the
application is not affected by changes made in the data structure
and file organization.
 Each database application will perform the combination of:
Creating database, Reading, Updating and Deleting data.
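
As a minimal sketch of the create, read, update and delete operations mentioned above, the following uses Python's built-in `sqlite3` module with an in-memory database. The table name `student` and its rows are assumptions made only for this illustration.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create: define a table and insert two rows.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1, "Student A"))
conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (2, "Student B"))

# Read: retrieve the stored names.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY id")]

# Update: change one row.
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Student C", 2))

# Delete: remove one row.
conn.execute("DELETE FROM student WHERE id = ?", (1,))

remaining = conn.execute("SELECT id, name FROM student").fetchall()
conn.close()
print(names, remaining)
```

The application only names the table and columns; how SQLite lays the rows out in storage never appears in the program, which is the program-data independence described above.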


Benefits of the database approach

 Data can be shared: two or more users can access and use same
data instead of storing data in redundant manner for each user.
 Improved accessibility of data: by using structured query
languages, the users can easily access data without
programming experience.
 Redundancy can be reduced: isolated data is integrated in
database to decrease the redundant data stored at different
applications.
 Quality data can be maintained: the different integrity
constraints in the database approach will maintain the quality
leading to better decision making
 Inconsistency can be avoided: controlled data redundancy will
avoid inconsistency of the data in the database to some extent.
 Transaction support can be provided: the basic demands of any
transaction support system are implemented in a full scale DBMS.
 Integrity can be maintained: data at different applications will be
integrated together with additional constraints to facilitate
validity and consistency of shared data resource.
 Security measures can be enforced: the shared data can be
secured by having different levels of clearance and other data
security mechanisms.
 Improved decision support: the database will provide information
useful for decision making.
 Standards can be enforced: the different ways of using and
dealing with data by different units of an organization can be
balanced and standardized by using the database approach.
 Compactness: since it is an electronic data handling method, the
data is stored compactly (no voluminous papers).
 Speed: data storage and retrieval is fast as it will be using the
modern fast computer systems.
 Less labour: unlike the other data handling methods, data
maintenance will not demand much resource.
 Centralized information control: since relevant data in the
organization will be stored at one repository, it can be controlled
and managed at the central level.
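
The integrity benefit listed above can be illustrated with declared constraints. The sketch below uses Python's built-in `sqlite3` module; the `department` and `employee` tables, their columns and the helper `is_rejected` are hypothetical names chosen for this example. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE employee (
           emp_id  INTEGER PRIMARY KEY,
           salary  REAL NOT NULL CHECK (salary >= 0),
           dept_id INTEGER NOT NULL REFERENCES department(dept_id)
       )"""
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 1000.0, 1)")


def is_rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Violates the CHECK constraint (negative salary).
rejected_negative_salary = is_rejected("INSERT INTO employee VALUES (2, -5.0, 1)")
# Violates the foreign key (department 99 does not exist).
rejected_unknown_department = is_rejected("INSERT INTO employee VALUES (3, 500.0, 99)")

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected_negative_salary, rejected_unknown_department, employee_count)
```

The rules are stated once in the schema and apply to every application that uses the database, rather than being re-coded in each program.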


Limitations and risk of Database Approach

 Introduction of new professional and specialized personnel.
 Complexity in designing and managing data
 The cost and risk during conversion from the old to the new
system
 High cost to be incurred to develop and maintain the system
 Complex backup and recovery services from the user's
perspective
 Reduced performance due to centralization and data
independence
 High impact on the system when failure occurs to the central
system.


Database Management System (DBMS)

A Database Management System (DBMS) is a software package that
provides EFFICIENT, CONVENIENT and SAFE MULTI-USER (many
people/programs accessing the same database, or even the same data,
simultaneously) storage of and access to MASSIVE amounts of
PERSISTENT (data outlives the programs that operate on it) data. A
DBMS also provides a systematic method for creating, updating,
storing and retrieving data in a database, and it provides the
services of controlling data access, enforcing data integrity,
managing concurrency control, and recovery. With this in mind, a full
scale DBMS should provide at least the following services to the
user.

1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction support service: ALL or NONE transactions, which
minimize data inconsistency.
4. Concurrency Control Services: access and update on the
database by different users simultaneously should be
implemented correctly.
5. Recovery Services: a mechanism for recovering the database
after a failure must be available.
6. Authorization Services (Security): must support the
implementation of access and authorization services for the
database administrator and users.
7. Support for Data Communication: should provide the facility to
integrate with data transfer software or data communication
managers.
8. Integrity Services: rules about data and the changes that take
place on the data, correctness and consistency of stored data,
and quality of data based on business constraints.
9. Services to promote data independence between the data and
the application
10. Utility services: sets of utility service
facilities like
 Importing data
 Statistical analysis support
 Index reorganization
 Garbage collection
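
The ALL or NONE transaction service (item 3 above) can be sketched as follows with Python's built-in `sqlite3` module. The `account` table, the balances and the `transfer` function are assumptions introduced for this illustration; the `with conn:` block is the standard `sqlite3` idiom that commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("INSERT INTO account VALUES (2, 50)")
conn.commit()


def transfer(amount, source, target):
    """Move `amount` between two accounts as one all-or-none transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


succeeded = transfer(30, source=1, target=2)   # both updates are kept
failed = transfer(500, source=1, target=2)     # debit violates CHECK, credit is undone
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(succeeded, failed, balances)
```

In the second call the credit to account 2 has already been applied when the debit fails, yet the rollback removes it, so the database never shows money that was credited but not debited.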


DBMS and Components of DBMS Environment

Fig. General architecture of a DBMS

A DBMS is a software package used to design, manage, and maintain
databases. Each DBMS should have facilities to define the database,
manipulate the content of the database and control the database.
These facilities help the designer, the user and the database
administrator to discharge their responsibilities in designing,
using and managing the database. It provides the following facilities:

 Data Definition Language (DDL):

o Language used to define each data element
required by the organization.
o Commands for setting up the schema or the
intension of the database
o These commands are used to set up a
database and to create, delete and alter tables, with the
facility of handling constraints


 Data Manipulation Language (DML):

o The core commands used by end-users and
programmers to store, retrieve, and access the data in the
database, e.g. SQL
o Since the data required by the user's query
is extracted using this type of language, it is also
called a "Query Language"

 Data Dictionary:
o Due to the fact that a database is a self
describing system, this tool, Data Dictionary, is used to
store and organize information about the data stored in the
database.

 Data Control Language:

o Database is a shared resource that
demands control of data access and usage. The database
administrator should have the facility to control the overall
operation of the system.
o Data Control Languages are commands that
will help the Database Administrator to control the
database.
o The commands include grant or revoke
privileges to access the database or particular object
within the database and to store or remove database
transactions
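
The facilities above can be seen side by side in a short sketch using Python's built-in `sqlite3` module. The `course` table and its row are hypothetical. Two points are specific to SQLite and not claims about DBMSs in general: its catalogue is exposed as the `sqlite_master` table, and it has no GRANT or REVOKE statements, so the Data Control Language is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and then alter the structure of a table.
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("ALTER TABLE course ADD COLUMN credit_hours INTEGER")

# DML: store and retrieve data.
conn.execute("INSERT INTO course VALUES ('DB101', 'Database Systems', 3)")
titles = conn.execute("SELECT title FROM course").fetchall()

# Data dictionary: the database describes itself. SQLite keeps the schema
# in sqlite_master and reports column details through PRAGMA table_info.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

print(titles, catalogue, columns)
```

The last two queries read metadata rather than operational data, which is what makes the database a self-describing collection.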

The DBMS is a software package that helps to design, manage, and use
data using the database approach. Taking a DBMS as a system, one
can describe it with respect to its environment or the other systems
interacting with it. The DBMS environment has five components. To
design and use a database, there is interaction or integration of
Hardware, Software, Data, Procedure and People.

1. Hardware: components that one can touch and feel. These
components comprise various types of personal computers,
mainframes or any server computers to be used in a multi-user
system, network infrastructure, and other peripherals required
in the system.


2. Software: a collection of commands and programs used to
manipulate the hardware to perform a function. This includes
components like the DBMS software, application programs,
operating systems, network software, language software and
other relevant software.

3. Data: since the goal of any database system is to have better
control of the data and to make data useful, data is the most
important component to the user of the database. There are two
categories of data in any database system: operational data and
metadata. Operational data is the data actually stored in the
system to be used by the user. Metadata is the data that is used
to store information about the database itself.
The structure of the data in the database is called the schema,
which is composed of the entities, the properties of entities, the
relationships between entities, and business constraints.

4. Procedure: the rules and regulations on how to design and use
a database. It includes procedures like how to log on to the
DBMS, how to use facilities, how to start and stop the DBMS, how
to make backups, how to treat hardware and software failure, and
how to change the structure of the database.

5. People: this component is composed of the people in the
organization who are responsible for, or play a role in, designing,
implementing, managing, administering and using the resources
in the database. It ranges from people with a high level of
knowledge about the database and the design technology to
others with no knowledge of the system except using the data in
the database.


Database Development Life Cycle (DDLC)

As a database is one component in most information system
development tasks, there are several steps in designing a database
system. Here more emphasis is given to the design phases of the
system development life cycle. The major steps in database design
are:

1. Planning: identifying the information gap in an organization
and proposing a database solution to solve the problem.

2. Analysis: concentrates on fact finding about the problem or the
opportunity. Feasibility analysis, requirement determination
and structuring, and selection of the best design method are
also performed at this phase.

3. Design: in database development more emphasis is given to
this phase. The phase is further divided into three sub-phases.
a. Conceptual Design: a concise description of the data, data
types, relationships between data and constraints on the
data.
 There is no implementation or physical detail
consideration.
 Used to elicit and structure all information
requirements
b. Logical Design: a higher level conceptual abstraction with
a selected specific data model to implement the data
structure.
 It is independent of any particular DBMS and involves
no other physical considerations.
c. Physical Design: physical implementation of the logical
design of the database with respect to the internal storage
and file structure of the database for the selected DBMS.
 To develop all technology and organizational
specifications.

4. Implementation: the testing and deployment of the designed
database for use.

5. Operation and Support: administering and maintaining the
operation of the database system and providing support to
users. Tuning the database operations for best performance.


Roles in Database Design and Use

As people are one of the components of the DBMS environment, there
is a group of roles played by the different stakeholders in the design
and operation of a database system.

1. Database Administrator (DBA)

 Responsible to oversee, control and manage the database
resources (the database itself, the DBMS and other related
software)
 Authorizing access to the database
 Coordinating and monitoring the use of the database
 Responsible for determining and acquiring hardware and
software resources
 Accountable for problems like poor security, poor performance of
the system
 Involved in all steps of database development
We can have further classifications of this role in big organizations
having huge amounts of data and user requirements.
1. Data Administrator (DA): is responsible for the
management of data resources. This role is involved in
database planning, development, and maintenance of
standards, policies and procedures at the conceptual and
logical design phases.

2. Database Administrator (DBA): this is a more technically
oriented role. The DBA is responsible for the physical
realization of the database and is involved in physical
design, implementation, security and integrity control of
the database.

2. Database Designer (DBD)

 Identifies the data to be stored and chooses the appropriate
structures to represent and store the data.
 Should understand the user requirements and should choose how
the user views the database.
 Is involved in the design phase before the implementation of the
database system.
We have two kinds of database designers, one involved in the
logical and conceptual design and another involved in the physical
design.


1. Logical and Conceptual DBD

 Identifies data (entity, attributes and
relationship) relevant to the organization
 Identifies constraints on each data
 Understands data and business rules in the
organization
 Sees the database independent of any data
model at the conceptual level and considers one specific
data model at the logical design phase.

2. Physical DBD
 Take logical design specification as input and decide
how it should be physically realized.
 Map the logical data model on the specified DBMS with
respect to tables and integrity constraints. (DBMS
dependent designing)
 Select specific storage structure and access path to the
database
 Design security measures required on the database

3. Application Programmer and Systems Analyst

 System analyst determines the user requirement and how the
user wants to view the database.
 The application programmer implements these specifications
as programs; code, test, debug, document and maintain the
application program.
 The application programmer determines the interface on how
to retrieve, insert, update and delete data in the database.
 The application could use any high level programming
language according to the availability, the facility and the
required service.

4. End Users
Workers whose jobs require accessing the database frequently
for various purposes. There are different groups of users in this
category.
1. Naïve Users:
 Sizable proportion of users
 Unaware of the DBMS
 Only access the database based on their access
level and demand
 Use standard and pre-specified types of queries.
2. Sophisticated Users
 Users familiar with the structure of the Database
and facilities of the DBMS.


 Have complex requirements

 Have higher level queries
 Are most of the time engineers, scientists,
business analysts, etc
3. Casual Users
 Users who access the database occasionally.
 Need different information from the database
each time.
 Use sophisticated database queries to satisfy
their needs.
 Are most of the time middle to high level
managers.

These users can again be classified as “Actors on the Scene” and
“Workers Behind the Scene”.

Actors on the Scene:

 Data Administrator
 Database Administrator
 Database Designer
 End Users

Workers behind the scene

 DBMS designers and implementers: who design and implement
different DBMS software.
 Tool Developers: experts who develop software packages that
facilitate database system design and use. Developers of
prototyping, simulation and code generation tools are examples.
Independent software vendors could also be categorized in this
group.
 Operators and Maintenance Personnel: system
administrators who are responsible for actually running and
maintaining the hardware and software of the database system
and the information technology facilities.


ANSI-SPARC Architecture
The purpose and origin of the Three-Level database architecture
 All users should be able to access the same data. This is
important since the database is a shared resource where all the
data is stored in one location and all users have their own
customized way of interacting with the data.
 A user's view is unaffected by, or immune to, changes made in
other views. Since the requirement of one user is independent
of the others, a change made in one user's view should not
affect other users.
 Users should not need to know physical database storage
details. As there are naïve users of the system, hardware level
or physical details should be a black box for such users.
 The DBA should be able to change database storage structures
without affecting the users' views. A change in file
organization or access method should not affect the structure of
the data, which in turn will have no effect on the users.
 The internal structure of the database should be unaffected by
changes to physical aspects of storage, such as a change of
hard disk.
 The DBA should be able to change the conceptual structure of
the database without affecting all users. In any database system,
the DBA has the privilege to change the structure of the
database, like adding tables, adding and deleting an attribute,
or changing the specification of the objects in the database.
All of the above, and many more functionalities, are possible due
to the three level ANSI-SPARC architecture.

Three-level ANSI-SPARC Architecture of a Database


ANSI-SPARC Architecture and Database Design Phases

External Level: Users' view of the database. It describes that part
of the database that is relevant to a particular user. Different users
have their own customized view of the database independent of
other users.

Conceptual Level: Community view of the database. Describes
what data is stored in the database and the relationships among the
data along with the business constraints.

Internal Level: Physical representation of the database on the
computer. Describes how the data is stored in the database.


The following example can be taken as an illustration of the difference
between the three levels in the ANSI-SPARC database architecture,
where:
 The first level is concerned with the groups of users and
their respective data requirements, each independent of the
others.
 The second level describes the whole content of the
database, where each piece of information is represented
once.
 The third level describes how the data is physically
stored on the computer.

Differences between Three Levels of ANSI-SPARC Architecture


Defines DBMS schemas at three levels:

Internal schema: at the internal level to describe physical storage
structures and access paths. Typically uses a physical data model
i.e. specific DBMS.

Conceptual schema: at the conceptual level to describe the structure
and constraints for the whole database for a community of users. It
uses a conceptual or an implementation data model.

External schema: at the external level to describe the various user
views. Usually uses the same data model as the conceptual level.
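
One common way to realize the external and conceptual levels in a relational DBMS is with views over base tables. The sketch below uses Python's built-in `sqlite3` module; the `employee` table and the two views `payroll_view` and `directory_view` are hypothetical names chosen for illustration. The internal level is left entirely to SQLite and does not appear in the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one community-wide description of employee data.
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL, phone TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Employee A', 5000.0, '555-0100')")

# External level: each user group sees only the part relevant to it.
conn.execute("CREATE VIEW payroll_view AS SELECT emp_id, salary FROM employee")
conn.execute("CREATE VIEW directory_view AS SELECT name, phone FROM employee")

payroll = conn.execute("SELECT * FROM payroll_view").fetchall()
directory = conn.execute("SELECT * FROM directory_view").fetchall()
print(payroll, directory)
```

Each fact is stored once in the base table, while the two views give different users their own customized picture of it.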

Data Independence
Logical Data Independence:
 Refers to immunity of external schemas to changes in
conceptual schema.
 Conceptual schema changes e.g. addition/removal of
entities should not require changes to external schema or
rewrites of application programs.
 The capacity to change the conceptual schema without
having to change the external schemas and their
application programs.

Physical Data Independence

 The ability to modify the physical schema without
changing the logical schema
 Applications depend on the logical schema
 In general, the interfaces between the various levels and
components should be well defined so that changes in
some parts do not seriously influence others.
 The capacity to change the internal schema without
having to change the conceptual schema

 Refers to immunity of the conceptual schema to changes in
the internal schema
 Internal schema changes e.g. using different file
organizations, storage structures/devices should not
require change to conceptual or external schemas.
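
Both kinds of data independence can be observed in a small experiment with Python's built-in `sqlite3` module. The `product` table, the `price_list` view and the index name are assumptions made for this sketch. The same query is run three times: before any change, after a physical change (adding an index), and after a conceptual change (adding a column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Pen", 2.5), (2, "Book", 12.0)],
)
# External schema used by the application.
conn.execute("CREATE VIEW price_list AS SELECT name, price FROM product")

query = "SELECT name, price FROM price_list ORDER BY name"
before = conn.execute(query).fetchall()

# Physical data independence: a new access path changes how rows are
# found, not what the application asks for or receives.
conn.execute("CREATE INDEX idx_product_name ON product (name)")
after_physical_change = conn.execute(query).fetchall()

# Logical data independence: the conceptual schema gains an attribute,
# but the external view and the application query stay as they were.
conn.execute("ALTER TABLE product ADD COLUMN supplier TEXT")
after_conceptual_change = conn.execute(query).fetchall()

print(before == after_physical_change == after_conceptual_change)
```

The application query string is never edited, which is the practical meaning of immunity to changes at the lower levels.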


Data Independence and the ANSI-SPARC Three-level Architecture


The distinction between a Data Definition Language (DDL) and a Data
Manipulation Language (DML)

Database Languages
Data Definition Language (DDL)
 Allows the DBA or user to describe and name the entities,
attributes and relationships required for the application.
 Specification notation for defining the database schema

Data Manipulation Language (DML)

 Provides basic data manipulation operations on data held
in the database.
 Language for accessing and manipulating the data
organized by the appropriate data model
 DML also known as query language

Procedural DML: user specifies what data is required and
how to get the data.

Non-Procedural DML: user specifies what data is required
but not how it is to be retrieved
Data Control Language (DCL)
 Allows a DBA to define access control and privileges for
users.
 It is a mechanism for implementing security at a
database object level.
 Uses the Grant and Revoke SQL Statements

SQL is the most widely used non-procedural query language
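
The difference between procedural and non-procedural data manipulation can be sketched by answering the same question in two ways. The stock data and the names `rows` and `stock` are hypothetical. The first version spells out how to find the answer step by step; the second states only what is wanted and leaves the retrieval strategy to the DBMS (here SQLite, via Python's built-in `sqlite3` module).

```python
import sqlite3

rows = [("Item A", 4), ("Item B", 0), ("Item C", 9)]

# Procedural style: the program says WHAT is wanted and HOW to get it
# (visit every record, test it, collect it, then sort).
in_stock_procedural = []
for name, quantity in rows:
    if quantity > 0:
        in_stock_procedural.append(name)
in_stock_procedural.sort()

# Non-procedural style: the query says only WHAT is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (name TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", rows)
in_stock_declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM stock WHERE quantity > 0 ORDER BY name"
    )
]

print(in_stock_procedural, in_stock_declarative)
```

Both produce the same list, but only the second leaves the DBMS free to choose a different access method without any change to the program.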

Fourth Generation Language (4GL)

 Query Languages
 Forms Generators
 Report Generators
 Graphics Generators
 Application Generators


A Classification of data models

Data Model
A specific DBMS has its own specific Data Definition Language to
define a database schema, but this type of language is too low
level to describe the data requirements of an organization in a
way that is readily understandable by a variety of users.
We need a higher-level language.
Such a higher-level description of the database schema is called
data-model.

Data Model: a set of concepts to describe the structure of a
database, and certain constraints that the database should
obey.

A data model is a description of the way that data is stored in a
database. A data model helps to understand the relationships between
entities and to create the most effective structure to hold data.

A data model is a collection of tools or concepts for describing
 Data
 Data relationships
 Data semantics
 Data constraints

The main purpose of a data model is to represent the data in an
understandable way.
Categories of data models include:
 Object-based
 Record-based
 Physical
Record-based Data Models
Consist of a number of fixed format records.
Each record type defines a fixed number of fields,
Each field is typically of a fixed length.
 Hierarchical Data Model
 Network Data Model
 Relational Data Model


1. Hierarchical Model
 The simplest data model
 Record type is referred to as node or segment
 The top node is the root node
 Nodes are arranged in a hierarchical structure, as a sort of
upside-down tree
 A parent node can have more than one child node
 A child node can only have one parent node
 The relationship between parent and child is one-to-many
 A relationship is established by creating a physical link between
stored records (each is stored with a predefined access
path to other records)
 To add a new record type or relationship, the database
must be redefined and then stored in a new form.

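[Figure: example hierarchy with Department as the root node, Employee and Job on the level below it, and Time Card and Activity on the lowest level]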

ADVANTAGES of Hierarchical Data Model:

 Hierarchical Model is simple to construct and operate on
 Corresponds to a number of natural hierarchically organized
domains - e.g., assemblies in manufacturing, personnel
organization in companies
 Language is simple; uses constructs like GET, GET UNIQUE,
GET NEXT, GET NEXT WITHIN PARENT etc.

DISADVANTAGES of Hierarchical Data Model:

 Navigational and procedural nature of processing
 Database is visualized as a linear arrangement of records
 Little scope for "query optimization"

2. Network Model


 Allows record types to have more than one parent, unlike the
hierarchical model
 The network data model sees records as set members
 Each set has an owner and one or more members
 Does not allow a direct many-to-many relationship between entities
 Like the hierarchical model, the network model is a collection of
physically linked records.
 Allows member records to have more than one owner

[Figure: example network linking the record types Department, Job, Employee, Activity and Time Card]

ADVANTAGES of Network Data Model:

 Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
 Can handle most situations for modeling using record types
and relationship types.
 Language is navigational; uses constructs like FIND, FIND
member, FIND owner, FIND NEXT within set, GET etc.
Programmers can do optimal navigation through the
database.

DISADVANTAGES of Network Data Model:

 Navigational and procedural nature of processing
 Database contains a complex array of pointers that thread
through a set of records.
 Little scope for automated "query optimization"


3. Relational Data Model

 Developed by Dr. Edgar Frank Codd in 1970 (famous
paper, 'A Relational Model of Data for Large Shared Data Banks')
 Terminology originates from the branches of
mathematics called set theory and predicate logic, and is
based on the mathematical concept of a relation
 Can define more flexible and complex relationships
 Viewed as a collection of tables called “Relations”,
equivalent to a collection of record types
 Relation: a two-dimensional table
 Stores information or data in the form of tables (rows
and columns)
 A row of the table is called a tuple, equivalent to a record
 A column of a table is called an attribute, equivalent to a
field
 A data value is the value of an attribute
 Records are related by the data stored jointly in the
fields of records in two tables or files. The related tables
contain information that creates the relation
 The tables seem to be independent but are related
somehow.
 No physical consideration of the storage is required by
the user
 Many tables are merged together to come up with a
new virtual view of the relationship

Alternative terminologies

Formal term | Table view | File view
Relation    | Table      | File
Tuple       | Row        | Record
Attribute   | Column     | Field

 The rows represent records (collections of information about
separate items)
 The columns represent fields (particular attributes of a
record)
 Conducts searches by using data in specified columns of
one table to find additional data in another table
 In conducting searches, a relational database matches
information from a field in one table with information in a
corresponding field of another table to produce a third table
that combines requested data from both tables
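
The matching of a field in one table with a corresponding field in another can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names (department, employee, dept_id) and sample rows are assumptions, not taken from the lecture note.

```python
import sqlite3

# In-memory database; table names, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Finance'), (2, 'Sales');
    INSERT INTO employee VALUES (10, 'Abebe', 1), (11, 'Sara', 2), (12, 'Kebede', 1);
""")

# Match employee.dept_id with department.dept_id to produce a third,
# combined table holding the requested data from both tables.
joined = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

for row in joined:
    print(row)
```

The two tables are stored independently; the relation between them exists only through the shared dept_id values, with no physical link between records.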


Chapter Two

Relational Data Model

Important terms:
Relation: a table with rows and columns
Attribute: a named column of a relation
Domain: a set of allowable values for one or more attributes
Tuple: a row of a relation
Degree: the degree of a relation is the number of attributes it
contains
Unary relation, Binary relation, Ternary relation, N-ary relation
Cardinality: of a relation is the number of tuples the relation has
Relational Database: a collection of normalized relations with
distinct relation names.
Relation Schema: a named relation defined by a set of attribute-domain
name pairs.
Let A1, A2, ..., An be attributes with domains D1, D2, ..., Dn.
Then the set {A1:D1, A2:D2, ..., An:Dn} is a relation schema. A
relation R, defined by a relation schema S, is a set of mappings from
attribute names to their corresponding domains. Thus a relation is a
set of n-tuples of the form
(A1:d1, A2:d2, ..., An:dn) where d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn.
E.g.
Student (studentId char(10), studentName char(50), DOB date) is a
relation schema for the student entity in SQL

Relational Database schema: a set of relation schemas, each with a
distinct name.
Suppose R1, R2, ..., Rn is the set of relation schemas in a relational
database; then the relational database schema (R) can be stated as
R = {R1, R2, ..., Rn}
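
As a rough illustration of degree and cardinality, the Student schema above can be created in SQLite through Python's sqlite3 module and then inspected. The two sample tuples are hypothetical, and SQLite treats the declared types char(10), char(50) and date only loosely, so this is a sketch rather than a strict domain check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relation schema from the text: Student(studentId, studentName, DOB).
conn.execute("""
    CREATE TABLE Student (
        studentId   CHAR(10) PRIMARY KEY,
        studentName CHAR(50),
        DOB         DATE
    )
""")
# Sample tuples are hypothetical.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("S001", "Hana", "2001-03-14"), ("S002", "Dawit", "2000-11-02")],
)

# Degree = number of attributes; PRAGMA table_info returns one row per column.
degree = len(conn.execute("PRAGMA table_info(Student)").fetchall())
# Cardinality = number of tuples currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

print("degree:", degree, "cardinality:", cardinality)
```

Inserting or deleting tuples changes the cardinality, while the degree changes only if the schema itself is altered.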


Properties of Relational Databases

 A relation has a name that is distinct from all other relation
names in the relational schema.
 Each tuple in a relation must be unique
 All tables are LOGICAL ENTITIES
 Each cell of a relation contains exactly one atomic (single)
value.
 Each column (field or attribute) has a distinct name.
 The values of an attribute are all from the same domain.
 A table is either a BASE TABLE (named relation) or a VIEW
(unnamed relation)
 Only base tables are physically stored
 VIEWS are derived from BASE TABLES with SQL statements
like: [SELECT .. FROM .. WHERE .. ORDER BY]
 A relational database is a collection of tables
o Each entity in one table
o Attributes are fields (columns) in the table
 The order of rows and columns is theoretically immaterial
(in practice, however, it can affect performance)
 Entries with repeating groups are said to be un-normalized

All values in a column represent the same attribute and have the
same data format


Building Blocks of the Relational Data Model

The building blocks of the relational data model are:

 Entities: real-world physical or logical objects

 Attributes: properties used to describe each Entity or real world
object.
 Relationship: the association between Entities
 Constraints: rules that should be obeyed while manipulating the
data.

1. The ENTITIES (persons, places, things etc.) which the
organization has to deal with. Relations can also describe
relationships.

The name given to an entity should always be a singular noun
descriptive of each item to be stored in it. E.g.: student NOT
students.

Every relation has a schema, which describes the columns, or
fields. The relation itself corresponds to our familiar notion of a
table:
A relation is a collection of tuples, each of which contains values
for a fixed number of attributes.
 Existence Dependency: the dependence of an entity on the
existence of one or more entities.
 Weak entity: an entity that cannot exist without the entity
with which it has a relationship; it is indicated by a double
rectangle

2. The ATTRIBUTES - the items of information which characterize
and describe these entities.

Attributes are pieces of information ABOUT entities. The analysis
must of course identify those which are actually relevant to the
proposed application. Attributes will give rise to recorded items
of data in the database.

At this level we need to know such things as:

 Attribute name (should be an explanatory word or phrase)
 The domain from which attribute values are taken (a
DOMAIN is a set of values from which attribute values
may be taken). Each attribute has values taken from a
domain. For example, the domain of Name is string and
that of salary is real. However, these are not shown on
E-R models
 Whether the attribute is part of the entity identifier
(attributes which just describe an entity and those
which help to identify it uniquely)
 Whether it is permanent or time-varying (which
attributes may change their values over time)
 Whether it is required or optional for the entity
(whose values will sometimes be unknown or irrelevant)

Types of Attributes

(1)Simple (atomic) Vs Composite attributes

 Simple : contains a single value (not divided into sub
parts)
E.g. Age, gender
 Composite: Divided into sub parts (composed of
other attributes)
E.g. Name, address

(2)Single-valued Vs multi-valued attributes

 Single-valued: has only a single value (the value
may change, but there is only one value at any one time)
E.g. Name, Sex, Id. No., color_of_eyes
 Multi-valued: has more than one value
E.g. Address, dependent-name
A person may have several college degrees

(3)Stored vs. Derived Attribute

 Stored: the value cannot be derived or computed
E.g. Name, Address
 Derived: the value may be derived (computed) from
the values of other attributes.
E.g. Age (current year – year of birth)
Length of employment (current date – start
date)
Profit (earning – cost)
G.P.A (grade points / credit hours)
(4)Null Values
 NULL applies to attributes which are not applicable or
which do not have values.
 You may enter the value NA (meaning not applicable)
 The value of a key attribute cannot be null.


Default value: the value assumed if no explicit value is supplied
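
The difference between stored, derived, null and default values can be sketched as follows. The person table, its columns, the default 'Ethiopia' and the sample row are all hypothetical. The age function refines the "current year – year of birth" rule by checking whether the birthday has already passed, which is an assumption added here for accuracy.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id       INTEGER PRIMARY KEY,          -- key attribute
        name     TEXT NOT NULL,                -- required attribute
        dob      TEXT NOT NULL,                -- stored attribute (ISO date)
        phone    TEXT,                         -- optional attribute: NULL when unknown
        country  TEXT DEFAULT 'Ethiopia'       -- default value if none is given
    )
""")
conn.execute("INSERT INTO person (id, name, dob) VALUES (1, 'Hana', '2000-06-15')")

def age_on(dob_iso, today):
    """Derived attribute: age is computed from the stored date of birth."""
    dob = date.fromisoformat(dob_iso)
    # Subtract one if the birthday has not yet occurred in the reference year.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

name, dob, phone, country = conn.execute(
    "SELECT name, dob, phone, country FROM person WHERE id = 1"
).fetchone()
reference_day = date(2024, 1, 1)  # fixed date so the result is reproducible
print(name, age_on(dob, reference_day), phone, country)
```

Age is never stored, so it cannot become stale; phone comes back as None (NULL) because no value was supplied, and country falls back to its default.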

Entity versus Attributes

When designing the conceptual specification of the database, one
should pay attention to the distinction between an Entity and an
Attribute.
 Consider designing a database of employees for an
organization:
 Should address be an attribute of Employees or an entity
(connected to Employees by a relationship)?
 If we have several addresses per employee,
address must be an entity (attributes cannot be
set-valued/multi valued)
 If the structure (city, Woreda, Kebele, etc.) is important, e.g.
we want to retrieve employees in a given city, address must be
modeled as an entity (attribute values are atomic)


3. The RELATIONSHIPS between entities which exist and must be
taken into account when processing information. In any business
process, one object may be associated with another object due to
some event. This kind of association is what we call a
RELATIONSHIP between entity objects.
 One external event or process may affect several related
entities.
 Related entities require setting of LINKS from one part of
the database to another.
 A relationship should be named by a word or phrase which
explains its function
 Role names are different from the names of entities
forming the relationship: one entity may take on many
roles, the same role may be played by different entities
 For each RELATIONSHIP, one can talk about the Number of
Entities and the Number of Tuples participating in the
association. These two concepts are called DEGREE and
CARDINALITY of a relationship respectively.

Degree of a Relationship
 An important point about a relationship is how many
entities participate in it. The number of entities
participating in a relationship is called the DEGREE of the
relationship.

Among the degrees of relationship, the following are the
basic ones:
o UNARY/RECURSIVE RELATIONSHIP:
Tuples/records of a single entity are related with
each other.
o BINARY RELATIONSHIP: Tuples/records of two
entities are associated in a relationship
o TERNARY RELATIONSHIP: Tuples/records of three
different entities are associated
o And a generalized one:
 N-ARY RELATIONSHIP: Tuples from an arbitrary
number of entity sets participate in a
relationship.


Cardinality of a Relationship
 Another important concept about relationship is the
number of instances/tuples that can be associated with a
single instance from one entity in a single relationship. The
number of instances participating or associated with a
single instance from an entity in a relationship is called the
CARDINALITY of the relationship. The major cardinalities
of a relationship are:
o ONE-TO-ONE: one tuple is associated with only one
other tuple.
 E.g. Building – Location as a single building
will be located in a single location and as a
single location will only accommodate a single
Building.
o ONE-TO-MANY, one tuple can be associated with
many other tuples, but not the reverse.
 E.g. Department-Student as one department
can have multiple students.
o MANY-TO-ONE, many tuples are associated with one
tuple but not the reverse.
 E.g. Employee – Department: as many
employees belong to a single department.
o MANY-TO-MANY: one tuple is associated with many
other tuples and, from the other side, with a different
role name, one tuple will be associated with many
tuples
 E.g. Student – Course, as a student can take
many courses and a single course can be
attended by many students.

However, the degree and cardinality of a relation are
different from the degree and cardinality of a relationship.
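
The four cardinalities can be made concrete with a small Python sketch that inspects the instance pairs of a binary relationship. The function name classify_cardinality and the sample pairs are hypothetical. Note that a sample of instances can only show what has been observed; the real cardinality is a rule of the enterprise, fixed at design time.

```python
from collections import defaultdict

def classify_cardinality(pairs):
    """Classify a binary relationship from its (left, right) instance pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # Does some left instance relate to several right instances, and vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "MANY-TO-MANY"
    if left_has_many:
        return "ONE-TO-MANY"
    if right_has_many:
        return "MANY-TO-ONE"
    return "ONE-TO-ONE"

print(classify_cardinality([("Bldg1", "Loc1"), ("Bldg2", "Loc2")]))          # Building - Location
print(classify_cardinality([("CS", "S1"), ("CS", "S2")]))                    # Department - Student
print(classify_cardinality([("E1", "D1"), ("E2", "D1")]))                    # Employee - Department
print(classify_cardinality([("S1", "C1"), ("S1", "C2"), ("S2", "C1")]))      # Student - Course
```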


 Key constraints
If tuples need to be unique in the database, then we
need to make each tuple distinct. To do this we need to have
relational keys that uniquely identify each record.

Super Key: an attribute or set of attributes that uniquely
identifies a tuple within a relation.
Candidate Key: a super key such that no proper subset of that
collection is a Super Key within the relation.
A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a super key has only one attribute, it is
automatically a Candidate Key.
If a candidate key consists of more than one attribute
it is called a Composite Key.
Primary Key: the candidate key that is selected to identify
tuples uniquely within the relation.
The entire set of attributes in a relation can be
considered as a primary key in the worst case.
Foreign Key: an attribute, or set of attributes, within one
relation that matches the candidate key of some
relation.
A foreign key is a link between different relations to create a
view or an unnamed relation
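
The uniqueness and irreducibility properties can be illustrated with a short Python sketch that searches a sample relation for super keys and candidate keys. The function names and the sample rows are hypothetical. A key found this way is only a key of the sample: real keys follow from the meaning of the data, so (name, dept) below is unique by coincidence and would not normally be chosen.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs (uniqueness)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Return the irreducible super keys, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A set that contains an already-found key is reducible: skip it.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

rows = [
    {"id": 1, "email": "a@x.org", "name": "Hana", "dept": "CS"},
    {"id": 2, "email": "b@x.org", "name": "Hana", "dept": "IS"},
    {"id": 3, "email": "c@x.org", "name": "Dawit", "dept": "CS"},
]
print(candidate_keys(rows, ["id", "email", "name", "dept"]))
```

Here (id, name) is a super key but not a candidate key, because its proper subset (id) is already unique; (name, dept) is a composite candidate key of this sample.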

Relational Constraints/Integrity Rules

 Relational Integrity
 Domain Integrity: No value of the attribute
should be beyond the allowable limits
 Entity Integrity: In a base relation, no attribute
of a Primary Key can assume a value of NULL
 Referential Integrity: If a Foreign Key exists in
a relation, either the Foreign Key value must
match a Candidate Key value in its home
relation or the Foreign Key value must be
NULL
 Enterprise Integrity: Additional rules specified
by the users or database administrators of a
database are incorporated
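
A minimal sketch of how a DBMS can enforce these rules, using SQLite through Python's sqlite3 module. The tables, columns and the salary rule are hypothetical, and SQLite checks foreign keys only after PRAGMA foreign_keys = ON; other DBMSs differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER NOT NULL PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  TEXT NOT NULL PRIMARY KEY,                 -- entity integrity
        salary  REAL CHECK (salary >= 0),                  -- domain integrity
        dept_id INTEGER REFERENCES department(dept_id)     -- referential integrity
    );
    INSERT INTO department VALUES (1, 'Finance');
""")

def try_insert(values):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid row": try_insert(("E1", 5000.0, 1)),
    "NULL foreign key": try_insert(("E2", 4000.0, None)),
    "NULL primary key": try_insert((None, 4000.0, 1)),
    "negative salary": try_insert(("E3", -1.0, 1)),
    "unknown department": try_insert(("E4", 4000.0, 99)),
}
print(results)
```

The row with a NULL foreign key is accepted, matching the referential integrity rule: a foreign key must either match a key value in its home relation or be NULL.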


 Relational Views
Relations are perceived as a Table from the users’ perspective.
Actually, there are two kinds of relation in relational database.
The two categories or types of Relations are Named and
Unnamed Relations. The basic difference is on how the relation is
created, used and updated:
1. Base Relation
A Named Relation corresponding to an entity in the
conceptual schema, whose tuples are physically stored in
the database.
2. View (Unnamed Relation)
A View is the dynamic result of one or more relational
operations operating on the base relations to produce
another virtual relation that does not actually exist as
presented. So a view is a virtual, derived relation that
does not necessarily exist in the database but can be
produced upon request by a particular user at the time of
request. The virtual table or relation can be created from
a single relation or from several relations by extracting some
attributes and records, with or without conditions.

Purpose of a view
 Hides unnecessary information from users: since only
part of the base relation (Some collection of attributes,
not necessarily all) are to be included in the virtual
table.
 Provide powerful flexibility and security: since
unnecessary information will be hidden from the user
there will be some sort of data security.
 Provide customized view of the database for users:
each user is going to be interfaced with their own
preferred data set and format by making use of the
Views.
 A view of one base relation can be updated.
 Update on views derived from various relations is not
allowed since it may violate the integrity of the
database.
 Update on a view with aggregation and summary is not
allowed, since aggregation and summary results are
computed from a base relation and do not actually exist.
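
The following sketch shows a view that hides a column and an aggregated view, using SQLite through Python's sqlite3 module. All table, view and column names and the sample rows are hypothetical. One caveat: SQLite rejects direct updates on every view, including views over a single base relation, so this example only demonstrates the aggregation case; other DBMSs may allow updates through simple single-table views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL, dept TEXT);
    INSERT INTO employee VALUES (1, 'Hana', 5000, 'CS'), (2, 'Dawit', 4000, 'CS'),
                                (3, 'Sara', 6000, 'IS');
    -- Hides the salary column from users of the view.
    CREATE VIEW employee_public AS SELECT emp_id, name, dept FROM employee;
    -- Aggregated view: its rows are computed, not stored.
    CREATE VIEW dept_summary AS
        SELECT dept, COUNT(*) AS head_count, AVG(salary) AS avg_salary
        FROM employee GROUP BY dept;
""")

public_rows = conn.execute("SELECT * FROM employee_public ORDER BY emp_id").fetchall()
print(public_rows)

# A view is re-evaluated on request, so it reflects changes to the base relation.
conn.execute("INSERT INTO employee VALUES (4, 'Meron', 7000, 'IS')")
summary = conn.execute("SELECT * FROM dept_summary ORDER BY dept").fetchall()
print(summary)

# Updating the aggregated view is rejected by the DBMS.
try:
    conn.execute("UPDATE dept_summary SET head_count = 10 WHERE dept = 'CS'")
    update_allowed = True
except sqlite3.Error:
    update_allowed = False
print("update allowed:", update_allowed)
```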


Schemas, Instances and Database State
When a database is designed using a Relational data model, all the
data is represented in a form of a table. In such definitions and
representation, there are two basic components of the database. The
two components are the definition of the Relation or the Table and the
actual data stored in each table. The data definition is what we call the
Schema or the skeleton of the database and the Relations with some
information at some point in time is the Instance or the flesh of the
database.

Schemas
 Schema describes how data is to be structured, defined at
setup/Design time (also called "metadata")
 Since it is used during the database development phase, the
schema rarely changes unless system maintenance demands a
change to the definition of a relation.

 Database Schema (Intension): specifies the name of a relation and
the collection of its attributes (specifically the names of the
attributes).
 refers to a description of the database (or intension)
 specified during database design
 should not be changed except during maintenance

 Schema Diagrams
 convention to display some aspect of a schema visually

 Schema Construct
 refers to each object in the schema (e.g. STUDENT)
E.g.: STUDENT (FName, LName, Id, Year, Dept, Sex)


Instances

 Instance: the collection of data in the database at a
particular point in time (snapshot).
 Also called State or Snap Shot or Extension of the
database
 Refers to the actual data in the database at a specific point
in time
 State of database is changed any time we add, delete or
update an item.
 Valid state: the state that satisfies the structure and
constraints specified in the schema and is enforced by
DBMS

 Since an instance is the actual data of the database at some point in
time, it changes rapidly
 To define a new database, we specify its database schema to the
DBMS (the database is empty)
 The database is initialized when we first load it with data
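
The distinction between schema and instance can be sketched with the STUDENT relation, using SQLite through Python's sqlite3 module. The column types and the sample tuple are assumptions. The schema is read once and stays fixed, while every insert or update yields a new database state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Defining the schema: the database exists but is empty.
conn.execute(
    "CREATE TABLE STUDENT (FName TEXT, LName TEXT, Id TEXT PRIMARY KEY, "
    "Year INTEGER, Dept TEXT, Sex TEXT)"
)

def schema():
    """Attribute names of STUDENT (the intension)."""
    return [col[1] for col in conn.execute("PRAGMA table_info(STUDENT)")]

def instance():
    """Current tuples of STUDENT (the extension, or database state)."""
    return conn.execute("SELECT * FROM STUDENT ORDER BY Id").fetchall()

schema_before = schema()
state_0 = instance()              # empty state right after definition
conn.execute("INSERT INTO STUDENT VALUES ('Hana', 'Bekele', 'S1', 2, 'CS', 'F')")
state_1 = instance()              # initial state after the first load
conn.execute("UPDATE STUDENT SET Year = 3 WHERE Id = 'S1'")
state_2 = instance()              # each update produces a new state

print(schema_before == schema())  # the schema is unchanged by data changes
print(state_0, state_1, state_2, sep="\n")
```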


Chapter Three

Database Design
Database design is the process of coming up with different kinds
of specification for the data to be stored in the database. The
database design part is one of the middle phases we have in
information systems development where the system uses a
database approach. Design is the part on which we would be
engaged to describe how the data should be perceived at
different levels and finally how it is going to be stored in a
computer system.

Developing an information system with a database application
consists of several tasks, which include:

 Planning of Information systems Design

 Requirements Analysis,
 Design (Conceptual, Logical and Physical
Design)
 Implementation
 Testing and deployment
 Operation and Support

From these different phases, the prime interest of a database
system will be the Design part, which is again subdivided into
three sub-phases. These sub-phases are:
1. Conceptual Design
2. Logical Design, and
3. Physical Design

 In general, one has to go back and forth between these
tasks to refine a database design, and decisions in one task
can influence the choices in another task.
 In developing a good design, one should answer such
questions as:
 What are the relevant Entities for the
Organization
 What are the important features of each Entity


 What are the important Relationships

 What are the important queries from the user
 What are the other requirements of the
Organization and the Users

The Three Levels of Database Design
 Conceptual Design
 Logical Design
 Physical Design

Conceptual Database Design

 Conceptual design is the process of constructing a model of the
information used in an enterprise, independent of any
physical considerations.
 It is the source of information for the logical design phase.
 Mostly uses an Entity Relationship Model to describe the
data at this level.
 After the completion of Conceptual Design one has to go for
refinement of the schema, which is verification of Entities,
Attributes, and Relationships

Logical Database Design

 Logical design is the process of constructing a model of the
information used in an enterprise based on a specific data model
(e.g. relational, hierarchical or network or object), but
independent of a particular DBMS and other physical
considerations.
 Normalization process
 Collection of Rules to be maintained
 Discover new entities in the process
 Revise attributes based on the rules and the
discovered Entities

Physical Database Design

 Physical design is the process of producing a description of the
implementation of the database on secondary storage; it defines
the specific storage or access methods used by the database
 Describes the storage structures and access methods used
to achieve efficient access to the data.
 Tailored to a specific DBMS; its characteristics are a
function of the DBMS and the operating system
 Includes an estimate of storage space


Conceptual Database Design

 Conceptual design revolves around discovering and analyzing
organizational and user data requirements
 The important activities are to identify
 Entities
 Attributes
 Relationships
 Constraints
 And based on these components develop the ER model using
 ER diagrams

The Entity Relationship (E-R) Model
 Entity-Relationship modeling is used to represent conceptual
view of the database
 The main components of ER Modeling are:
o Entities
 Corresponds to entire table, not row
 Represented by Rectangle
o Attributes
 Represents the property used to describe an
entity or a relationship
 Represented by Oval
o Relationships
 Represents the association that exist between
entities
 Represented by Diamond
o Constraints
 Represent the constraint in the data
 Cardinality and Participation Constraints

Before working on the conceptual design of the
database, one has to know and answer the
following basic questions.
 What are the entities and relationships in the enterprise?
 What information about these entities and relationships
should we store in the database?

 What are the integrity constraints that hold? Constraints
on each data item with respect to update, retrieval and storage.
 Represent this information pictorially in ER diagrams, then
map ER diagram into a relational schema.


Developing an E-R Diagram

 Designing a conceptual model for the database is not a
linear process but an iterative activity in which the design is refined
again and again.
 To identify the entities, attributes, relationships, and
constraints on the data, different methods are used
during the analysis phase. These include information gathered
by:
 Interviewing end users individually and in a
group
 Questionnaire survey
 Direct observation
 Examining different documents
 Analysis of requirements gathered
 Nouns -- prospective entities
 Adjectives--prospective attributes
 Verbs/verb phrases-prospective
relationships

 The basic E-R model is graphically depicted and presented for
review.
 The process is repeated until the end users and designers
agree that the E-R diagram is a fair representation of the
organization’s activities and functions.
 Checking for redundant relationships in the ER diagram.
Relationships between entities indicate access from one entity to
another. It is therefore possible to access one entity occurrence
from another entity occurrence even if there are other entities
and relationships that separate them. This is often referred to as
'navigation' of the ER diagram
 The last phase in ER modeling is validating the ER model
against the requirements of the user.


Graphical Representations in ER
Diagramming

 An entity is represented by a RECTANGLE
containing the name of the entity.
[Figure: rectangles labelled Strong Entity and Weak Entity]

 Connected entities are called relationship
participants

 Attributes are represented by OVALS and are
connected to the entity by a line.
[Figure: ovals labelled Attribute, Multi-valued Attribute and Composite Attribute]
 A derived attribute is indicated by a DOTTED
LINE. (……..)

 PRIMARY KEYS are underlined.

 Relationships are represented by DIAMOND
shaped symbols
 Weak Relationship is a relationship between Weak and
Strong Entities
 Strong Relationship is a relationship between two strong
Entities
[Figure: diamonds labelled Strong Relationship and Weak Relationship]


Example 1: Build an ER Diagram for the following
information:
 A student record management system will have the
following two basic data object categories with their own
features or properties: Students will have an Id, Name,
Dept, Age, GPA and Course will have an Id, Name, Credit
Hours
 Whenever a student enrolls in a course in a specific
Academic Year and Semester, the Student will have a
grade for the course

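[Figure: ER diagram with entity Students (attributes Id, Name, Dept, DoB, Age, Gpa), entity Courses (attributes Id, Name, Credit) and relationship Enrolled_In (attributes Academic Year, Semester, Grade)]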
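
One possible way to carry Example 1 over to relational tables is sketched below with SQLite through Python's sqlite3 module. It assumes that Enrolled_In is many-to-many, that Age is derived from DoB and therefore not stored, and that the column types, key choice and sample rows are as shown; none of these details are fixed by the example itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (
        Id TEXT PRIMARY KEY, Name TEXT, Dept TEXT, DoB TEXT, Gpa REAL
        -- Age is derived from DoB, so it is not stored.
    );
    CREATE TABLE Courses (Id TEXT PRIMARY KEY, Name TEXT, CreditHours INTEGER);
    -- The relationship becomes its own table; AcademicYear, Semester and
    -- Grade are attributes of the relationship, not of either entity.
    CREATE TABLE Enrolled_In (
        StudentId    TEXT REFERENCES Students(Id),
        CourseId     TEXT REFERENCES Courses(Id),
        AcademicYear INTEGER,
        Semester     INTEGER,
        Grade        TEXT,
        PRIMARY KEY (StudentId, CourseId, AcademicYear, Semester)
    );
    INSERT INTO Students VALUES ('S1', 'Hana', 'CS', '2001-03-14', 3.5);
    INSERT INTO Courses VALUES ('C1', 'Database Systems', 3), ('C2', 'Networking', 3);
    INSERT INTO Enrolled_In VALUES ('S1', 'C1', 2023, 1, 'A'), ('S1', 'C2', 2023, 2, 'B');
""")

# Follow the relationship from both sides to list a student's grades.
transcript = conn.execute("""
    SELECT s.Name, c.Name, e.AcademicYear, e.Semester, e.Grade
    FROM Enrolled_In e
    JOIN Students s ON s.Id = e.StudentId
    JOIN Courses c ON c.Id = e.CourseId
    ORDER BY e.AcademicYear, e.Semester
""").fetchall()
print(transcript)
```

Including AcademicYear and Semester in the primary key lets the same student take the same course again in a later semester, which is a design choice made for this sketch.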

Example 2: Build an ER Diagram for the following
information:
 A Personnel record management system will have the
following two basic data object categories with their own
features or properties: Employee will have an Id, Name,
DoB, Age, Tel and Department will have an Id, Name,
Location
 Whenever an Employee is assigned to a
Department, the duration of his or her stay in the respective
department should be registered.


Structural Constraints on Relationship
1. Constraints on Relationship / Multiplicity / Cardinality
Constraints
 A multiplicity constraint is the number or range of possible
occurrences of an entity type/relation that may relate to a single
occurrence/tuple of an entity type/relation through a particular
relationship.
 Mostly used to ensure appropriate enterprise constraints.

One-to-one relationship:
 A customer is associated with at most one loan via the
relationship borrower
 A loan is associated with at most one customer via
borrower

E.g.: Relationship Manages between STAFF and BRANCH

The multiplicity of the relationship is:
 One branch can only have one manager
 One employee could manage either one or no branches

Employee 1..1 ---- Manages ---- 0..1 Branch


One-To-Many Relationships
 In the one-to-many relationship a loan is associated with at
most one customer via borrower, a customer is associated with several
(including 0) loans via borrower

E.g.: Relationship Leads between STAFF and PROJECT

The multiplicity of the relationship is:
 One staff member may lead one or more project(s)
 One project is led by one staff member

Employee 1..1 ---- Leads ---- 0..* Project

Many-To-Many Relationship
 A customer is associated with several (possibly 0) loans via
borrower
 A loan is associated with several (possibly 0) customers via
borrower

E.g.: Relationship “Teaches” between INSTRUCTOR and COURSE

The multiplicity of the relationship is:
 One Instructor teaches one or more Course(s)
 One Course is taught by zero or more Instructor(s)

Instructor 0..* ---- Teaches ---- 1..* Course


Participation of an Entity Set in a Relationship Set
Participation constraint of a relationship is involved in identifying
and setting the mandatory or optional feature of an entity
occurrence to take a role in a relationship. There are two distinct
participation constraints with this respect, namely: Total
Participation and Partial Participation

 Total participation: every tuple in the entity or
relation participates in at least one relationship by taking a role.
This means, every tuple in a relation will be attached with at least
one other tuple. The entity with total participation in a
relationship will be connected to the relationship using a double
line.
 Partial participation: some tuple in the entity or
relation may not participate in the relationship. This means, there
is at least one tuple from that Relation not taking any role in that
specific relationship. The entity with partial participation in a
relationship will be connected to the relationship using a single
line.

 E.g. 1: Participation of EMPLOYEE in “belongs to” relationship with
DEPARTMENT is total since every employee should belong to
a department.
Participation of DEPARTMENT in “belongs to” relationship
with EMPLOYEE is total since every department should have
at least one employee.
Employee 1..* ---- BelongsTo ---- 1..1 Department

 E.g. 2: Participation of EMPLOYEE in “manages” relationship with
DEPARTMENT is partial since not all employees
are managers.
Participation of DEPARTMENT in “Manages” relationship with
EMPLOYEE is total since every department should have a
manager.
Employee 1..1 ---- Manages ---- 0..1 Department

Problem in ER Modeling
The Entity-Relationship Model is a conceptual data model that views
the real world as consisting of entities and relationships. The model
visually represents these concepts by the Entity-Relationship diagram.
The basic constructs of the ER model are entities, relationships, and
attributes. Entities are concepts, real or abstract, about which
information is collected. Relationships are associations between the
entities. Attributes are properties which describe the entities.

While designing the ER model one could face design problems
called connection traps. Connection traps are problems
arising from misinterpreting certain relationships.
There are two types of connection traps:
1. Fan trap:
Occurs where a model represents a relationship between entity
types, but the pathway between certain entity occurrences is
ambiguous.
May exist where two or more one-to-many (1:M) relationships fan
out from an entity. The problem could be avoided by
restructuring the model so that there would be no 1:M
relationships fanning out from a single entity and all the
semantics of the relationship are preserved.

Example:

EMPLOYEE 1..* ---- Works For ---- 1..1 BRANCH 1..1 ---- IsAssigned ---- 1..* CAR

Semantic description of the problem:
[Figure: occurrence diagram relating employees Emp1 to Emp7, branches Br1 to Br4 and cars Car1 to Car7]

Problem: Which car (Car1 or Car3 or Car5) is used by Employee 6
(Emp6) working in Branch 1 (Br1)? From this ER Model one cannot
tell which car is used by which staff member, since a branch can have
more than one car and a branch is also populated by more than one employee.
Thus we need to restructure the model to avoid the connection trap.

To avoid the Fan Trap problem we can go for restructuring of the E-R
Model. This will result in the following E-R Model.

BRANCH 1..1 ---- Has ---- 1..* CAR 1..* ---- Used By ---- 1..* EMPLOYEE

Semantic description of the restructured model:
[Figure: occurrence diagram relating branches Br1 to Br4 to cars Car1 to Car7, and cars to employees Emp1 to Emp7]
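
The fan trap can be reproduced in a few lines of Python. The mappings below follow the Emp6, Br1, Car1, Car3 and Car5 occurrences named in the problem statement; Car2, Br2 and the assignment of Car3 to Emp6 in the restructured model are hypothetical, added only so that the example has something to resolve.

```python
# Original model: two 1:M relationships fan out from BRANCH.
works_for = {"Emp6": "Br1"}  # employee -> branch
is_assigned = {"Car1": "Br1", "Car3": "Br1", "Car5": "Br1", "Car2": "Br2"}  # car -> branch

def cars_via_branch(employee):
    """All cars reachable from an employee through the shared branch."""
    branch = works_for[employee]
    return sorted(car for car, b in is_assigned.items() if b == branch)

# Several candidates come back, so the pathway is ambiguous.
ambiguous = cars_via_branch("Emp6")
print(ambiguous)

# Restructured model: BRANCH -> CAR -> EMPLOYEE, so each use is recorded directly.
used_by = {"Car3": ["Emp6"]}  # car -> employees (hypothetical assignment)

def cars_used_by(employee):
    """Cars directly linked to an employee through Used By."""
    return sorted(car for car, emps in used_by.items() if employee in emps)

resolved = cars_used_by("Emp6")
print(resolved)
```

In the original model the only route from an employee to a car runs through the branch, so every car of that branch is a candidate. The restructured model stores the employee-to-car link itself.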


2. Chasm Trap:
Occurs where a model suggests the existence of a relationship
between entity types, but the pathway does not exist between
certain entity occurrences.
A chasm trap may exist when there are one or more relationships
with a minimum multiplicity (cardinality) of zero forming part of
the pathway between related entities.

Example:

BRANCH 1..1 ---- Has ---- 1..* EMPLOYEE 0..1 ---- Manages ---- 0..* PROJECT

If we have a set of projects that are not currently active, then we
cannot assign a project manager to these projects. So there are
projects with no project manager, giving the participation
a minimum value of zero.

Problem:
How can we identify which BRANCH is responsible for which
PROJECT? We know that whether the PROJECT is active or not
there is a responsible BRANCH. But which branch is a question to
be answered, and since we have a minimum participation of zero
between employee and PROJECT we can’t identify the BRANCH
responsible for each PROJECT.
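
A minimal Python sketch of the chasm trap follows. All occurrence names (Emp1, P2, and so on) are hypothetical, and the third dictionary stands for a direct BRANCH-PROJECT relationship, which is an assumption of this sketch about how the gap could be closed.

```python
# Hypothetical occurrences; P2 is an inactive project and therefore has no manager.
employee_branch = {"Emp1": "Br1", "Emp2": "Br2"}  # Has: EMPLOYEE -> BRANCH
project_manager = {"P1": "Emp1", "P2": None}      # Manages: PROJECT -> EMPLOYEE (0..1)
project_branch = {"P1": "Br1", "P2": "Br2"}       # assumed direct PROJECT -> BRANCH link

def branch_via_manager(project):
    """Follow PROJECT -> EMPLOYEE -> BRANCH; the path breaks when there is no manager."""
    manager = project_manager.get(project)
    return employee_branch.get(manager) if manager else None

def branch_direct(project):
    """Use the direct relationship, which exists for every project."""
    return project_branch[project]

print(branch_via_manager("P2"))  # the pathway does not exist for this occurrence
print(branch_direct("P2"))
```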

The solution to this chasm trap problem is to add another
relationship between the extreme entities (BRANCH and
PROJECT).

BRANCH (1..1) --Has-- (1..*) EMPLOYEE (0..1) --Manages-- (0..*) PROJECT
BRANCH (1..1) --ResponsibleFor-- (1..*) PROJECT

Enhanced E-R (EER) Models
 Object-oriented extensions to E-R model
 EER is important when we have a relationship between two
entities and the participation of entity occurrences is
partial. In such cases EER is used to reduce the
complexity of the participation and of the relationship.
 ER diagrams consider entity types to be primitive objects
 EER diagrams allow refinements within the structures of entity
types

 EER Concepts
 Generalization
 Specialization
 Sub classes
 Super classes
 Attribute Inheritance
 Constraints on specialization and generalization


 Generalization
 Generalization occurs when two or more entities represent
categories of the same real-world object.
 Generalization is the process of defining a more general entity
type from a set of more specialized entity types.
 A generalization hierarchy is a form of abstraction that
specifies that two or more entities that share common
attributes can be generalized into a higher level entity type.
 Is considered as bottom-up definition of entities.
 Generalization hierarchy depicts relationship between higher
level superclass and lower level subclass.
Generalization hierarchies can be nested. That is, a subtype of
one hierarchy can be a supertype of another. The level of nesting
is limited only by the constraint of simplicity.
Example: Account is a generalized form of Saving and
Current Accounts.


 Specialization
 Is the result of taking a subset of a higher level entity set to form a
lower level entity set.
 The specialized entities will have additional set of attributes
(distinguishing characteristics) that distinguish them from the
generalized entity.
 Is considered as Top-Down definition of entities.
 Specialization process is the inverse of the Generalization
process. Identify the distinguishing features of some entity
occurrences, and specialize them into different subclasses.
 Reasons for Specialization
o Attributes only partially applying to superclasses
o Relationship types only partially applicable to the
superclass
 In many cases, an entity type has numerous sub-groupings of
its entities that are meaningful and need to be represented
explicitly. This need requires the representation of each
subgroup in the ER model. The generalized entity is a
superclass and the set of specialized entities will be
subclasses for that specific Superclass.
o Example: Saving Accounts and Current Accounts are
Specialized entities for the generalized entity Accounts.
Manager, Sales, Secretary: are specialized employees.

 Subclass/Subtype
 An entity type whose tuples have attributes that distinguish
its members from tuples of the generalized or Superclass
entities.
 When one generalized Superclass has various subgroups with
distinguishing features and these subgroups are represented
by specialized form, the groups are called subclasses.
 Subclasses can be either mutually exclusive (disjoint) or
overlapping (inclusive).
 A single subclass may inherit attributes from two distinct
superclasses.
 A mutually exclusive category/subclass is when an entity
instance can be in only one of the subclasses.
E.g.: An EMPLOYEE can either be SALARIED or PART-TIMER
but not both.
 An overlapping category/subclass is when an entity instance
may be in two or more subclasses.
E.g.: A PERSON who works for a university can be
both EMPLOYEE and a STUDENT at the same
time.


 Superclass /Supertype
 An entity type whose tuples share common attributes.
Attributes that are shared by all entity occurrences (including
the identifier) are associated with the supertype.
 Is the generalized entity

 Relationship Between Superclass and Subclass
 The relationship between a superclass and any of its
subclasses is called a superclass/subclass or
class/subclass relationship
 An instance cannot be a member of a subclass only,
i.e. every instance of a subclass is also an instance of
the superclass.
 A member of a subclass is represented as a distinct
database object, a distinct record that is related via the
key attribute to its super-class entity.
 An entity cannot exist in the database merely by being
a member of a subclass; it must also be a member of
the super-class.
 An entity occurrence of a superclass need not
belong to any of the subclasses unless there is
total participation in the specialization.
 The relationship between a subclass and a Superclass is
an “IS A” or “IS PART OF” type.
 Subclass IS PART OF Superclass
 Manager IS AN Employee
 All subclasses or specialized entity sets should be
connected with the superclass using a line to a circle
where there is a subset symbol indicating the direction
of subclass/superclass relationship.


 We can also have subclasses of a subclass forming a
hierarchy of specialization.
 Superclass attributes are shared by all subclasses of
that superclass
 Subclass attributes are unique for the subclass.

 Attribute Inheritance
 An entity that is a member of a subclass inherits all the
attributes of the entity as a member of the superclass.
 The entity also inherits all the relationships in which the
superclass participates.
 An entity may belong to more than one subclass category.
 All entities/subclasses of a generalized entity or
superclass share a common unique identifier attribute
(primary key). i.e. The primary key of the superclass
and subclasses are always identical.

 Consider the EMPLOYEE supertype entity shown above.

This entity can have several different subtype entities (for
example: HOURLY and SALARIED), each with distinct
properties not shared by other subtypes. But whether the
employee is HOURLY or SALARIED, the same attributes
(EmployeeId, Name, and DateHired) are shared.
 The Supertype EMPLOYEE stores all properties that
subclasses have in common. And HOURLY employees have
the unique attribute Wage (hourly wage rate), while
SALARIED employees have two unique attributes,
StockOption and Salary.
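
Attribute inheritance can be illustrated with a short Python sketch. The class and field names follow the EMPLOYEE, HOURLY and SALARIED example; the field types and the sample values are assumptions made for the illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Employee:            # superclass: attributes common to all employees
    employee_id: int
    name: str
    date_hired: date

@dataclass
class Hourly(Employee):    # subclass: adds its own attribute
    wage: float

@dataclass
class Salaried(Employee):  # subclass: adds two attributes of its own
    salary: float
    stock_option: int

# Sample values are hypothetical.
h = Hourly(1, "Abebe", date(2020, 1, 1), 50.0)
s = Salaried(2, "Almaz", date(2019, 6, 1), 9000.0, 100)
print(isinstance(h, Employee), h.employee_id, h.wage)
print(isinstance(s, Employee), s.name, s.stock_option)
```

Both subclass instances carry the superclass attributes (including the identifier) in addition to their own, and each of them is also an instance of the superclass.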


Constraints on specialization and generalization
 Completeness Constraint.
 The Completeness Constraint addresses the issue of whether
or not an occurrence of a Superclass must also have a
corresponding Subclass occurrence.
 The completeness constraint requires that all instances of the
subtype be represented in the supertype.
 The Total Specialization Rule specifies that an entity
occurrence should at least be a member of one of the
subclasses. Total Participation of superclass instances on
subclasses is diagrammed with a double line from the
Supertype to the circle as shown below.

E.g.: If we have EXTENSION and REGULAR as subclasses of a
superclass STUDENT, then each
student must be either an EXTENSION or a REGULAR student.
Thus the participation of instances of STUDENT in the
EXTENSION and REGULAR subclasses will be total.

 The Partial Specialization Rule specifies that it is not
necessary for all entity occurrences in the superclass to be a
member of one of the subclasses. Here we have an optional
participation on the specialization. Partial Participation of
superclass instances on subclasses is diagrammed with a single
line from the Supertype to the circle.

E.g.: If we have MANAGER and SECRETARY as subclasses of a
superclass EMPLOYEE, then it is not the case that all
employees are either manager or secretary. Thus the
participation of instances of employee in MANAGER and
SECRETARY subclasses will be partial.


 Disjointness Constraints.

 Specifies whether one entity occurrence can be a
member of more than one subclass, i.e. it is a type of
business rule that deals with the situation where an entity
occurrence of a superclass may have more than one
subclass occurrence.
 The Disjoint Rule restricts one entity occurrence of a
superclass to be a member of only one of the subclasses.
Example: an EMPLOYEE can either be SALARIED or
PART-TIMER, but not both at the same time.
 The Overlap Rule allows one entity occurrence to be a
member of more than one subclass. Example: a PERSON
working at the university can be both a STUDENT and an
EMPLOYEE at the same time.
 This is diagrammed by placing either the letter "d" for
disjoint or "o" for overlapping inside the circle on the
Generalization Hierarchy portion of the E-R diagram.

The two types of constraints on generalization and specialization
(Disjointness and Completeness constraints) are not dependent
on one another. That is, being disjoint does not determine whether the
tuples in the superclass should have Total or Partial participation
for that specific specialization.

From the two types of constraints we can have four possible
combinations:

 Disjoint AND Total

 Disjoint AND Partial

 Overlapping AND Total

 Overlapping AND Partial
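
The two constraints can be checked independently, as the following Python sketch suggests. The function name, the data layout (a dictionary from each superclass occurrence to the set of subclasses it belongs to) and the student identifiers are assumptions made for the illustration.

```python
def check_specialization(memberships, subclasses, disjoint, total):
    """Return (entity, problem) pairs that violate the chosen constraints."""
    violations = []
    for entity, subs in memberships.items():
        subs = subs & set(subclasses)
        if total and not subs:
            violations.append((entity, "belongs to no subclass"))
        if disjoint and len(subs) > 1:
            violations.append((entity, "belongs to more than one subclass"))
    return violations

# Hypothetical STUDENT occurrences and their subclass memberships
students = {
    "S1": {"REGULAR"},
    "S2": set(),
    "S3": {"REGULAR", "EXTENSION"},
}
subclasses = ["REGULAR", "EXTENSION"]

print(check_specialization(students, subclasses, disjoint=True, total=True))
print(check_specialization(students, subclasses, disjoint=False, total=False))
```

Under "Disjoint AND Total" both S2 and S3 are reported; under "Overlapping AND Partial" the same data is acceptable.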


Chapter Four

Logical Database Design

The whole purpose of database design is to create an accurate
representation of the data, the relationships between the data and the
business constraints pertinent to the organization. One can
use one or more techniques to design a database. One such
technique is the E-R model. In this chapter we use another technique,
known as “Normalization”, with a different emphasis on database
design: it defines the structure of a database with a specific data
model.

Logical design is the process of constructing a model of the information
used in an enterprise based on a specific data model (e.g. relational,
hierarchical or network or object), but independent of a particular
DBMS and other physical considerations.

The focus in logical database design is the Normalization Process

 Normalization process
 Collection of Rules (Tests) to be applied on relations
to obtain the minimal, non-redundant set of
attributes.
 Discover new entities in the process
 Revise attributes based on the rules and the
discovered Entities
 Works by examining the relationship between
attributes known as functional dependency.

The purpose of normalization is to find the suitable set of relations that
supports the data requirements of an enterprise.
A suitable set of relations has the following characteristics:

 Minimal number of attributes to support the data requirements
of the enterprise
 Attributes with close logical relationship (functional dependency)
should be placed in the same relation.
 Minimal redundancy with each attribute represented only once
with the exception of the attributes which form the whole or part
of the foreign key, which are used for joining of related tables.


The first step before applying the rules in relational data model is
converting the conceptual design to a form suitable for relational
logical model, which is in a form of tables.

Converting ER Diagram to Relational Tables

Three basic rules to convert ER into tables or relations:
Rule 1: Entity Names will automatically be table names
Rule 2: Mapping of attributes: attributes will be columns of the
respective tables.
 Atomic or single-valued or derived or stored attributes will be
columns
 Composite attributes: the parent attribute will be ignored and
the decomposed attributes (child attributes) will be columns
of the table.
 Multi-valued attributes: will be mapped to a new table where
the primary key of the main table will be posted for cross
referencing.

Rule 3: Relationships: a relationship will be mapped by using a foreign
key attribute. A foreign key is a primary or candidate key of one relation
used to create an association between tables.

 For a relationship with One-to-One Cardinality: post the
primary or candidate key of one of the tables into the other as
a foreign key. In cases where one entity has partial
participation in the relationship, it is recommended to post
the candidate key of the partial participant to the total
participant, so as to avoid storing null
values in the foreign key attribute. E.g.: for a relationship
between Employee and Department where an employee
manages a department, the cardinality is one-to-one, as one
employee will manage only one department and one
department will have one manager. Here the PK of
Employee can be posted to Department, or the PK of
Department can be posted to Employee. But
Employee has partial participation in the relationship
"Manages", as not all employees are managers of
departments. Thus, even though both ways are possible, it is
recommended to post the primary key of Employee to the
Department table as a foreign key.

 For a relationship with One-to-Many Cardinality: Post
the primary key or candidate key from the “one” side as a
foreign key attribute to the “many” side. E.g.: For a
relationship called “Belongs To” between Employee (Many)
and Department (One) the primary or candidate key of the
one side which is Department should be posted to the many
side which is Employee table.

 For a relationship with Many-to-Many Cardinality: for
relationships having many to many cardinality, one has to
create a new table (which is the associative entity) and post
primary key or candidate key from the participant entities as
foreign key attributes in the new table along with some
additional attributes (if applicable). The same approach
should be used for relationships with degree greater than
binary.

 For a relationship having Associative Entity property:
in cases where the relationship has its own attributes
(associative entity), one has to create a new table for the
associative entity and post primary key or candidate key from
the participating entities as foreign key attributes in the new
table.


Example to illustrate the major rules in mapping ER to relational
schema:

The following ER diagram has been designed to represent the requirement of an
organization to capture Employee, Department and Project information.
An employee works for a department, and an employee might be
assigned to manage a department. Employees might participate in
different projects within the organization. An employee might as well
be assigned to lead a project, in which case the starting and ending dates of
his/her project leadership and the bonus will be registered.

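[Figure: ER diagram with three entities. Employee has attributes EID, Name (composite: FName, LName), Salary and Tel (multi-valued). Department has attributes DID, DName and DLoc. Project has attributes PID, PName and PFund. Relationships: Manages (Employee 1 : 1 Department), WorksFor (Employee M : 1 Department), Participate (Employee M : M Project), and Leads (between Employee and Project, with attributes StartDate, EndDate and PBonus).]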

After we have drawn the ER diagram, the next thing is to map the ER
diagram into relational schemas so that the rules of the relational data model can
be tested for each relational schema. The mapping is done for the
entities first, followed by the relationships, based on the rules of mapping. The
mapping has been done as follows.

 Mapping EMPLOYEE Entity:

There will be Employee table with EID, Salary, FName and
LName being the columns. The composite attribute Name will be
ignored as its decomposed attributes (FName and LName) are
columns in the Employee Table. The Tel attribute will be a new
table as it is multi-valued.
Employee
EID FName LName Salary
Telephone
EID Tel
 Mapping DEPARTMENT Entity:
There will be Department table with DID, DName, and DLoc
being the columns.
Department
DID DName DLoc

 Mapping PROJECT Entity:

There will be Project table with PID, PName, and PFund being
the columns.
Project
PID PName PFund

 Mapping the MANAGES Relationship:

As the relationship has one-to-one cardinality, the PK or CK
of one of the tables can be posted into the other. But based on
the recommendation, the PK or CK of the partial participant
(Employee) should be posted to the total participant
(Department). This requires adding the PK of Employee (EID)
to the Department table as a foreign key. We can give the
foreign key another name, MEID, to mean "manager's
employee id". This increases the degree of the Department table by one.
Department
DID DName DLoc MEID

 Mapping the WORKSFOR Relationship:

As the relationship has one-to-many cardinality, the PK or
CK of the "one" side (PK or CK of the Department table) should be
posted to the "many" side (Employee table). This requires
adding the PK of Department (DID) to the Employee table as a
foreign key. We can give the foreign key another name, EDID,
to mean "employee's department id". This increases the
degree of the Employee table by one.
Employee
EID FName LName Salary EDID

 Mapping the PARTICIPATES Relationship:

As the relationship is having many-to-many cardinality, we need
to create a new table and post the PK or CK of the Employee and
Project table into the new table. We can give a descriptive new
name for the new table like Emp_Partc_Project to mean
"Employee participate in a project".
Emp_Partc_Project
EID PID

 Mapping the LEADS Relationship:

As the relationship is associative entity, we are supposed to
create a table for the associative entity where the PK of
Employee and Project tables will be posted in the new table as a
foreign key. The new table will have the attributes of the
associative entity as columns. We can give a descriptive new
name for the new table like Emp_Lead_Project to mean
"Employee leads a project".
Emp_Lead_Project
EID PID PBonus StartDate EndDate

At the end of the mapping we will have the following relational schema
(tables) for the logical database design phase.

Department
DID DName DLoc MEID

Project
PID PName PFund
Telephone
EID Tel

Employee
EID FName LName Salary EDID

Emp_Partc_Project
EID PID

Emp_Lead_Project
EID PID PBonus StartDate EndDate
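
As a rough sketch, the resulting schema could be declared with Python's built-in sqlite3 module as shown below. The table and column names are those derived above; the column types, the composite primary keys of Telephone, Emp_Partc_Project and Emp_Lead_Project, and the UNIQUE constraint on MEID are assumptions that the notes do not state.

```python
import sqlite3

# Column types and key choices are assumptions made for this sketch.
DDL = """
CREATE TABLE Employee (
    EID INTEGER PRIMARY KEY, FName TEXT, LName TEXT, Salary REAL,
    EDID INTEGER REFERENCES Department(DID)       -- WorksFor (M:1)
);
CREATE TABLE Department (
    DID INTEGER PRIMARY KEY, DName TEXT, DLoc TEXT,
    MEID INTEGER UNIQUE REFERENCES Employee(EID)  -- Manages (1:1)
);
CREATE TABLE Project (PID INTEGER PRIMARY KEY, PName TEXT, PFund REAL);
CREATE TABLE Telephone (
    EID INTEGER REFERENCES Employee(EID), Tel TEXT,
    PRIMARY KEY (EID, Tel)                        -- multi-valued attribute Tel
);
CREATE TABLE Emp_Partc_Project (
    EID INTEGER REFERENCES Employee(EID),
    PID INTEGER REFERENCES Project(PID),
    PRIMARY KEY (EID, PID)                        -- Participate (M:M)
);
CREATE TABLE Emp_Lead_Project (
    EID INTEGER REFERENCES Employee(EID),
    PID INTEGER REFERENCES Project(PID),
    PBonus REAL, StartDate TEXT, EndDate TEXT,
    PRIMARY KEY (EID, PID)                        -- Leads (associative entity)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```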

After converting the ER diagram into table form, the next phase is
implementing the process of normalization, which is a collection of
rules each table should satisfy.

Normalization
A relational database is merely a collection of data, organized in a
particular manner. As the father of the relational database approach,
Codd created a series of rules (tests) called normal forms that help
define that organization.

One of the best ways to determine what information should be stored
in a database is to clarify what questions will be asked of it and what
data would be included in the answers.

Database normalization is a series of steps followed to obtain a
database design that allows for consistent storage and efficient access
of data in a relational database. These steps reduce data redundancy
and the risk of data becoming inconsistent.

NORMALIZATION is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the update anomalies, which are:

1. Insertion Anomalies
2. Deletion Anomalies

3. Modification Anomalies

Normalization may reduce system performance since data will be cross
referenced from many tables. Thus denormalization is sometimes used
to improve performance, at the cost of reduced consistency
guarantees.

A normalization is normally considered “good” if it is a lossless
decomposition.

All the normalization rules aim to remove the update
anomalies that may arise during data manipulation after
implementation. The types of problems that can occur in an insufficiently
normalized table are called update anomalies, which include:
(1) Insertion anomalies
An "insertion anomaly" is a failure to place information about a new
database entry into all the places in the database where information
about that new entry needs to be stored. Additionally, we may have
difficulty inserting some data. In a properly normalized database,
information about a new entry needs to be inserted into only one
place in the database; in an inadequately normalized database,
information about a new entry may need to be inserted into more
than one place and, human fallibility being what it is, some of the
needed additional insertions may be missed.
(2) Deletion anomalies
A "deletion anomaly" is a failure to remove information about an
existing database entry when it is time to remove that entry.
Additionally, deletion of one piece of data may result in the loss of other
information. In a properly normalized database, information about
an old, to-be-gotten-rid-of entry needs to be deleted from only one
place in the database; in an inadequately normalized database,
information about that old entry may need to be deleted from more
than one place, and, human fallibility being what it is, some of the
needed additional deletions may be missed.
(3) Modification anomalies
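A modification of a database involves changing some value of an
attribute of a table. In a properly normalized database table, whatever
information is modified by the user needs to be changed in only one
place; in an inadequately normalized table, the same value may be
stored in several rows, and missing any of them leaves the data
inconsistent.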

In order to avoid the update anomalies in a given table, the
solution is to decompose it into smaller tables based on the rules of
normalization. However, the decomposition should have two important
properties:

a. The Lossless-join property ensures that any instance of the
original relation can be reconstructed from the corresponding
instances of the smaller relations.

b. The Dependency preservation property implies that a
constraint on the original relation can be maintained by
enforcing some constraints on the smaller relations, i.e. we
don’t have to perform a join operation to check whether a
constraint on the original relation is violated or not.

The purpose of normalization is to reduce the chances for
anomalies to occur in a database.

Example of problems related with Anomalies

EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo  5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji        6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo  10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza       8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza       9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji        5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City   8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo  7
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City   4
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo  6

Deletion Anomalies:
If the employee with ID 16 is deleted, then all information about the
skill C++ and its skill type is deleted from the database. Then
we will not have any information about C++ and its skill type.

Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We
cannot decide whether Pascal is allowed as a value for skill and
we have no clue about the type of skill that Pascal should be
categorized as.

Modification Anomalies:
What if the address for Helico is changed from Piazza to
Mexico? We need to look for every occurrence of Helico and
change the value of SchoolAdd from Piazza to Mexico, which
is prone to error.
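
The deletion and modification anomalies can be reproduced on a few rows of the table above. The following Python sketch uses only a subset of the columns; the variable names and the tuple layout are choices made for this illustration.

```python
# (EmpID, Skill, SkillType, School, SchoolAdd): three rows taken from the table above
rows = [
    (16, "C++", "Programming", "Unity", "Gerji"),
    (25, "VB6", "Programming", "Helico", "Piazza"),
    (65, "SQL", "Database", "Helico", "Piazza"),
]

# Deletion anomaly: removing employee 16 also removes the only record of C++.
after_delete = [r for r in rows if r[0] != 16]
known_skills = {r[1] for r in after_delete}
print("C++" in known_skills)

# Modification anomaly: Helico's address is stored once per employee row, so an
# update that reaches only one row leaves two addresses for the same school.
partial = [(*r[:4], "Mexico") if r[0] == 25 else r for r in rows]
helico_addresses = {r[4] for r in partial if r[3] == "Helico"}
print(sorted(helico_addresses))
```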

A database management system can work only with the
information that we put explicitly into its tables for a given
database and into its rules for working with those tables, where
such rules are appropriate and possible.

Functional Dependency (FD)

Before moving to the definition and application of normalization, it is
important to have an understanding of "functional dependency."

Data Dependency
The logical associations between data items that point the database
designer in the direction of a good database design are referred to as
determinant or dependent relationships.

Two data items A and B are said to be in a determinant or dependent
relationship if certain values of data item B always appear with
certain values of data item A. If data item A is the determinant
data item and B the dependent data item, then the direction of the
association is from A to B and not vice versa.

The essence of this idea is that if the existence of something, call it A,

implies that B must exist and have a certain value, then we say that
"B is functionally dependent on A." We also often express this
idea by saying that "A functionally determines B," or that "B is a
function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the
statement, "If A, then B." It is important to note that the value of B
must be unique for a given value of A, i.e., any given value of A must
imply just one and only one value of B, in order for the relationship to
qualify for the name "function." (However, this does not necessarily
prevent different values of A from implying the same value of B.)

However, for the purpose of normalization, we are interested in finding
1..1 (one to one) dependencies, lasting for all times (intension rather
than extension of the database), and the determinant having the
minimal number of attributes.

X → Y holds if whenever two tuples have the same value for X,
they must have the same value for Y.

The notation is: A → B, which is read as: B is functionally dependent
on A.

In general, a functional dependency is a relationship among
attributes. In relational databases, we can have a determinant that
governs one or several other attributes.

FDs are derived from the real-world constraints on the attributes and
they are properties on the database intension not extension.

Example

Dinner Course  Type of Wine
Meat           Red
Fish           White
Cheese         Rose

Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner.
Dinner → Wine

Dinner Course  Type of Wine  Type of Fork
Meat           Red           Meat fork
Fish           White         Fish fork
Cheese         Rose          Cheese fork

Since both Wine type and Fork type are determined by the Dinner
type, we say Wine is functionally dependent on Dinner and Fork is
functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
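
The definition "X → Y holds if tuples that agree on X also agree on Y" translates directly into a small check over a table instance. The sketch below is illustrative: the function name fd_holds and the representation of rows as dictionaries are assumptions. Since FDs are properties of the intension, such a check on one instance can only refute a dependency, never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this table instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first y seen for a given x is remembered; any later mismatch refutes the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

menu = [
    {"Dinner": "Meat", "Wine": "Red", "Fork": "Meat fork"},
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Rose", "Fork": "Cheese fork"},
]

print(fd_holds(menu, ["Dinner"], ["Wine"]))
print(fd_holds(menu, ["Dinner"], ["Fork"]))

# A hypothetical extra row that serves white wine with meat breaks Dinner -> Wine.
extra = {"Dinner": "Meat", "Wine": "White", "Fork": "Meat fork"}
print(fd_holds(menu + [extra], ["Dinner"], ["Wine"]))
```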

Partial Dependency
If an attribute which is not a member of the primary key is dependent
on some part of the primary key (if we have composite primary key)
then that attribute is partially functionally dependent on the primary
key.

Let {A,B} be the Primary Key and C a non-key attribute.

Then if {A,B} → C and B → C,
C is partially functionally dependent on {A,B}.

Full Functional Dependency

If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but on the whole key (if we
have a composite primary key), then that attribute is fully functionally
dependent on the primary key.

Let {A,B} be the Primary Key and C a non-key attribute.

Then if {A,B} → C holds and neither B → C nor A → C holds,
C is fully functionally dependent on {A,B}.

Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."

Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must
be an Animal.

A generalized way of describing transitive dependency is:

If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C,
provided that neither C nor B determines A, i.e. (B ↛ A and C ↛ A).
In the normal notation:

{(A → B) AND (B → C)} ==> A → C, provided that B ↛ A
and C ↛ A
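
One way to see which dependencies follow transitively is to compute the closure of a set of attributes under the given FDs. The notes do not describe this procedure; the sketch below is a standard technique brought in for illustration, and the function name and the representation of FDs as pairs of sets are assumptions.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the whole left side, we also determine the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))  # C is reached from A through B
print(sorted(closure({"B"}, fds)))  # A is not reached from B
```

Here C appears in the closure of A only because of the chain through B, while A does not appear in the closure of B, which matches the side condition that B does not determine A.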


Steps of Normalization:
We have various levels or steps in normalization called Normal Forms.
The level of complexity, strength of the rule and decomposition
increases as we move from one lower level Normal Form to the higher.

A table in a relational database is said to be in a certain normal form if
it satisfies certain constraints.

Each normal form listed below represents a stronger condition than the
previous one.

Normalization towards a logical design consists of the
following steps:

UnNormalized Form(UNF):
Identify all data elements
First Normal Form(1NF):
Find the key with which you can find all data i.e. remove any repeating group
Second Normal Form(2NF):
Remove part-key dependencies (partial dependency). Make all data dependent
on the whole key.
Third Normal Form(3NF)
Remove non-key dependencies (transitive dependencies). Make all data
dependent on nothing but the key.
For most practical purposes, databases are considered normalized if
they adhere to the third normal form (there is no transitive
dependency).

First Normal Form (1NF)

Requires that all column values in a table are atomic (e.g., a
number is an atomic value, while a list or a set is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key
relationship
2. Moving these repeating groups to new rows by repeating
the non-repeating attributes, known as “flattening” the
table. Then find the key with which you can find all
data

Definition: a table (relation) is in 1NF if
 There are no duplicated rows in the table (there is a unique
identifier).
 Each cell is single-valued (i.e., there are no
repeating groups).
 Entries in a column (attribute, field) are of the same
kind.

Example for First Normal Form (1NF)

Unnormalized

EmpID  FirstName  LastName  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   SQL     Database     AAU     Sidist_Kilo  5
                            VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     C++     Programming  Unity   Gerji        6
                            IP      Programming  Jimma   Jimma City   4
28     Chane      Kebede    SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     SQL     Database     Helico  Piazza       9
                            Prolog  Programming  Jimma   Jimma City   8
                            Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    Cisco   Networking   AAU     Sidist_Kilo  7

FIRST NORMAL FORM (1NF)

Remove all repeating groups. Distribute the multi-valued attributes
into different rows and identify a unique identifier for the relation so
that it can be said to be a relation in a relational database. Flatten the
table.

EmpID  FirstName  LastName  SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   1        SQL     Database     AAU     Sidist_Kilo  5
12     Abebe      Mekuria   3        VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     2        C++     Programming  Unity   Gerji        6
16     Lemma      Alemu     7        IP      Programming  Jimma   Jimma City   4
28     Chane      Kebede    1        SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     1        SQL     Database     Helico  Piazza       9
65     Almaz      Belay     5        Prolog  Programming  Jimma   Jimma City   8
65     Almaz      Belay     8        Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    4        Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    6        Cisco   Networking   AAU     Sidist_Kilo  7
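
The "flattening" step can be sketched in Python as follows. The nested representation of the unnormalized records (a list of skill tuples per employee) and the function name are assumptions made for the illustration; the data is a subset of the example.

```python
# Each record holds a repeating group: (SkillID, Skill, SkillType, School, SchoolAdd, SkillLevel)
unnormalized = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
     "Skills": [(1, "SQL", "Database", "AAU", "Sidist_Kilo", 5),
                (3, "VB6", "Programming", "Helico", "Piazza", 8)]},
    {"EmpID": 28, "FirstName": "Chane", "LastName": "Kebede",
     "Skills": [(1, "SQL", "Database", "AAU", "Sidist_Kilo", 10)]},
]

def flatten(records):
    """Emit one row per (employee, skill), repeating the non-repeating attributes."""
    rows = []
    for rec in records:
        for skill in rec["Skills"]:
            rows.append((rec["EmpID"], rec["FirstName"], rec["LastName"], *skill))
    return rows

flat = flatten(unnormalized)
for row in flat:
    print(row)

# After flattening, {EmpID, SkillID} identifies each row in this sample.
keys = {(row[0], row[3]) for row in flat}
print(len(keys) == len(flat))
```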


Second Normal form 2NF

No partial dependency of a non key attribute on part of the primary
key. This will result in a set of relations with a level of Second Normal
Form.
Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) key is automatically also in 2NF.

Definition: a table (relation) is in 2NF if
 It is in 1NF and
 All non-key attributes are dependent on the entire
primary key, i.e. there is no partial dependency.

Example for 2NF:

Emp_Proj
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive

Emp_Proj rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive

Business rule: Whenever an employee participates in a project, he/she
will be entitled to an incentive.

This schema is in 1NF since we don’t have any repeating groups or
multi-valued attributes. To convert it to 2NF we need to
remove all partial dependencies of non-key attributes on part of the
primary key.

{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund,
ProjMangID, Incentive

But in addition to this we have the following dependencies:

FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive

As we can see, some non-key attributes are partially dependent on
part of the primary key. This can be seen by analyzing the
first two functional dependencies (FD1 and FD2). Thus, each functional
dependency, with its dependent attributes, should be moved to a
new relation where the determinant will be the primary key.

Employee
EmpID  EmpName

Project
ProjNo  ProjName  ProjLoc  ProjFund  ProjMangID

Emp_Proj
EmpID  ProjNo  Incentive
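
As a rough sketch of the same reasoning in Python, the three dependencies can be written as (determinant, dependents) pairs. A dependency is partial when its determinant is a proper subset of the primary key, and the decomposition builds one relation per determinant. The function names `partial_dependencies` and `decompose_2nf` are hypothetical helpers, not part of any library.

```python
PRIMARY_KEY = ("EmpID", "ProjNo")

# Functional dependencies as (determinant, dependent attributes).
FDS = [
    (("EmpID",), ["EmpName"]),                                         # FD1
    (("ProjNo",), ["ProjName", "ProjLoc", "ProjFund", "ProjMangID"]),  # FD2
    (("EmpID", "ProjNo"), ["Incentive"]),                              # FD3
]

def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper subset of the key."""
    key = frozenset(primary_key)
    return [(lhs, rhs) for lhs, rhs in fds if frozenset(lhs) < key]

def decompose_2nf(fds):
    """Build one relation per determinant: the determinant plus its dependents."""
    relations = {}
    for lhs, rhs in fds:
        relations.setdefault(tuple(lhs), []).extend(rhs)
    return {lhs: list(lhs) + attrs for lhs, attrs in relations.items()}

partial = partial_dependencies(PRIMARY_KEY, FDS)
relations = decompose_2nf(FDS)

for determinant, attributes in relations.items():
    print("key", determinant, "->", attributes)
```

FD1 and FD2 are reported as partial, and the three resulting attribute lists match the Employee, Project and Emp_Proj relations.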

Third Normal Form (3NF)

Eliminate columns dependent on another non-primary-key attribute: if
attributes do not contribute to a description of the key, remove them
to a separate table.
This level avoids update and delete anomalies.

Definition: a table (relation) is in 3NF if
• It is in 2NF, and
• There are no transitive dependencies between the
  primary key and non-primary-key attributes.

Example for 3NF

Assumption: Students of the same batch (same year) live in one
building or dormitory.

Student
StudID  Stud_F_Name  Stud_L_Name  Dept    Year  Dormitary
125/97  Abebe        Mekuria      InfoSc  1     401
654/95  Lemma        Alemu        Geog    3     403
842/95  Chane        Kebede       CompSc  3     403
165/97  Alem         Kebede       InfoSc  1     401
985/95  Almaz        Belay        Geog    3     403

This schema is in 2NF since the primary key is a single
attribute and there are no repeating groups (multi-valued
attributes).

Let’s take StudID, Year and Dormitary and see the
dependencies.

StudID → Year AND Year → Dormitary

Year cannot determine StudID, and Dormitary cannot
determine StudID. Then, transitively,
StudID → Dormitary

To convert it to 3NF we need to remove all transitive
dependencies of non-key attributes on another non-key
attribute.

The non-primary-key attributes that depend on each other will be
moved to another table and linked with the main table using a
candidate key - foreign key relationship.

Student
StudID  Stud_F_Name  Stud_L_Name  Dept    Year
125/97  Abebe        Mekuria      InfoSc  1
654/95  Lemma        Alemu        Geog    3
842/95  Chane        Kebede       CompSc  3
165/97  Alem         Kebede       InfoSc  1
985/95  Almaz        Belay        Geog    3

Dorm
Year  Dormitary
1     401
3     403

Generally, even though there are four additional levels of
normalization, a table is said to be normalized if it reaches 3NF. A
database with all tables in 3NF is said to be a normalized database.
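
A minimal sketch of this decomposition in Python, using the sample Student rows as tuples: projecting onto (Year, Dormitary) gives the Dorm relation, and joining the two relations back on Year recovers the original table. The variable names are illustrative only.

```python
# (StudID, Stud_F_Name, Stud_L_Name, Dept, Year, Dormitary)
student = [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1, 401),
    ("654/95", "Lemma", "Alemu", "Geog", 3, 403),
    ("842/95", "Chane", "Kebede", "CompSc", 3, 403),
    ("165/97", "Alem", "Kebede", "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay", "Geog", 3, 403),
]

# Student without the transitively dependent attribute Dormitary.
student_3nf = sorted({row[:5] for row in student})

# Dorm holds the dependency Year -> Dormitary exactly once per year.
dorm = sorted({(row[4], row[5]) for row in student})

# Rejoin on Year (foreign key in Student, key in Dorm).
dorm_by_year = dict(dorm)
rejoined = sorted(row + (dorm_by_year[row[4]],) for row in student_3nf)

print(dorm)
print(rejoined == sorted(student))
```

Because Year determines Dormitary, each dormitory number is now stored once per year instead of once per student, and no information is lost by the split.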

A mnemonic for remembering the rationale for normalization up to 3NF
could be the following:

1. No Repeating or Redundancy: no repeating fields in the
   table.
2. The Fields Depend Upon the Key: the fields should depend solely
   on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing But the Key: no inter-data dependency.
5. So Help Me Codd: since Codd came up with these rules.

Other Levels of Normalization

Boyce-Codd Normal Form (BCNF)
BCNF is based on functional dependencies that take into account all the
candidate keys in a relation.
A table is in BCNF if it is in 3NF and every determinant is a
candidate key. Violation of BCNF is rare. The potential sources
of violation of this rule are:
1. The relation contains two (or more) composite candidate keys.
2. The candidate keys overlap, i.e. have a common attribute.

The issue is related to:

Isolating Independent Multiple Relationships - No table may contain two or
more 1:N or N:M relationships that are not directly related.

The correct solution, to cause the model to be in 4th normal form, is to
ensure that all M:M relationships are resolved independently if they are
indeed independent.

Fourth Normal Form (4NF)

Isolate Semantically Related Multiple Relationships - There may be
practical constraints on information that justify separating logically related
many-to-many relationships.
MVD (Multi-Valued Dependency): represents a dependency between
attributes (for example A, B, C) in a relation such that for every value of A
there is a set of values for B and a set of values for C, but the sets
B and C are independent of each other.

An MVD between attributes A, B, and C in a relation is represented as follows:

A ->> B
A ->> C

Def: A table is in 4NF if it is in BCNF and has no non-trivial
multi-valued dependencies.
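
To make the idea of independent multi-valued facts concrete, here is a small Python sketch. The relation Emp(EmpID, Skill, Language) and its data are hypothetical and are not taken from the tables in these notes. Skills and languages are assumed to be independent, so EmpID ->> Skill and EmpID ->> Language hold.

```python
# Hypothetical relation: every skill is paired with every language.
emp = {
    (12, "SQL", "Amharic"), (12, "SQL", "English"),
    (12, "VB6", "Amharic"), (12, "VB6", "English"),
    (16, "C++", "English"),
}

# 4NF decomposition: one relation per independent multi-valued fact.
emp_skill = {(e, s) for e, s, _ in emp}
emp_lang = {(e, l) for e, _, l in emp}

# Joining the two projections on EmpID reproduces the original relation.
rejoined = {(e, s, l)
            for e, s in emp_skill
            for e2, l in emp_lang
            if e == e2}

print(len(emp), len(emp_skill) + len(emp_lang))
print(rejoined == emp)
```

The single table needs one row for every skill-language combination of an employee, while the two smaller relations store each skill and each language once per employee.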

Compiled By: Wondwossen Mulugeta, Faculty of Informatics, AAU 85

Fifth Normal Form (5NF)
Sometimes called the Project-Join Normal Form (PJNF).
5NF is based on the join dependency.
Join Dependency: a property of decomposition that ensures that no
spurious tuples are generated when rejoining to obtain the original relation.

Def: A table is in 5NF, also called "Projection-Join Normal Form"
(PJNF), if it is in 4NF and if every join dependency in the table
is a consequence of the candidate keys of the table.

Domain-Key Normal Form (DKNF)
A model free from all modification anomalies.

Def: A table is in DKNF if every constraint on the table is a logical
consequence of the definition of keys and domains.

The underlying ideas in normalization are simple enough. Through
normalization we want to design for our relational database a set of tables
that:
(1) Contain all the data necessary for the purposes that the
    database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require
    them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.


Pitfalls of Normalization

Problems associated with normalization:

• Requires data to see the problems
• May reduce performance of the system
• Is time consuming
• Is difficult to design and apply
• Is prone to human error


Chapter Five

Physical Database Design

Methodology for Relational
Database
We have established that there are three levels of database
design:
• Conceptual design: producing a data model which accounts for
  the relevant entities and relationships within the target
  application domain;
• Logical design: ensuring, via normalization procedures and the
  definition of integrity rules, that the stored database will be
  non-redundant and properly connected;
• Physical design: specifying how database records are stored,
  accessed and related to ensure adequate performance.

It is considered desirable to keep these three levels quite

separate -- one of Codd's requirements for an RDBMS is that it
should maintain logical-physical data independence. The
generality of the relational model means that RDBMSs are
potentially less efficient than those based on one of the older
data models where access paths were specified once and for all
at the design stage. However the relational data model does not
preclude the use of traditional techniques for accessing data - it
is still essential to exploit them to achieve adequate performance
with a database of any size.

We can consider the topic of physical database design from three
aspects:
• What techniques for storing and finding data exist
• Which are implemented within a particular DBMS
• Which might be selected by the designer for a given application,
  knowing the properties of the data

Thus the purpose of physical database design is to determine:

1. How to map the logical database design to a physical database
   design.
2. How to design base relations for the target DBMS.
3. How to design enterprise constraints for the target DBMS.
4. How to select appropriate file organizations based on analysis of
   transactions.
5. When to use secondary indexes to improve performance.
6. How to estimate the size of the database.
7. How to design user views.
8. How to design security mechanisms to satisfy user requirements.
9. How to design procedures and triggers.

Physical database design is the process of producing a
description of the implementation of the database on secondary
storage.
Physical design describes the base relations, file organizations, and
indexes used to achieve efficient access to the data, and any
associated integrity constraints and security measures.

• Sources of information for the physical design process include
  the global logical data model and the documentation that describes
  the model, i.e. the set of normalized relations.
• Logical database design is concerned with the what; physical
  database design is concerned with the how.
• Physical design describes the storage structures and access
  methods used to achieve efficient access to the data.

Steps in physical database design

1. Translate logical data model for target DBMS
1.1. Design base relation
1.2. Design representation of derived data
1.3. Design enterprise constraint
2. Design physical representation
2.1. Analyze transactions
2.2. Choose file organization
2.3. Choose indexes
2.4. Estimate disk space and system
requirement
3. Design user view
4. Design security mechanisms
5. Consider controlled redundancy
6. Monitor and tune the operational system


1. Translate logical data model for target
DBMS
This phase is the translation of the global logical data model to
produce a relational database schema in the target DBMS. This
includes creating the data dictionary based on the logical model
and information gathered.
After the creation of the data dictionary, the next activity is to
understand the functionality of the target DBMS so that all
necessary requirements are fulfilled for the database intended to
be developed.

Knowledge of the DBMS includes:

• how to create base relations
• whether the system supports:
   o definition of Primary key
   o definition of Foreign key
   o definition of Alternate key (Unique keys)
   o definition of Domains
   o Referential integrity constraints
   o definition of enterprise level constraints

1.1. Design base relation

The objective is to decide how to represent the base relations identified
in the global logical model in the target DBMS.
Designing a base relation involves identification of all necessary
requirements about a relation, starting from the name up to the
referential integrity constraints.
For each relation, we need to define:
• The name of the relation;
• A list of simple attributes in brackets;
• The PK and, where appropriate, AKs and FKs;
• A list of any derived attributes and how they should be
  computed;
• Referential integrity constraints for any FKs identified.
For each attribute, we need to define:
• Its domain, consisting of a data type, length, and any
  constraints on the domain;
• An optional default value for the attribute;
• Whether the attribute can hold nulls;
• Whether the attribute is derived and, if so, how it should be
  computed.

The implementation of the physical model is dependent on the
target DBMS, since some DBMSs have more facilities than others for
defining database definitions.
The base relation design, along with the justification for each decision,
should be fully documented.
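
As an illustration only, the checklist above might translate into DDL as follows, using SQLite through Python's standard `sqlite3` module as a stand-in for the target DBMS. The uniqueness of ProjName, the default location and the non-negative fund rule are assumptions made for this sketch; other DBMSs offer different facilities and syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE Project (
    ProjNo    INTEGER PRIMARY KEY,             -- primary key
    ProjName  TEXT NOT NULL UNIQUE,            -- alternate key (assumed)
    ProjLoc   TEXT DEFAULT 'Addis Ababa',      -- default value (assumed)
    ProjFund  REAL CHECK (ProjFund >= 0)       -- domain constraint (assumed)
);
CREATE TABLE Emp_Proj (
    EmpID     INTEGER NOT NULL,
    ProjNo    INTEGER NOT NULL
              REFERENCES Project(ProjNo) ON DELETE CASCADE,  -- foreign key
    Incentive REAL,                            -- nulls allowed
    PRIMARY KEY (EmpID, ProjNo)
);
""")

conn.execute("INSERT INTO Project (ProjNo, ProjName, ProjFund) VALUES (1, 'Payroll', 5000)")
conn.execute("INSERT INTO Emp_Proj VALUES (12, 1, 300)")

def try_insert(sql):
    """Return True if the row is accepted, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Project 99 does not exist, so referential integrity rejects this row.
fk_violation_accepted = try_insert("INSERT INTO Emp_Proj VALUES (16, 99, 100)")
default_loc = conn.execute("SELECT ProjLoc FROM Project WHERE ProjNo = 1").fetchone()[0]
print(fk_violation_accepted, default_loc)
```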

1.2. Design representation of derived

data
While analyzing the requirements of users, we may find that
there are some attributes holding data that will be derived from
other existing attributes. A decision on how to represent any
derived data present in the global logical data model in the target
DBMS should be made.

Examine the logical data model and data dictionary, and produce a list
of all derived attributes. Most of the time derived attributes are
not expressed in the logical model but will be included in the data
dictionary. Whether to store derived attributes in a base relation
or calculate them when required is a decision to be made by the
designer considering the performance impact.
The option selected is based on:
• The additional cost to store the derived data and keep it
  consistent with the operational data from which it is derived;
• The cost to calculate it each time it is required.
The less expensive option is chosen subject to performance
constraints.
The representation of derived attributes should be fully
documented.

1.3. Design enterprise constraint

Data in the database is subject not only to constraints imposed by the
database and the data model used, but also to some enterprise-dependent
constraints. These constraint definitions are also
dependent on the DBMS selected and enterprise level
requirements.

One needs to know the functionalities of the DBMS, since in
designing the enterprise constraints for the target DBMS some
DBMSs provide more facilities than others.

All the enterprise level constraints and the definition method in
the target DBMS should be fully documented.

2. Design physical representation

This phase determines the optimal file
organizations to store the base relations and the indexes that are
required to achieve acceptable performance; that is, the way in
which relations and tuples will be held on secondary storage.
A number of factors may be used to measure efficiency:
• Transaction throughput: number of transactions processed
  in a given time interval.
• Response time: elapsed time for completion of a single
  transaction.
• Disk storage: amount of disk space required to store the
  database files.
However, no single factor is always the right one to optimize.
Typically, we have to trade one factor off against another to achieve
a reasonable balance.

2.1. Analyze transactions

The objective here is to understand the functionality of the
transactions that will run on the database and to analyze the
important transactions.
Attempt to identify performance criteria, e.g.:
• Transactions that run frequently and will have a significant
  impact on performance;
• Transactions that are critical to the business;
• Times during the day/week when there will be a high
  demand made on the database (called the peak load).
Use this information to identify the parts of the database that
may cause performance problems.
To select appropriate file organizations and indexes, we also need to
know the high-level functionality of the transactions, such as:
• Attributes that are updated in an update transaction;
• Criteria used to restrict the tuples that are retrieved in a query.
It is often not possible to analyze all expected transactions, so
investigate the most ‘important’ ones.
To help identify which transactions to investigate, we can use:
• A transaction/relation cross-reference matrix, showing the
  relations that each transaction accesses, and/or
• A transaction usage map, indicating which relations are
  potentially heavily used.
To focus on areas that may be problematic:
1. Map all transaction paths to relations.
2. Determine which relations are most frequently
   accessed by transactions.
3. Analyze the data usage of selected transactions that
   involve these relations.
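
The three steps above can be sketched with a small cross-reference matrix in Python. The transaction names and hourly frequencies are invented for illustration; in practice they would come from the requirements analysis.

```python
from collections import Counter

# Hypothetical transactions: (relations accessed, estimated runs per hour).
transactions = {
    "T1 register employee":   ({"Employee"}, 5),
    "T2 assign to project":   ({"Employee", "Project", "Emp_Proj"}, 40),
    "T3 list project staff":  ({"Project", "Emp_Proj"}, 120),
    "T4 update project fund": ({"Project"}, 10),
}

relations = sorted({r for rels, _ in transactions.values() for r in rels})

# Step 1: cross-reference matrix, 'X' where a transaction touches a relation.
matrix = {name: ["X" if r in rels else "-" for r in relations]
          for name, (rels, _) in transactions.items()}

# Step 2: frequency-weighted access count per relation.
load = Counter()
for rels, runs_per_hour in transactions.values():
    for r in rels:
        load[r] += runs_per_hour

# Step 3: the most heavily used relation is the first candidate for analysis.
hottest = load.most_common(1)[0][0]

print(relations)
for name, row in matrix.items():
    print(f"{name:24s}", " ".join(row))
print(dict(load), hottest)
```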

2.2. Choose file organization

The objective here is to determine an efficient file organization
for each base relation.
File organizations include Heap, Hash, Indexed Sequential
Access Method (ISAM), B+-Tree, and Clusters.

Most DBMSs provide little or no option to select the file organization.
However, they provide the user with an option to select an index
for every relation.

2.3. Choose indexes

The objective here is to determine whether adding indexes will
improve the performance of the system.
One approach is to keep tuples unordered and create as many
secondary indexes as necessary.
Another approach is to order tuples in the relation by specifying a
primary or clustering index.
In this case, choose the attribute for ordering or clustering the
tuples as:
• The attribute that is used most often for join operations, as this
  makes join operations more efficient, or
• The attribute that is used most often to access the tuples in a
  relation in order of that attribute.
If the ordering attribute chosen is the primary key of a relation, the
index will be a primary index; otherwise, the index will be a clustering
index.
Each relation can have either a primary index or a clustering
index, but not both.
Secondary indexes provide a mechanism for specifying an
additional key for a base relation that can be used to retrieve
data more efficiently.
There is an overhead involved in the maintenance and use of secondary
indexes that has to be balanced against the performance improvement
gained when retrieving data.
This includes:
• Adding an index record to every secondary index whenever a
  tuple is inserted;
• Updating a secondary index when the corresponding tuple is
  updated;
• An increase in the disk space needed to store the secondary index;
• Possible performance degradation during query
  optimization, which has to consider all secondary indexes.
Guidelines for Choosing Indexes
(1) Do not index small relations.
(2) Index PK of a relation if it is not a key of the file
organization.
(3) Add secondary index to a FK if it is frequently accessed.
(4) Add secondary index to any attribute that is heavily used
as a secondary key.
(5) Add secondary index on attributes that are involved in:
selection or join criteria; ORDER BY; GROUP BY; and other
operations involving sorting (such as UNION or DISTINCT).
(6) Add secondary index on attributes involved in built-in
functions.
(7) Add secondary index on attributes that could result in an
index-only plan.
(8) Avoid indexing an attribute or relation that is frequently
updated.
(9) Avoid indexing an attribute if the query will retrieve a
significant proportion of the tuples in the relation.
(10) Avoid indexing attributes that consist of long character
strings.
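
To see the effect of a secondary index on a selection attribute, the following sketch uses SQLite through Python's `sqlite3` module and compares the query plan before and after the index is created. The table contents and the index name `idx_employee_school` are made up, and the exact wording of the plan text differs between SQLite versions, so only the presence of the index name is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, LName TEXT, School TEXT)"
)
# 2000 synthetic rows; only a small fraction attend 'AAU'.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(i, f"Name{i}", "AAU" if i % 50 == 0 else f"School{i % 7}")
     for i in range(1, 2001)],
)

def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

query = "SELECT LName FROM Employee WHERE School = 'AAU'"

plan_before = plan(query)  # no index on School yet: full table scan
conn.execute("CREATE INDEX idx_employee_school ON Employee (School)")
plan_after = plan(query)   # the optimizer can now search via the index

print(plan_before)
print(plan_after)
```

The query retrieves a small proportion of the tuples, which is the situation where guideline (9) does not argue against the index.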

2.4. Estimate disk space and system

requirement
The objective here is to estimate the amount of disk space that will
be required by the database.
The purpose is to answer the following questions:
• If a system already exists: is there adequate storage?
• If procuring a new system: what storage will be required?
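
A back-of-the-envelope estimate can be sketched as below. All figures are assumptions for illustration: the column widths, the 8 bytes of per-row overhead and the 80% page fill factor are not properties of any particular DBMS, and real systems document their own storage formulas.

```python
def estimate_table_bytes(row_count, column_bytes, overhead_per_row=8, fill_percent=80):
    """Rough size: rows * (data + overhead), inflated for partly filled pages."""
    row_bytes = sum(column_bytes.values()) + overhead_per_row
    raw = row_count * row_bytes
    # Ceiling division keeps the arithmetic in integers.
    return (raw * 100 + fill_percent - 1) // fill_percent

# Assumed average widths in bytes for a few Employee columns.
employee_columns = {"EmpID": 4, "FName": 20, "LName": 20, "SkillID": 4}

estimate = estimate_table_bytes(row_count=100_000, column_bytes=employee_columns)
print(f"{estimate:,} bytes")
```

Indexes, logs and expected growth would be added on top of such a per-table figure.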

3. Design user view

The objective is to design the user views that were identified during the
Requirements Collection and Analysis stage of the relational database
application development lifecycle.
• Define views in DDL to provide the user views identified in the data model.
• Map them onto objects in the physical data model.

4. Design security mechanisms

The objective is to design the security measures for the database as
specified by the users.
• System security: authentication
• Data security: authorization

5. Consider the Introduction of Controlled

Redundancy
The objective here is to determine whether introducing redundancy
in a controlled manner by relaxing the normalization rules will
improve the performance of the system. This is sometimes known
as denormalization.
Informally speaking, denormalization is the merging of relations.
The result of normalization is a logical database design that is
structurally consistent and has minimal redundancy.
However, sometimes a normalized database design does not
provide maximum processing efficiency.
It may be necessary to accept the loss of some of the benefits of a
fully normalized design in favor of performance.
Also consider that denormalization:
• Makes implementation more complex;
• Often sacrifices flexibility;
• May speed up retrievals but slows down updates.
Denormalization refers to a refinement to the relational schema such
that the degree of normalization for a modified relation is less than
the degree of at least one of the original relations.
The term is also used more loosely to refer to situations where two relations
are combined into one new relation, which is still normalized but
contains more nulls than the original relations. There is no fixed rule
for when to denormalize, but
consider denormalization in the following situations, specifically to
speed up frequent or critical transactions:
• Step 1: Combining 1:1 relationships
• Step 2: Duplicating non-key attributes in 1:* relationships to reduce
  joins
• Step 3: Duplicating foreign key attributes in 1:* relationships to
  reduce joins
• Step 4: Introducing repeating groups
• Step 5: Merging lookup tables with base relations
• Step 6: Creating extract tables.

6. Monitoring and Tuning the operational

system
The objective here is to monitor the operational system and improve
the performance of the system to correct inappropriate design decisions or
reflect changing requirements.

Importance of monitoring and tuning the operational system:

• Avoids procurement of additional hardware
• Allows downsizing the hardware configuration: less and
  cheaper hardware, less expensive maintenance
• Faster response time and higher throughput: higher
  productivity
• Faster response time: good staff morale and customer
  satisfaction

Chapter Six
Relational Query Languages
In addition to the structural component of any data model, equally
important is the manipulation mechanism. This component of any data
model is called the “query language”.

• Query languages allow manipulation and retrieval of data
  from a database.
• Query languages != programming languages!
• QLs are not intended to be used for complex calculations.
• QLs support easy, efficient access to large data sets.
• The relational model supports simple, powerful query languages.
Formal Relational Query Languages

• There are varieties of query languages used by relational DBMSs
  for manipulating relations.
• Some of them are procedural:
   o The user tells the system exactly what data is needed and how to
     manipulate it.
• Others are non-procedural:
   o The user states what data is needed rather than how it is to
     be retrieved.

Two mathematical query languages form the basis for relational
query languages:
• Relational Algebra
• Relational Calculus

• We may describe the relational algebra as a procedural
  language: it can be used to tell the DBMS how to build a new
  relation from one or more relations in the database.
• We may describe relational calculus as a non-procedural
  language: it can be used to formulate the definition of a relation
  in terms of one or more database relations.
• Formally the relational algebra and relational calculus are
  equivalent to each other. For every expression in the
  algebra, there is an equivalent expression in the calculus.
• Both are non-user-friendly languages. They have been used as
  the basis for other, higher-level data manipulation languages for
  relational databases.


A query is applied to relation instances, and the result of a
query is also a relation instance.
• Schemas of input relations for a query are fixed.
• The schema for the result of a given query is also fixed: it is
  determined by the definition of the query language constructs.

Relational Algebra
The basic set of operations for the relational model is known as the
relational algebra. These operations enable a user to specify basic
retrieval requests.

The result of the retrieval is a new relation, which may have been
formed from one or more relations. The algebra operations thus
produce new relations, which can be further manipulated using
operations of the same algebra.

A sequence of relational algebra operations forms a relational
algebra expression, whose result will also be a relation that
represents the result of a database query (or retrieval request).

• Relational algebra is a theoretical language with operations that
  work on one or more relations to define another relation without
  changing the original relation.
• The output from one operation can become the input to another
  operation (nesting is possible).
• There are different basic operations that can be
  applied to relations in a database based on the
  requirement:
   o Selection (σ): selects a subset of rows from a
     relation.
   o Projection (π): deletes unwanted columns from a
     relation.
   o Renaming: assigns a name to the intermediate relation produced
     by a single operation.
   o Cross-Product (×): concatenates each tuple
     from one relation with all the tuples from the other
     relation.
   o Set-Difference (-): tuples in relation R1, but not in
     relation R2.
   o Union (∪): tuples in relation R1, or in relation R2.
   o Intersection (∩): tuples in relation R1 and in relation R2.
   o Join (⋈): tuples joined from two relations based on a
     condition.
  Join and intersection are derivable from the rest.
• Using these, we can build up sophisticated database queries.


Table 1:
Sample table used to illustrate different kinds of
relational operations. The relation contains
information about employees, the IT skills they have and
the school where they acquired each skill.

Employee
EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo  5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji        6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo  10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza       8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza       9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji        5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City   8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo  7
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City   4
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo  6


1. Selection
• Selects a subset of tuples/rows in a relation that satisfy a selection
  condition.
• The Selection operation is a unary operator (it is applied to a single
  relation).
• The Selection operation is applied to each tuple individually.
• The degree of the resulting relation is the same as the original
  relation, but the cardinality (no. of tuples) is less than or equal to
  that of the original relation.
• The Selection operator is commutative.
• A set of conditions can be combined using Boolean operations
  (∧ (AND), ∨ (OR), and ~ (NOT)).
• No duplicates in result!
• Schema of result identical to schema of (only) input relation.
• Result relation can be the input for another relational algebra
  operation! (Operator composition.)
• It is a filter that keeps only those tuples that satisfy a qualifying
  condition (those satisfying the condition are selected while
  others are discarded).

Notation:
σ <Selection Condition> (<Relation Name>)
Example: Find all Employees with skill type of Database.

σ <SkillType = “Database”> (Employee)

This query will extract every tuple, with all its attributes, from the
relation called Employee where the SkillType attribute has a value of
“Database”.

The resulting relation will be the following.

EmpID  FName   LName    SkillID  Skill   SkillType  School  SchoolAdd    SkillLevel
12     Abebe   Mekuria  2        SQL     Database   AAU     Sidist_Kilo  5
28     Chane   Kebede   2        SQL     Database   AAU     Sidist_Kilo  10
65     Almaz   Belay    2        SQL     Database   Helico  Piazza       9
24     Dereje  Tamiru   8        Oracle  Database   Unity   Gerji        5


If the query is all employees with a SkillType Database and School
Unity, the relational algebra operation and the resulting relation will be
as follows.

σ <SkillType = “Database” AND School = “Unity”> (Employee)

EmpID  FName   LName   SkillID  Skill   SkillType  School  SchoolAdd  SkillLevel
24     Dereje  Tamiru  8        Oracle  Database   Unity   Gerji      5
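
The two selections above can be mimicked in Python by treating a relation as a list of dictionaries. This is only a sketch: `select` is a hypothetical helper, and the `EMPLOYEE` list holds a subset of the rows and columns of the sample table.

```python
EMPLOYEE = [
    {"EmpID": 12, "FName": "Abebe", "Skill": "SQL", "SkillType": "Database", "School": "AAU"},
    {"EmpID": 16, "FName": "Lemma", "Skill": "C++", "SkillType": "Programming", "School": "Unity"},
    {"EmpID": 28, "FName": "Chane", "Skill": "SQL", "SkillType": "Database", "School": "AAU"},
    {"EmpID": 25, "FName": "Abera", "Skill": "VB6", "SkillType": "Programming", "School": "Helico"},
    {"EmpID": 65, "FName": "Almaz", "Skill": "SQL", "SkillType": "Database", "School": "Helico"},
    {"EmpID": 24, "FName": "Dereje", "Skill": "Oracle", "SkillType": "Database", "School": "Unity"},
    {"EmpID": 94, "FName": "Alem", "Skill": "Cisco", "SkillType": "Networking", "School": "AAU"},
]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate; schema unchanged."""
    return [t for t in relation if predicate(t)]

def is_database(t):
    return t["SkillType"] == "Database"

def at_unity(t):
    return t["School"] == "Unity"

database = select(EMPLOYEE, is_database)
database_unity = select(EMPLOYEE, lambda t: is_database(t) and at_unity(t))

# Commutativity: applying the two conditions in either order gives the same result.
a_then_b = select(select(EMPLOYEE, is_database), at_unity)
b_then_a = select(select(EMPLOYEE, at_unity), is_database)

print([t["EmpID"] for t in database])
print([t["EmpID"] for t in database_unity])
```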


2. Projection
• Selects certain attributes while discarding the others from the
  base relation.
• PROJECT creates a vertical partitioning: one part with the
  needed columns (attributes), containing the results of the operation,
  and the other containing the discarded columns.
• Deletes attributes that are not in the projection list.
• Schema of result contains exactly the fields in the projection list,
  with the same names that they had in the (only) input relation.
• Projection operator has to eliminate duplicates!
   o Note: real systems typically don’t do duplicate elimination
     unless the user explicitly asks for it.
• If the Primary Key is in the projection list, then duplication will
  not occur.
• Duplicate removal is necessary to ensure that the resulting
  table is also a relation.

Notation:
π <Selected Attributes> (<Relation Name>)
Example: To display the Name, Skill, and Skill Level of each employee, the
query and the resulting relation will be:

π <FName, LName, Skill, SkillLevel> (Employee)

FName   LName    Skill   SkillLevel
Abebe   Mekuria  SQL     5
Lemma   Alemu    C++     6
Chane   Kebede   SQL     10
Abera   Taye     VB6     8
Almaz   Belay    SQL     9
Dereje  Tamiru   Oracle  5
Selam   Belay    Prolog  8
Alem    Kebede   Cisco   7
Girma   Dereje   IP      4
Yared   Gizaw    Java    6

If we want to have the Name, Skill, and Skill Level of employees with
Skill SQL and SkillLevel greater than 5, the query will be:

π <FName, LName, Skill, SkillLevel> (σ <Skill = “SQL” ∧ SkillLevel > 5> (Employee))

FName  LName   Skill  SkillLevel
Chane  Kebede  SQL    10
Almaz  Belay   SQL    9
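
A short Python sketch of projection with duplicate elimination follows. The helpers `project` and `select` are hypothetical, and `EMPLOYEE` is a subset of the sample relation. Projecting on SkillType alone shows why duplicates must be removed for the result to remain a relation.

```python
EMPLOYEE = [
    {"FName": "Abebe", "LName": "Mekuria", "Skill": "SQL", "SkillType": "Database", "SkillLevel": 5},
    {"FName": "Lemma", "LName": "Alemu", "Skill": "C++", "SkillType": "Programming", "SkillLevel": 6},
    {"FName": "Chane", "LName": "Kebede", "Skill": "SQL", "SkillType": "Database", "SkillLevel": 10},
    {"FName": "Almaz", "LName": "Belay", "Skill": "SQL", "SkillType": "Database", "SkillLevel": 9},
    {"FName": "Alem", "LName": "Kebede", "Skill": "Cisco", "SkillType": "Networking", "SkillLevel": 7},
]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Five input tuples collapse to three distinct skill types.
skill_types = project(EMPLOYEE, ["SkillType"])

# Projection applied to the result of a selection (nested expression).
sql_experts = project(
    select(EMPLOYEE, lambda t: t["Skill"] == "SQL" and t["SkillLevel"] > 5),
    ["FName", "LName", "Skill", "SkillLevel"],
)

print(skill_types)
print(sql_experts)
```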


3. Rename Operation
• We may want to apply several relational algebra operations one
  after the other. The query could be written in two different forms:
   1. Write the operations as a single relational algebra
      expression by nesting the operations.
   2. Apply one operation at a time and create intermediate
      result relations. In the latter case, we must give names
      to the relations that hold the intermediate
      results → Rename Operation

If we want to have the Name, Skill, and Skill Level of employees with
Skill SQL and SkillLevel greater than 5, we can write
the expression for this query using the two alternatives:

1. A single algebraic expression:

The query used above is a single relational algebra expression, which is:

π <FName, LName, Skill, SkillLevel> (σ <Skill = “SQL” ∧ SkillLevel > 5> (Employee))

2. Using an intermediate relation by the Rename Operation:

Step 1: Result1 ← σ <Skill = “SQL” ∧ SkillLevel > 5> (Employee)

Step 2: Result ← π <FName, LName, Skill, SkillLevel> (Result1)

Then Result will be equivalent to the relation we get using
the first alternative.
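
The equivalence of the two forms can be checked with a small Python sketch, assuming the condition Skill = SQL and SkillLevel > 5 in both forms. The helpers `select` and `project` are hypothetical, and the named variables `result1` and `result` play the role of the renamed intermediate relations.

```python
EMPLOYEE = [
    {"FName": "Abebe", "LName": "Mekuria", "Skill": "SQL", "SkillLevel": 5, "School": "AAU"},
    {"FName": "Chane", "LName": "Kebede", "Skill": "SQL", "SkillLevel": 10, "School": "AAU"},
    {"FName": "Almaz", "LName": "Belay", "Skill": "SQL", "SkillLevel": 9, "School": "Helico"},
    {"FName": "Yared", "LName": "Gizaw", "Skill": "Java", "SkillLevel": 6, "School": "AAU"},
]
ATTRS = ["FName", "LName", "Skill", "SkillLevel"]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection: keep the listed attributes, without duplicate tuples."""
    rows = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in rows:
            rows.append(row)
    return rows

def sql_above_5(t):
    return t["Skill"] == "SQL" and t["SkillLevel"] > 5

# Alternative 1: a single nested expression.
nested = project(select(EMPLOYEE, sql_above_5), ATTRS)

# Alternative 2: one operation at a time, naming each intermediate relation.
result1 = select(EMPLOYEE, sql_above_5)
result = project(result1, ATTRS)

print(nested == result)
print(result)
```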


4. Set Operations
The three main set operations are Union, Intersection and Set
Difference. The properties of these set operations are similar to the
concepts we have in mathematical set theory. The difference is that, in
the database context, the elements of each set, which is a relation in
the database, will be tuples. The set operations are binary operations
which demand that the two operand relations be type
compatible.

Type Compatibility
Two relations R1 and R2 are said to be type compatible if:
1. The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn)
   have the same number of attributes, and
2. The domains of corresponding attributes are compatible;
   that is, Dom(Ai) = Dom(Bi) for i = 1, 2, ..., n.
To illustrate the three set operations, we will make use of the following
two tables:
Employee
EmpI FNam LNam SkillI Skill SkillType Scho SkillLev
D e e D ol el
12 Abebe Mekuria 2 SQL Database AAU 5
16 Lemm Alemu 5 C++ Programmin Unity 6
a g
28 Chane Kebede 2 SQL Database AAU 10
25 Abera Taye 6 VB6 Programmin Helico 8
g
65 Almaz Belay 2 SQL Database Helico 9
24 Dereje Tamiru 8 Oracl Database Unity 5
e
51 Selam Belay 4 Prolo Programmin Jimma 8
g g
94 Alem Kebede 3 Cisco Networking AAU 7
18 Girma Dereje 1 IP Programmin Jimma 4
g
13 Yared Gizaw 7 Java Programmin AAU 6
g

RelationOne: Employees who attend Database Course

EmpID  FName   LName    SkillID  Skill   SkillType    School  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     5
28     Chane   Kebede   2        SQL     Database     AAU     10
65     Almaz   Belay    2        SQL     Database     Helico  9
24     Dereje  Tamiru   8        Oracle  Database     Unity   5


RelationTwo: Employees who attend a course in AAU
EmpID  FName   LName    SkillID  Skill   SkillType    School  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     5
94     Alem    Kebede   3        Cisco   Networking   AAU     7
28     Chane   Kebede   2        SQL     Database     AAU     10
13     Yared   Gizaw    7        Java    Programming  AAU     6

a. UNION Operation
The result of this operation, denoted by R ∪ S, is a relation
that includes all tuples that are either in R or in S or in both
R and S. Duplicate tuples are eliminated.
The two operands must be "type compatible".

Eg: RelationOne ∪ RelationTwo

Employees who attend Database in any School or who attend any
course at AAU

EmpID  FName   LName    SkillID  Skill   SkillType    School  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     5
28     Chane   Kebede   2        SQL     Database     AAU     10
65     Almaz   Belay    2        SQL     Database     Helico  9
24     Dereje  Tamiru   8        Oracle  Database     Unity   5
94     Alem    Kebede   3        Cisco   Networking   AAU     7
13     Yared   Gizaw    7        Java    Programming  AAU     6

b. INTERSECTION Operation
The result of this operation, denoted by R ∩ S, is a relation
that includes all tuples that are in both R and S. The two
operands must be "type compatible".
Eg: RelationOne ∩ RelationTwo
Employees who attend Database Course at AAU

EmpID  FName   LName    SkillID  Skill   SkillType    School  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     5
28     Chane   Kebede   2        SQL     Database     AAU     10

c. Set Difference (or MINUS) Operation
The result of this operation, denoted by R - S, is a relation
that includes all tuples that are in R but not in S.
The two operands must be "type compatible".
Eg: RelationOne - RelationTwo
Employees who attend Database Course but didn't take any course at
AAU
EmpID  FName   LName    SkillID  Skill   SkillType    School  SkillLevel
65     Almaz   Belay    2        SQL     Database     Helico  9
24     Dereje  Tamiru   8        Oracle  Database     Unity   5
Eg: RelationTwo - RelationOne
Employees who attend a course at AAU but didn't take the Database
Course

EmpID  FName   LName    SkillID  Skill   SkillType    School  SkillLevel
94     Alem    Kebede   3        Cisco   Networking   AAU     7
13     Yared   Gizaw    7        Java    Programming  AAU     6

The resulting relation for R1 ∪ R2, R1 ∩ R2, or R1 - R2 has the
same attribute names as the first operand relation R1 (by
convention).

Some Properties of the Set Operators

Notice that both union and intersection are commutative
operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R

Both union and intersection can be treated as n-ary operations
applicable to any number of relations, as both are associative
operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)

The minus operation is not commutative; that is, in general

R - S ≠ S - R
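
As a small sketch of the three set operations, the two example relations can be written as Python sets of tuples. The type_compatible helper is an assumption of this sketch: it compares Python value types position by position as a rough stand-in for comparing attribute domains.

```python
# Tuples follow the attribute order
# (EmpID, FName, LName, SkillID, Skill, SkillType, School, SkillLevel).
relation_one = {
    (12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", 5),
    (28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", 10),
    (65, "Almaz", "Belay", 2, "SQL", "Database", "Helico", 9),
    (24, "Dereje", "Tamiru", 8, "Oracle", "Database", "Unity", 5),
}
relation_two = {
    (12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", 5),
    (94, "Alem", "Kebede", 3, "Cisco", "Networking", "AAU", 7),
    (28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", 10),
    (13, "Yared", "Gizaw", 7, "Java", "Programming", "AAU", 6),
}


def type_compatible(r, s):
    """Rough check: same number of attributes and same value type per position."""
    return all(
        len(a) == len(b) and all(type(x) is type(y) for x, y in zip(a, b))
        for a in r
        for b in s
    )


def emp_ids(relation):
    """Sorted EmpID values, convenient for inspecting a result."""
    return sorted(t[0] for t in relation)


compatible = type_compatible(relation_one, relation_two)

union = relation_one | relation_two          # duplicates disappear automatically
intersection = relation_one & relation_two
one_minus_two = relation_one - relation_two
two_minus_one = relation_two - relation_one
```

Because Python sets never hold duplicates, the union contains each of the two shared tuples (EmpID 12 and 28) only once, and the two differences contain different tuples, which illustrates that minus is not commutative.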

5. CARTESIAN (cross product) Operation
This operation is used to combine tuples from two relations in a
combinatorial fashion. That means every tuple in Relation R will be
paired with every tuple in Relation S.
• In general, the result of R(A1, A2, ..., An) x S(B1, B2, ..., Bm)
is a relation Q with degree n + m and attributes
Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
• Here R has n attributes and S has m attributes.
• The resulting relation Q has one tuple for each combination
of tuples, one from R and one from S.
• Hence, if R has p tuples and S has q tuples, then R x S
will have p * q tuples.

Example:
Employee
ID   FName  LName
123  Abebe  Lemma
567  Belay  Taye
822  Kefle  Kebede
Dept
DeptID  DeptName   MangID
2       Finance    567
3       Personnel  123
Then the Cartesian product between Employee and Dept relations will
be of the form:

Employee X Dept:
ID   FName  LName   DeptID  DeptName   MangID
123  Abebe  Lemma   2       Finance    567
123  Abebe  Lemma   3       Personnel  123
567  Belay  Taye    2       Finance    567
567  Belay  Taye    3       Personnel  123
822  Kefle  Kebede  2       Finance    567
822  Kefle  Kebede  3       Personnel  123

Even though it is very important in query processing, the Cartesian
Product is not useful by itself, since it pairs every tuple in the
first relation with every tuple in the second relation. Thus, to
make use of the Cartesian Product, one has to combine it with the
Selection Operation, which filters the tuples of a relation by testing
whether each one satisfies the selection condition.
In our example, to extract employee information about the managers of
the departments (the manager of each department), the algebra query and
the resulting relation will be:
π <ID, FName, LName, DeptName> (σ <ID=MangID> (Employee X Dept))

ID   FName  LName  DeptName
123  Abebe  Lemma  Personnel
567  Belay  Taye   Finance
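
The Cartesian product followed by a selection and a projection can be sketched with itertools.product from the Python standard library. The tuple positions used below simply follow the attribute order of the example tables.

```python
from itertools import product

# Employee(ID, FName, LName) and Dept(DeptID, DeptName, MangID) from the example.
employee = [(123, "Abebe", "Lemma"), (567, "Belay", "Taye"), (822, "Kefle", "Kebede")]
dept = [(2, "Finance", 567), (3, "Personnel", 123)]

# Cartesian product: concatenate every Employee tuple with every Dept tuple.
cross = [e + d for e, d in product(employee, dept)]

# Selection ID = MangID (positions 0 and 5), then projection on
# ID, FName, LName, DeptName (positions 0, 1, 2, 4).
managers = [(t[0], t[1], t[2], t[4]) for t in cross if t[0] == t[5]]
```

Here cross has 3 * 2 = 6 tuples, and only the two tuples whose ID equals MangID survive the selection.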

6. JOIN Operation
The sequence of a Cartesian product followed by a select is used quite
commonly to identify and select related tuples from two relations, so a
special operation, called JOIN, is defined for it. Thus in a JOIN
operation, the Cartesian product and the Selection operation are used
together.
The JOIN operation is denoted by the ⋈ symbol.
This operation is very important for any relational database with more
than a single relation, because it allows us to process relationships
among relations.
The general form of a join operation on two relations
R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is:

R ⋈ <join condition> S is equivalent to σ <selection condition> (R X S)

where <join condition> and <selection condition> are the same.

Here R and S can be any relations that result from general relational
algebra expressions.
Since JOIN is an operation that needs two relations, it is a binary
operation.

This type of JOIN is called a THETA JOIN (θ-JOIN), where θ is the
comparison operator used in the join condition.
θ could be one of { <, ≤, >, ≥, ≠, = }

Example:
In the above example, where we want to extract employee
information about the managers of the departments, the algebra query
using the JOIN operation will be:

Employee ⋈ <ID=MangID> Dept
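
A theta join can be sketched in Python as a nested loop that keeps the pairs satisfying the comparison. The function name theta_join and its parameters are illustrative only; the sketch also assumes the two relations have no attribute names in common, so the merged dictionaries do not overwrite each other.

```python
import operator

employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def theta_join(r, s, left_attr, theta, right_attr):
    """Pair each tuple of r with each tuple of s and keep the pairs for which
    r[left_attr] theta s[right_attr] holds (attribute names assumed disjoint)."""
    return [
        {**a, **b}
        for a in r
        for b in s
        if theta(a[left_attr], b[right_attr])
    ]


# Join condition ID = MangID, as in the example.
managers = theta_join(employee, dept, "ID", operator.eq, "MangID")

# Any other comparison operator can be used as theta, for example ID > MangID.
greater = theta_join(employee, dept, "ID", operator.gt, "MangID")
```

Passing operator.eq gives the managers of the two departments; swapping in another comparison such as operator.gt changes only the join condition, not the structure of the operation.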

a. EQUIJOIN Operation
The most common use of join involves join conditions with equality
comparisons only (=). Such a join, where the only comparison operator
used is the equal sign, is called an EQUIJOIN. In the result of an
EQUIJOIN we always have one or more pairs of attributes (whose
names need not be identical) that have identical values in every tuple,
since we used the equality operator.
For example, the above JOIN expression is an EQUIJOIN
since the comparison operator used is the equal to operator
(=).
b. NATURAL JOIN Operation
We have seen that in the result of an EQUIJOIN one of each pair of
attributes with identical values is superfluous. A new operation called
natural join was created to get rid of the second (or extra) attribute
that we would have in the result of an EQUIJOIN.
The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have the same
name in both relations. If this is not the case, a renaming operation on
the attributes is applied first.

R1 ← R ⋈ S represents a natural join between R and S. The
degree of R1 is the degree of R plus the degree of S, less the number of
common attributes.
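
The following is a minimal Python sketch of a natural join, assuming relations are lists of dictionaries. The helper names rename and natural_join are illustrative; since Employee and Dept share no attribute name, MangID is first renamed to ID, as the definition requires.

```python
employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def rename(relation, old, new):
    """Rename one attribute in every tuple of the relation."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]


def natural_join(r, s):
    """Equijoin on all attributes with the same name, keeping one copy of each."""
    common = [a for a in r[0] if a in s[0]] if r and s else []
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


# Make the join attribute carry the same name in both relations, then join.
dept_renamed = rename(dept, "MangID", "ID")
joined = natural_join(employee, dept_renamed)
```

Each result tuple has 3 + 3 - 1 = 5 attributes, because the common attribute ID appears only once.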
c. OUTER JOIN Operation
OUTER JOIN is another version of the JOIN operation where non-matching
tuples from a relation are also included in the result, with
NULL values for the attributes of the other relation.
There are two major types of OUTER JOIN.
1. RIGHT OUTER JOIN: where non-matching tuples from the
second (Right) relation are included in the result with NULL values
for the attributes of the first (Left) relation.
2. LEFT OUTER JOIN: where non-matching tuples from the first
(Left) relation are included in the result with NULL values for
the attributes of the second (Right) relation.

Notation for Left Outer Join:

R ⟕ <Join Condition> S : theta left outer join

R ⟕ S : natural left outer join

When two relations are joined by a JOIN operator, there could be some
tuples in one relation that have no matching tuple in the other
relation, and the query may still need to display these non-matching
tuples from the first or second relation. Such a query is expressed with
the OUTER JOIN.
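
A left outer join can be sketched in Python as follows. The function name and its parameters are illustrative, and None plays the role of NULL; the data are the Employee and Dept tables of the Cartesian product example.

```python
employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def left_outer_join(r, s, condition, s_attributes):
    """Keep every tuple of r; pad with None where no tuple of s matches."""
    result = []
    for a in r:
        matches = [b for b in s if condition(a, b)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            # No partner in s: the attributes of s become None (NULL).
            result.append({**a, **{attr: None for attr in s_attributes}})
    return result


joined = left_outer_join(
    employee,
    dept,
    lambda e, d: e["ID"] == d["MangID"],
    ["DeptID", "DeptName", "MangID"],
)
```

Employee 822 manages no department, so it still appears in the result but with None for DeptID, DeptName and MangID. A right outer join is the same operation with the roles of the two relations swapped.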

d. SEMIJOIN Operation
SEMIJOIN is another version of the JOIN operation where the resulting
relation contains the attributes of only one of the relations, for
those tuples that are related to tuples in the other relation. The
following notation depicts the inclusion of only the attributes from the
first relation (R) in the result, for the tuples that actually
participate in the relationship.

R ⋉ <Join Condition> S
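
As a short sketch, a semijoin keeps a tuple of the first relation whenever at least one tuple of the second relation satisfies the join condition. The function name semijoin is illustrative; the data reuse the earlier Employee and Dept example.

```python
employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def semijoin(r, s, condition):
    """Tuples of r (with r's attributes only) that match some tuple of s."""
    return [a for a in r if any(condition(a, b) for b in s)]


# Employees who manage some department, with Employee attributes only.
managing = semijoin(employee, dept, lambda e, d: e["ID"] == d["MangID"])
```

The result contains employees 123 and 567 with only the ID, FName and LName attributes; nothing from Dept is carried over.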
Aggregate functions and Grouping statements
Some queries may involve aggregate functions (scalar
aggregates, like totals in a report, or vector aggregates,
like subtotals in reports).

a) ℑ AL (R): Scalar aggregate functions on relation R,
with AL as a list of (<aggregate function>, <attribute>) pairs

b) GA ℑ AL (R): Vector aggregate functions on relation R,
with AL as a list of (<aggregate function>, <attribute>) pairs
and GA as the grouping attribute

Example (a): the number of employees in an
organization (assume you have an Employee table).
This is a scalar aggregate.

PR(Num_Employees) ℑ COUNT EmpId (Employee),
where PR = Produce relation R

Example (b): the number of employees in each
department of an organization (assume you have an
Employee table).
This is a vector aggregate.

PR(DeptId, Num_Employees) DeptId ℑ COUNT EmpId (Employee),
where PR = Produce relation R
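
The difference between a scalar and a vector aggregate can be sketched in Python. The (EmpId, DeptId) pairs below are hypothetical sample data, not taken from the notes.

```python
from collections import Counter

# Hypothetical Employee rows as (EmpId, DeptId) pairs.
employee = [(1, 10), (2, 10), (3, 20), (4, 30), (5, 20), (6, 10)]

# Scalar aggregate: COUNT EmpId over the whole relation gives a single value.
num_employees = len({emp_id for emp_id, _ in employee})

# Vector aggregate: COUNT EmpId grouped by DeptId gives one value per group.
per_dept = Counter(dept_id for _, dept_id in employee)
result = sorted(per_dept.items())   # list of (DeptId, Num_Employees) tuples
```

The scalar aggregate yields one number for the relation, while the vector aggregate yields a relation with one tuple per value of the grouping attribute.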

Relational Calculus
A relational calculus expression creates a new relation, which is
specified in terms of variables that range over rows of the stored
database relations (in tuple calculus) or over columns of the
stored relations (in domain calculus).

In a calculus expression, there is no order of operations to specify
how to retrieve the query result. A calculus expression specifies
only what information the result should contain rather than
how to retrieve it.
In relational calculus, there is no description of how to evaluate a
query; this is the main distinguishing feature between relational
algebra and relational calculus.

Relational calculus is considered to be a nonprocedural language.
This differs from relational algebra, where we must write a
sequence of operations to specify a retrieval request; hence
relational algebra can be considered a procedural way of
stating a query.

When applied to relational databases, the calculus is not the calculus
of differentiation and integration but a form of first-order logic, or
predicate calculus. A predicate is a truth-valued function with
arguments.

When we substitute values for the arguments in the predicate,
the function yields an expression, called a proposition, which
can be either true or false.

If a predicate contains a variable, as in 'x is a member of
staff', there must be a range for x. When we substitute some
values of this range for x, the proposition may be true; for other
values, it may be false.

If COND is a predicate, then the set of all tuples for which COND
evaluates to true will be expressed as follows:
{t | COND(t)}
where t is a tuple variable and COND(t) is a
conditional expression involving t. The result of such a
query is the set of all tuples t that satisfy COND(t).
If we have a set of predicates to evaluate for a single query, the
predicates can be connected using ∧ (AND), ∨ (OR), and
~ (NOT).


Tuple-oriented Relational Calculus

• The tuple relational calculus is based on specifying a
number of tuple variables. Each tuple variable usually
ranges over a particular database relation, meaning that
the variable may take as its value any individual tuple from
that relation.
• Tuple relational calculus is interested in finding the tuples of a
relation for which a predicate is true. It is based on the use of
tuple variables.
• A tuple variable is a variable that 'ranges over' a named
relation: that is, a variable whose only permitted values are
tuples of the relation.
• If E is a tuple variable that ranges over the relation Employee,
then this is represented as EMPLOYEE(E), i.e. the range of E is
EMPLOYEE.
• Then, to extract all tuples that satisfy a certain condition, we
write the set of all tuples E such that COND(E) evaluates to true:
{E | COND(E)}

The predicates can be connected using the Boolean operators:

∧ (AND), ∨ (OR), ~ (NOT)

COND(t) is a formula, and is called a Well-Formed Formula (WFF)
if:
• COND is composed of n-ary predicates
(a formula composed of n single predicates) and the
predicates are connected by any of the Boolean
operators.
• Each predicate is of the form A θ B, where θ is one of
the comparison operators { <, ≤, >, ≥, ≠, = }, so that it
can be evaluated to either true or false, and A and B
are either constants or variables.
• Formulae should be unambiguous and should make
sense.

Example (Tuple Relational Calculus)

• Extract all employees whose skill level is greater than or
equal to 8:
{E | Employee(E) ∧ E.SkillLevel >= 8}

EmpID  FName  LName   SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
28     Chane  Kebede  2        SQL     Database     AAU     Sidist_Kilo  10
25     Abera  Taye    6        VB6     Programming  Helico  Piazza       8
65     Almaz  Belay   2        SQL     Database     Helico  Piazza       9
51     Selam  Belay   4        Prolog  Programming  Jimma   Jimma City   8

• To find only the EmpId, FName, LName, Skill and the School
where the skill is attended, for employees with skill
level greater than or equal to 8, the tuple-based relational
calculus expression will be:

{E.EmpId, E.FName, E.LName, E.Skill, E.School | Employee(E) ∧ E.SkillLevel >= 8}

EmpID  FName  LName   Skill   School
28     Chane  Kebede  SQL     AAU
25     Abera  Taye    VB6     Helico
65     Almaz  Belay   SQL     Helico
51     Selam  Belay   Prolog  Jimma

• E.FName means the value of the First Name (FName)
attribute for the tuple E.
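
A tuple calculus expression reads very much like a Python comprehension: the variable E ranges over the relation and the condition filters it. The sketch below uses the Employee table from the set operations section; the namedtuple is just one convenient way to get E.FName-style access.

```python
from collections import namedtuple

Employee = namedtuple(
    "Employee", "EmpID FName LName SkillID Skill SkillType School SkillLevel"
)

employee = [
    Employee(12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", 5),
    Employee(16, "Lemma", "Alemu", 5, "C++", "Programming", "Unity", 6),
    Employee(28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", 10),
    Employee(25, "Abera", "Taye", 6, "VB6", "Programming", "Helico", 8),
    Employee(65, "Almaz", "Belay", 2, "SQL", "Database", "Helico", 9),
    Employee(24, "Dereje", "Tamiru", 8, "Oracle", "Database", "Unity", 5),
    Employee(51, "Selam", "Belay", 4, "Prolog", "Programming", "Jimma", 8),
    Employee(94, "Alem", "Kebede", 3, "Cisco", "Networking", "AAU", 7),
    Employee(18, "Girma", "Dereje", 1, "IP", "Programming", "Jimma", 4),
    Employee(13, "Yared", "Gizaw", 7, "Java", "Programming", "AAU", 6),
]

# {E | Employee(E) AND E.SkillLevel >= 8}
high_skill = [E for E in employee if E.SkillLevel >= 8]

# {E.EmpID, E.FName, E.LName, E.Skill, E.School | Employee(E) AND E.SkillLevel >= 8}
projected = [
    (E.EmpID, E.FName, E.LName, E.Skill, E.School)
    for E in employee
    if E.SkillLevel >= 8
]
```

The comprehension states only which tuples belong to the result, which mirrors the nonprocedural flavour of the calculus.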


Quantifiers in Relational Calculus
• To tell how many instances the predicate applies to, we can
use the two quantifiers of predicate logic.
• A relational calculus expression written using the Existential
Quantifier can also be expressed using the Universal Quantifier,
since (∃x)P(x) is equivalent to ~(∀x)~P(x).

1. Existential quantifier ∃ ('there exists')
The existential quantifier is used in formulae that must be
true for at least one instance, such as:
An employee with skill level greater than or equal to 8
will be:
{E | Employee(E) ∧ (∃E)(E.SkillLevel >= 8)}

This means that there exists at least one tuple of the
relation Employee where the value of SkillLevel
is greater than or equal to 8.

2. Universal quantifier ∀ ('for all')

The universal quantifier is used in statements about every
instance, such as:
Every employee has a skill level greater than or equal to 8:
{E | Employee(E) ∧ (∀E)(E.SkillLevel >= 8)}
This means that, for all tuples of the relation Employee,
the value of the SkillLevel attribute is greater
than or equal to 8.
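
In Python the two quantifiers correspond to the built-in functions any and all. The short sketch below uses the SkillLevel column of the Employee table shown earlier and also checks the standard equivalence between the two quantifiers.

```python
# SkillLevel values of the ten Employee tuples, in table order.
skill_levels = [5, 6, 10, 8, 9, 5, 8, 7, 4, 6]

# Existential quantifier: at least one tuple has SkillLevel >= 8.
exists = any(level >= 8 for level in skill_levels)

# Universal quantifier: every tuple has SkillLevel >= 8.
for_all = all(level >= 8 for level in skill_levels)

# "There exists x with P(x)" is the same as "not for all x, not P(x)".
exists_via_for_all = not all(not (level >= 8) for level in skill_levels)
```

For this table the existential statement is true, because some employees have a level of 8 or more, while the universal statement is false, because others do not.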

Example:

Let's say that we have the following Schema (set of Relations):

Employee(EID, FName, LName, EDID)
Project(PID, PName, PDID)
Dept(DID, DName, DMangID)
WorksOn(WEID, WPID)

To find employees who work on projects controlled by
department 5 the query will be:
{E | Employee(E) ∧ (∃P)(Project(P) ∧ (∃W)(WorksOn(W) ∧ P.PDID = 5 ∧
W.WPID = P.PID ∧ W.WEID = E.EID))}
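
The nested existential quantifiers of this query can be sketched with nested any calls. All rows below are hypothetical sample data for the schema above, and the sketch makes the link between WorksOn and Project explicit through the condition W.WPID == P.PID.

```python
from collections import namedtuple

Employee = namedtuple("Employee", "EID FName LName EDID")
Project = namedtuple("Project", "PID PName PDID")
WorksOn = namedtuple("WorksOn", "WEID WPID")

# Hypothetical sample rows for illustration only.
employees = [
    Employee(1, "Abebe", "Lemma", 5),
    Employee(2, "Belay", "Taye", 4),
    Employee(3, "Kefle", "Kebede", 5),
]
projects = [Project(10, "Payroll", 5), Project(20, "Inventory", 4)]
works_on = [WorksOn(1, 10), WorksOn(2, 10), WorksOn(3, 20)]

# Employees E for whom there exist a project P controlled by department 5
# and a WorksOn tuple W linking E to P.
result = [
    E
    for E in employees
    if any(
        P.PDID == 5
        and any(W.WPID == P.PID and W.WEID == E.EID for W in works_on)
        for P in projects
    )
]
```

Employee 3 belongs to department 5 but works only on a project controlled by department 4, so it is not part of the result; what matters is the controlling department of the project, not the department of the employee.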

Domain Relational Calculus

In tuple relational calculus, we use variables that range over tuples of a relation; in
domain relational calculus, we use variables that range over domain
elements (field variables).
• An expression in the domain relational calculus has the following general form:
{(x1, x2, x3, ..., xn) | P(x1, x2, x3, ..., xn, ..., xm)}
where (x1, x2, x3, ..., xn) represents the domain variables and
P(x1, x2, x3, ..., xn, ..., xm) represents the formula.
• Atomic formulas are of the form R(x1, x2, x3, ..., xn), xi θ xj or
xi θ C, where θ ∈ {<, >, <=, >=, =, ≠}, R is a relation of degree n,
each xi is a domain variable and C is a constant.
• If f1 and f2 are formulas then so are
f1 ∧ f2, f1 ∨ f2, ~f1, (∃x)f1 and (∀x)f1.
• The answer for such a query includes all tuples with attributes (x1, x2, x3, ..., xn)
that make the formula P(x1, x2, x3, ..., xn, ..., xm) true.
• A formula is defined recursively, starting with simple atomic formulas (getting
tuples from relations or making comparisons of values) and building bigger
formulas using the logical connectives; i.e. the predicate P can be a set
of formulas combined by Boolean operators.

Example: Consider the schema of relations given in the previous example
(Employee, Project, Dept and WorksOn).

Query1: List the employees
{FName, LName | (∃EID, EDID)(Employee(EID, FName, LName, EDID))}
Query2: Find the list of employees who work in the IS department
Domain relational calculus expression for the query:
{EID, FName, LName | (∃DName, EDID, DID, DMangID)
(Employee(EID, FName, LName, EDID) ∧ Dept(DID, DName, DMangID) ∧
DID = EDID ∧ DName = 'IS')}
where ∃DName, EDID, DID, DMangID is shorthand for
∃DName, ∃EDID, ∃DID, ∃DMangID
Query3: List the names of employees that do not manage any
department
{FName, LName | (∃EID, EDID)(Employee(EID, FName, LName, EDID) ∧
(~(∃DID, DName, DMangID)(Dept(DID, DName, DMangID) ∧ (EID = DMangID))))}
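
In domain calculus the variables stand for individual attribute values, which in Python corresponds to unpacking each tuple into separate names. The rows below are hypothetical sample data for Employee(EID, FName, LName, EDID) and Dept(DID, DName, DMangID).

```python
# Hypothetical sample rows for illustration only.
employee = [
    (1, "Abebe", "Lemma", 1),
    (2, "Belay", "Taye", 1),
    (3, "Kefle", "Kebede", 2),
]
dept = [(1, "IS", 2), (2, "Finance", 3)]

# Query 2: employees who work in the IS department.
in_is = [
    (eid, fname, lname)
    for (eid, fname, lname, edid) in employee
    for (did, dname, dmangid) in dept
    if did == edid and dname == "IS"
]

# Query 3: names of employees that do not manage any department,
# i.e. there is no Dept tuple whose DMangID equals the employee's EID.
non_managers = [
    (fname, lname)
    for (eid, fname, lname, edid) in employee
    if not any(eid == dmangid for (did, dname, dmangid) in dept)
]
```

Each name on the left of a for clause plays the role of one domain variable, and the negated any call corresponds to the negated existential quantifier in Query 3.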

Chapter Seven
Advanced Concepts in Database Systems

• Database Security and Integrity
• Distributed Database Systems
• Data warehousing

1. Database Security and Integrity
A database represents an essential corporate resource that
should be properly secured using appropriate controls.
• Database security encompasses hardware, software,
people and data.

In a multi-user database system, the DBMS must provide a database
security and authorization subsystem to enforce limits on
individual and group access rights and privileges.

Database security and integrity is about protecting the database
from becoming inconsistent and from being disrupted. Such
inconsistency or disruption can also be called database misuse.

Database misuse could be intentional or accidental, where
accidental misuse is easier to cope with than intentional misuse.


Accidental inconsistency could occur due to:
• System crash during transaction processing
• Anomalies due to concurrent access
• Anomalies due to redundancy
• Logical errors

Likewise, even though there are various threats that could be
categorized in this group, intentional misuse could be:
• Unauthorized reading of data
• Unauthorized modification of data, or
• Unauthorized destruction of data

Most systems implement good Database Integrity measures to protect
the system from accidental misuse, while there are many
computer-based measures to protect the system from intentional
misuse, which are termed Database Security measures.

• Database security is considered in relation to the following
situations:
  - Theft and fraud
  - Loss of confidentiality (secrecy)
  - Loss of privacy
  - Loss of integrity
  - Loss of availability

Security Issues and general considerations

• Legal, ethical and social issues regarding the right to
access information
• Physical control
• Policy issues regarding the privacy of individuals at
enterprise and national level
• Operational considerations on the techniques used
(passwords, etc.)
• System level security, including operating system and
hardware control
• Security levels and security policies at enterprise level

• Database security: the mechanisms that protect the
database against intentional or accidental threats. Database
security encompasses hardware, software, people and data.

• Threat: any situation or event, whether intentional or
accidental, that may adversely affect a system and
consequently the organization.
  - A threat may be caused by a situation or event involving a
    person, action, or circumstance that is likely to bring harm to
    an organization.
  - The harm to an organization may be tangible or
    intangible:
    Tangible: loss of hardware, software, or data
    Intangible: loss of credibility or client confidence

Examples of threats:
• Using another person's means of access
• Unauthorized amendment/modification or copying of
data
• Program alteration
• Inadequate policies and procedures that allow a mix of
confidential and normal output
• Wire-tapping
• Illegal entry by a hacker
• Blackmail
• Creating a 'trapdoor' into the system
• Theft of data, programs, and equipment
• Failure of security mechanisms, giving greater access
than normal
• Staff shortages or strikes
• Inadequate staff training
• Viewing and disclosing unauthorized data
• Electronic interference and radiation
• Data corruption owing to power loss or surge
• Fire (electrical fault, lightning strike, arson), flood, bomb
• Physical damage to equipment
• Breaking or disconnection of cables
• Introduction of viruses

Levels of Security Measures

Security measures can be implemented at several levels and for
different components of the system. These levels are:
1. Physical Level: concerned with physically securing the site
containing the computer system. The backup systems should
also be physically protected from access by anyone except
authorized users.
2. Human Level: concerned with the authorization of database
users to access the content at different levels and with
different privileges.
3. Operating System: concerned with the weaknesses and
strengths of the operating system security on data files.
A weakness may serve as a means of unauthorized access to
the database. This also includes protection of data in primary
and secondary memory from unauthorized access.
4. Database System: concerned with the data access limits enforced
by the database system, such as passwords and isolated
transactions.
Even though we can have different levels of security and
authorization on data objects and users, who accesses which
data is a policy matter rather than a technical one.

These policies
• should be known by the system: they should be encoded in the
system
• should be remembered: they should be saved somewhere (the
catalogue)

• An organization needs to identify the types of threat it may be
subjected to and initiate appropriate plans and
countermeasures, bearing in mind the costs of implementing
them.

Countermeasures: Computer-based controls
• The types of countermeasure to threats on computer systems
range from physical controls to administrative procedures.
• Despite the range of computer-based controls that are available,
it is worth noting that, generally, the security of a DBMS is only
as good as that of the operating system, owing to their close
association.
• The following are computer-based security controls for a
multi-user environment:
  • Authorization
    - The granting of a right or privilege that enables a subject
      to have legitimate access to a system or a system's object
    - Authorization controls can be built into the software, and
      govern not only what system or object a specified user can
      access, but also what the user may do with it
    - Authorization controls are sometimes referred to as
      access controls
    - The process of authorization involves authentication of
      subjects (i.e. a user or program) requesting access to
      objects (i.e. a database table, view, procedure, trigger, or
      any other object that can be created within the system)

  • Views
    - A view is the dynamic result of one or more relational
      operations operating on the base relations to produce
      another relation
    - A view is a virtual relation that does not actually exist in
      the database, but is produced upon request by a particular
      user
    - The view mechanism provides a powerful and flexible
      security mechanism by hiding parts of the database from
      certain users
    - Using a view is more restrictive than simply having certain
      privileges granted to a user on the base relation(s)
  • Integrity
    - Integrity constraints contribute to maintaining a secure
      database system by preventing data from becoming invalid
      and hence giving misleading or incorrect results
      · Domain integrity
      · Entity integrity
      · Referential integrity
      · Key constraints
  • Backup and recovery
    - Backup is the process of periodically taking a copy of
      the database and log file (and possibly programs) on
      to offline storage media
    - A DBMS should provide backup facilities to assist with
      the recovery of a database following failure
    - Database recovery is the process of restoring the
      database to a correct state in the event of a failure
    - Journaling is the process of keeping and maintaining a
      log file (or journal) of all changes made to the
      database to enable recovery to be undertaken
      effectively in the event of a failure
    - The advantage of journaling is that, in the event of a
      failure, the database can be recovered to its last
      known consistent state using a backup copy of the
      database and the information contained in the log file
    - If no journaling is enabled on a failed system, the only
      means of recovery is to restore the database using the
      latest backup version of the database
    - However, without a log file, any changes made to the
      database after the last backup will be lost

  • Encryption
    - The encoding of the data by a special algorithm that
      renders the data unreadable by any program without
      the decryption key
    - If a database system holds particularly sensitive data,
      it may be deemed necessary to encode it as a
      precaution against possible external threats or
      attempts to access it
    - The DBMS can access data after decoding it, although
      there is a degradation in performance because of the
      time taken to decode it
    - Encryption also protects data transmitted over
      communication lines
    - To transmit data securely over insecure networks
      requires the use of a cryptosystem, that is, keys together
      with the encryption and decryption algorithms that use them
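
The recovery idea described under Backup and recovery above (restore the latest backup, then reapply the logged changes) can be sketched in Python. This is a deliberately simplified model, not how any particular DBMS implements recovery: the database is a dictionary, the log is a list of (operation, key, value) entries invented for this sketch, and every logged change is assumed to belong to a committed transaction.

```python
import copy


def recover(backup, log):
    """Rebuild the database from the last backup, then redo the logged
    changes in the order they were recorded."""
    database = copy.deepcopy(backup)
    for operation, key, value in log:
        if operation == "put":
            database[key] = value          # insert or update
        elif operation == "delete":
            database.pop(key, None)        # remove if present
    return database


# State captured by the last backup (hypothetical data).
backup = {"acct_1": 100, "acct_2": 250}

# Changes journaled after the backup was taken (hypothetical data).
log = [
    ("put", "acct_1", 80),
    ("put", "acct_3", 40),
    ("delete", "acct_2", None),
]

with_journal = recover(backup, log)      # backup plus replayed log
without_journal = recover(backup, [])    # backup only
```

With the journal, the recovered state includes the three changes made after the backup; without it, recovery can only return the backup state and those changes are lost.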


Authentication
• All users of the database will have different access levels
and permissions for different data objects, and
authentication is the process of checking whether the user
is the one with the privilege for the access level.
• It is the process of checking that users are who they say they
are.
• Each user is given a unique identifier, which is used by the
operating system to determine who they are.
• Thus the system will check whether the user with a specific
username and password is trying to use the resource.
• Associated with each identifier is a password, chosen by the
user and known to the operating system, which must be
supplied to enable the operating system to authenticate
who the user claims to be.

Any database access request will have the following three major
components:
1. Operation: what kind of operation is
requested by a specific query?
2. Object: on which resource or data of
the database is the operation sought to be applied?
3. User: who is the user requesting the
operation on the specified object?
The database should be able to check all three
components before processing any request. The checking is
performed by the security subsystem of the DBMS.
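
A minimal sketch of such a check is shown below, assuming the security subsystem keeps a table of privileges per user and object. The PRIVILEGES table, the user names and the function name is_authorized are all hypothetical.

```python
# Hypothetical privilege table: (user, object) -> set of allowed operations.
PRIVILEGES = {
    ("abebe", "Employee"): {"read"},
    ("almaz", "Employee"): {"read", "insert", "update"},
    ("dba", "Employee"): {"read", "insert", "update", "delete"},
}


def is_authorized(user, operation, obj):
    """Check the three components of a request: who asks, what operation,
    and on which object. Unknown users or objects get no privileges."""
    return operation in PRIVILEGES.get((user, obj), set())


read_ok = is_authorized("abebe", "read", "Employee")
delete_denied = is_authorized("abebe", "delete", "Employee")
unknown_user = is_authorized("guest", "read", "Employee")
```

A request is processed only when the combination of user, operation and object is found in the table; anything not explicitly granted is refused.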


Forms of user authorization
There are different forms of user authorization on the resources of the
database. These forms are privileges on what operations are allowed
on a specific data object.

User authorization on the data/extension

1. Read Authorization: the user with this privilege is allowed
only to read the content of the data object.

2. Insert Authorization: the user with this privilege is allowed
only to insert new records or items into the data object.

3. Update Authorization: users with this privilege are allowed to
modify the content of attributes but are not authorized to delete
records.

4. Delete Authorization: users with this privilege are only
allowed to delete a record and not anything else.

• Different users, depending on the power of the user, can
have one or a combination of the above forms of authorization on
different data objects.

Role of DBA in Database Security

The database administrator is responsible for making the database
as secure as possible. For this the DBA should have more powerful
privileges than any other user. The DBA provides capabilities to
database users for accessing the content of the database.
The major responsibilities of the DBA in relation to the authorization
of users are:
1. Account Creation: involves creating different accounts for
different USERS as well as USER GROUPS.

2. Security Level Assignment: involves assigning different
users to different categories of access levels.

3. Privilege Grant: involves giving different levels of privileges to
different users and user groups.

4. Privilege Revocation: involves denying or canceling previously
granted privileges for users due to various reasons.

5. Account Deletion: involves deleting an existing account of
users or user groups. It is similar to denying all privileges of users
on the database.
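
These responsibilities can be sketched as operations on a small in-memory privilege store. The class AccessControl and its method names are hypothetical and only illustrate account creation, privilege grant, privilege revocation and account deletion.

```python
class AccessControl:
    """Toy privilege store: user -> {object -> set of operations}."""

    def __init__(self):
        self.privileges = {}

    def create_account(self, user):
        self.privileges.setdefault(user, {})

    def grant(self, user, obj, operation):
        self.privileges[user].setdefault(obj, set()).add(operation)

    def revoke(self, user, obj, operation):
        self.privileges[user].get(obj, set()).discard(operation)

    def delete_account(self, user):
        # Removing the account removes every privilege the user held.
        self.privileges.pop(user, None)

    def allowed(self, user, obj, operation):
        return operation in self.privileges.get(user, {}).get(obj, set())


acl = AccessControl()
acl.create_account("almaz")
acl.grant("almaz", "Employee", "read")
acl.grant("almaz", "Employee", "update")
can_update_before = acl.allowed("almaz", "Employee", "update")

acl.revoke("almaz", "Employee", "update")
can_update_after = acl.allowed("almaz", "Employee", "update")
can_still_read = acl.allowed("almaz", "Employee", "read")

acl.delete_account("almaz")
can_read_after_delete = acl.allowed("almaz", "Employee", "read")
```

Revoking one privilege leaves the others in place, while deleting the account has the same effect as revoking everything the user had.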


2. Distributed Database Systems
• Database development facilitates the integration of data
available in an organization and enforces security on data
access. But it is not always the case that organizational data
reside at one site. This demands that databases at different sites
be integrated and synchronized with all the facilities of the database
approach. This leads to Distributed Database Systems.
• In a distributed database system, the database is stored on
several computers. The computers in a distributed system
communicate with each other through various communication
media, such as high speed buses or telephone lines.
• A distributed database system consists of a collection of sites,
each of which maintains a local database system and also
participates in global transactions where different databases are
integrated together.
• Even though integration of data implies centralized storage and
control, in distributed database systems the intention is different.
Data is stored in different database systems in a decentralized
manner but, connected through computer networks, these systems
act as if they were centralized.
• A distributed database system consists of loosely coupled sites
that share no physical component, and the database systems that
run on each site are independent of each other.
• Transactions may access data at one or more sites.
• An organization may implement its database system on a number
of separate computer systems rather than a single, centralized
mainframe. Computer systems may be located at each local
branch office.

The functionalities of a DDBMS will include: Extended Communication
Services, Extended Data Dictionary, Distributed Query Processing, Extended
Concurrency Control and Extended Recovery Services.

Concepts in DDBMS
 Replication: System maintains multiple copies of data,
stored in different sites, for faster retrieval and fault tolerance.
 Fragmentation: Relation is partitioned into several
fragments stored in distinct sites

Compiled By: Wondwossen Mulugeta, Faculty of Informatics, AAU 130

 Data transparency: The degree to which a system user may
remain unaware of the details of how and where the data items
are stored in a distributed system.
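The three concepts above can be sketched in a few lines of Python. This is only an illustration, not how a real DDBMS is built: the sites are plain dictionaries, and the Employee relation, its column names and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# A small Employee relation; the column names are assumptions for this sketch.
employees = [
    {"id": 1, "name": "Abebe", "branch": "Addis"},
    {"id": 2, "name": "Sara", "branch": "Adama"},
    {"id": 3, "name": "Kebede", "branch": "Addis"},
]

def fragment_by(rows, column):
    """Fragmentation: each fragment holds the rows that share one column value."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[column]].append(row)
    return dict(fragments)

def replicate(rows, site_names):
    """Replication: every listed site keeps its own full copy of the rows."""
    return {site: [dict(row) for row in rows] for site in site_names}

def lookup(sites, emp_id):
    """Data transparency: the caller never says which site holds the row."""
    for rows in sites.values():
        for row in rows:
            if row["id"] == emp_id:
                return row
    return None

fragments = fragment_by(employees, "branch")        # one fragment per branch
replicas = replicate(employees, ["Addis", "Adama"])  # two full copies
```

Calling `lookup(fragments, 2)` finds the row wherever it is stored, which is the kind of location independence that data transparency describes.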


Advantages of DDBMS
1. Data sharing and distributed control:
 A user at one site may be able to access data that is available at
another site.
 Each site can retain some degree of control over local data.
 There will be local as well as global database administrators.

2. Reliability and availability of data

 If one site fails, the rest can continue operation as long as
transactions do not demand data from the failed site, or the
data they need is replicated at other sites.

3. Speedup of query processing

 If a query involves data from several sites, it may be possible to
split the query into sub-queries that can be executed at several
sites in parallel.
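The speedup from splitting a query can be sketched as follows. This is a minimal illustration under stated assumptions: the three sites, the Sales fragments and the function names are hypothetical, and threads stand in for separate computers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sites: each holds one fragment of a Sales relation.
sites = {
    "site_a": [{"item": "pen", "amount": 10}, {"item": "book", "amount": 40}],
    "site_b": [{"item": "pen", "amount": 5}],
    "site_c": [{"item": "bag", "amount": 70}, {"item": "pen", "amount": 15}],
}

def sub_query(rows, item):
    """Sub-query run at one site: total amount sold for the given item."""
    return sum(row["amount"] for row in rows if row["item"] == item)

def total_sales(item):
    """Global query: run the sub-query at every site, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partials = list(pool.map(lambda rows: sub_query(rows, item), sites.values()))
    return sum(partials)

pen_total = total_sales("pen")  # partial sums 10, 5 and 15 are combined
```

Each site only scans its own fragment, and only the small partial results travel back to be combined.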

Disadvantages of DDBMS
1. Software development cost
2. Greater potential for bugs (parallel processing may endanger correctness)
3. Increased processing overhead (due to communication between sites)
4. Communication problems

Homogeneous and Heterogeneous Distributed Databases

 In a homogeneous distributed database
 All sites have identical software.
 Sites are aware of each other and agree to cooperate in
processing user requests.
 Each site surrenders part of its autonomy in terms of the right
to change schemas or software.
 The database appears to the user as a single system.
 In a heterogeneous distributed database
 Different sites may use different schemas and software
 Differences in schema are a major problem for query
processing
 Differences in software are a major problem for
transaction processing
 Sites may not be aware of each other and may provide
only limited facilities for cooperation in transaction
processing
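Why schema differences complicate query processing can be seen in a small sketch. Everything here is hypothetical: two sites store the same customer facts under different column names and types, so each local row has to be translated into an assumed global schema (id, name) before a global query can be answered.

```python
# Hypothetical local schemas: two sites store the same customer facts differently.
site_a_rows = [{"cust_id": 7, "full_name": "Sara Bekele"}]
site_b_rows = [{"CustomerNo": "8", "First": "Abebe", "Last": "Kebede"}]

def from_site_a(row):
    """Translate a site A row into the assumed global schema (id, name)."""
    return {"id": row["cust_id"], "name": row["full_name"]}

def from_site_b(row):
    """Translate a site B row: cast the key to int and join the name parts."""
    return {"id": int(row["CustomerNo"]), "name": f'{row["First"]} {row["Last"]}'}

def global_customers():
    """Answer a global query by translating every local row first."""
    rows = [from_site_a(r) for r in site_a_rows]
    rows += [from_site_b(r) for r in site_b_rows]
    return sorted(rows, key=lambda r: r["id"])

customers = global_customers()
```

In a homogeneous system no such translation step is needed, because all sites share the same schema and software.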
3. Data warehousing

 A data warehouse is an integrated, subject-oriented,
time-variant, non-volatile database that provides
support for decision making.
 Integrated: A centralized, consolidated database that
integrates data derived from the entire organization.
 Consolidates data from multiple and diverse sources
with diverse formats.
 Helps managers to better understand the company's
operations.
 Subject-oriented: The data warehouse contains data
organized by topic, e.g. sales, marketing, finance, etc.
 Time-variant: In contrast to operational data, which
focuses on current transactions, warehouse data
represents the flow of data through time.
 The data warehouse contains data that reflects what
happened last week, last month, the past five years, and
so on.
 Non-volatile: Once data enters the data warehouse,
it is never removed, because the data in the
warehouse represents the company's entire history.

Differences between database and data warehouse
 Because data is added all the time, the warehouse keeps growing.
 The data warehouse and operational environments are
separated. The data warehouse receives its data from
operational databases.
 The data warehouse environment is characterized by read-only
transactions on very large data sets.
 The operational environment is characterized by numerous
update transactions on a few data entities at a time.
 The data warehouse contains historical data over a long time
horizon.
 Ultimately, information is created from data warehouses. Such
information becomes the basis for rational decision making.


 The data found in the data warehouse is analyzed to discover
previously unknown data characteristics, relationships,
dependencies, or trends.
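The contrast between the two environments can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not a real warehouse design: the table names, columns and snapshot dates are assumptions, and both tables live in one in-memory database only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational table: holds only the current state and is updated in place.
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
# Warehouse table: one row per snapshot date, so history accumulates (time-variant).
cur.execute(
    "CREATE TABLE account_history (snapshot_date TEXT, id INTEGER, balance INTEGER)"
)

def load_snapshot(date):
    """Copy the current operational state into the warehouse; rows are only added."""
    cur.execute(
        "INSERT INTO account_history SELECT ?, id, balance FROM account", (date,)
    )

cur.execute("INSERT INTO account VALUES (1, 100)")
load_snapshot("2024-01-31")
cur.execute("UPDATE account SET balance = 250 WHERE id = 1")  # operational update
load_snapshot("2024-02-29")

# Read-only analysis over the history: the balance trend for account 1.
trend = cur.execute(
    "SELECT snapshot_date, balance FROM account_history "
    "WHERE id = 1 ORDER BY snapshot_date"
).fetchall()
current = cur.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The operational table remembers only the latest balance, while the history table keeps both snapshots, so a trend can be read from it without touching the operational data.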

