Database System Part I
Database System
Database systems are designed to manage large data sets in an organization.
Data management involves both the definition and the manipulation of data,
ranging from the simple representation of data to the choice of structures for
storing information. Data management also includes providing mechanisms for
manipulating information.
Today, databases are essential to every business. They are used to maintain
internal records, to present data to customers and clients on the World Wide
Web, and to support many other commercial processes. Databases are likewise
found at the core of many modern organizations.
The power of databases comes from a body of knowledge and technology that
has developed over several decades and is embodied in specialized software
called a database management system, or DBMS. A DBMS is a powerful tool for
creating and managing large amounts of data efficiently and allowing it to
persist over long periods of time, safely. These systems are among the most
complex types of software available.
Data management has passed through different levels of development, driven by
advances in technology and services. These can best be described as three
levels of development. Although each new level brought an advantage and
overcame a problem of the previous one, all methods of data handling are still
in use to some extent. The three major levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follow the primitive,
traditional way of handling information, using cards and paper. Storage and
retrieval are performed by human labour.
• A separate file is kept for each of the events and objects the organization
deals with, and information is stored in these files.
• Each file, containing its own kind of information, is labelled and stored in
one or more cabinets.
• The cabinets may be kept in safe places for security purposes, depending on
the sensitivity of the information they contain.
• Insertion and retrieval are done by searching first for the right cabinet,
then for the right file, and then for the information.
• An indexing system may be used to facilitate access to the data.
3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems
changed significantly. Codd proposed that database systems should present the
user with a view of data organized as tables called relations. Behind the scenes,
there might be a complex data structure that allowed rapid response to a variety
of queries. But, unlike the user of earlier database systems, the user of a relational
system would not be concerned with the storage structure. Queries could be
expressed in a very high-level language, which greatly increased the efficiency of
database programmers. The database approach emphasizes the integration and
sharing of data throughout the organization.
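As a minimal sketch of Codd's idea, assuming Python's built-in sqlite3 module as the DBMS and a hypothetical employee table with made-up rows, the query below states which rows are wanted without saying how they are stored or how they should be located:

```python
import sqlite3

# In-memory database: the user sees only tables (relations), never storage details.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Abebe", "Sales"), (2, "Sara", "IT"), (3, "Lulit", "IT")],
)

# A high-level, declarative query: it names the result, not an access path.
rows = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("IT",)
).fetchall()
print(rows)  # [('Lulit',), ('Sara',)]
```

The same query keeps working if the DBMS later changes how the table is stored or indexed.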
• Data Dictionary:
o Because a database is a self-describing system, a tool called the data
dictionary is used to store and organize information about the data stored
in the database.
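In SQLite, for example, the catalog table sqlite_master and the PRAGMA table_info statement play the role of a data dictionary: the database describes itself through ordinary queries. A small sketch with a hypothetical student table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, fname TEXT, dept TEXT)")

# Metadata about tables is itself stored in a table (the catalog).
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column descriptions: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(student)")]

print(tables)   # ['student']
print(columns)  # [('id', 'INTEGER'), ('fname', 'TEXT'), ('dept', 'TEXT')]
```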
The DBMS is a software package that helps to design, manage, and use data
following the database approach. Viewing a DBMS as a system, one can describe
it with respect to its environment, that is, the other systems interacting with
it. The DBMS environment has five components: designing and using a database
involves the interaction, or integration, of Hardware, Software, Data,
Procedure and People.
1. Hardware: the physical components that one can touch and feel. These
include various types of personal computers, mainframes or other server
computers used in a multi-user system, the network infrastructure, and other
peripherals required by the system.
3. Data: since the goal of any database system is to gain better control of
data and to make it useful, data is the most important component for the
users of the database. Every database system holds two categories of data:
operational data and metadata. Operational data is the data actually stored
in the system for use by its users. Metadata is data that stores information
about the database itself.
The structure of the data in the database is called the schema, which is
composed of the entities, the properties of the entities, and the
relationships between entities.
4. Procedure: the rules and regulations on how to design and use a
database. These include procedures such as how to log on to the DBMS, how to
use its facilities, how to start and stop transactions, how to make backups,
how to handle hardware and software failures, and how to change the structure
of the database.
2. Physical DBD
• Take the logical design specification as input and decide how it
should be physically realized.
• Map the logical data model onto the specified DBMS in terms of tables
and integrity constraints (DBMS-dependent design).
• Select specific storage structures and access paths for the database.
• Design the security measures required for the database.
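One concrete physical-design decision is choosing an access path. The sketch below assumes SQLite through Python's sqlite3 module, with hypothetical table and index names: it adds an index on a frequently searched column and asks the DBMS, via EXPLAIN QUERY PLAN, how it will execute a lookup. The logical table definition stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical design mapped onto the DBMS: a table plus its integrity constraints.
conn.execute("""
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL
    )""")

# Physical design choice: an index as the access path for lookups by department.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_id)")

# Ask the DBMS how it will answer the query; the last column is a text description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employee WHERE dept_id = 10"
).fetchall()
details = [row[-1] for row in plan]
print(details)  # typically mentions USING INDEX idx_employee_dept
```

The exact wording of the plan text varies between SQLite versions, but it should name the chosen index.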
4. End Users
Workers whose jobs require them to access the database frequently for various
purposes. There are different groups of users in this category.
1. Naïve Users:
• Form a sizable proportion of users
• Are unaware of the DBMS
• Access the database only according to their access level and
demand
• Use standard, pre-specified types of queries.
2. Sophisticated Users
• Are familiar with the structure of the database and the
facilities of the DBMS.
• Have complex requirements
• Use higher-level queries
• Are most of the time engineers, scientists, business analysts, etc.
3. Casual Users
• Access the database occasionally.
• Need different information from the database each time.
• Use sophisticated database queries to satisfy their needs.
• Are most of the time middle- to high-level managers.
These users can also be classified as “Actors on the Scene” and “Workers
Behind the Scene”.
ANSI-SPARC Architecture
The purpose and origin of the Three-Level database
architecture
• All users should be able to access the same data. This is important because
the database has a shared-data feature: all the data is stored in one
location, and each user has their own customized way of interacting with it.
• A user's view is unaffected by (immune to) changes made in other views.
Since the requirements of one user are independent of those of others, a
change made in one user’s view should not affect other users.
• Users should not need to know physical database storage details. As
there are naïve users of the system, hardware-level or physical details
should be a black box for such users.
• The DBA should be able to change database storage structures without
affecting the users' views. A change in file organization or access method
should not affect the structure of the data, and so should have no effect on
the users.
• The internal structure of the database should be unaffected by changes to
physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database
without affecting all users. In any database system, the DBA has the
privilege to change the structure of the database, for example adding tables,
adding and deleting attributes, or changing the specification of the
objects in the database.
All the above and many other functionalities are made possible by the
three-level ANSI-SPARC architecture.
External Level: Users' view of the database. Describes that part of database
that is relevant to a particular user. Different users have their own
customized view of the database independent of other users.
The following example can be taken as an illustration of the difference between
the three levels in the ANSI-SPARC database architecture, where:
• The first level is concerned with the groups of users and their
respective data requirements, each independent of the others.
• The second level describes the whole content of the database,
where each piece of information is represented once.
• The third level describes how the data is physically stored.
External schema: defined at the external level to describe the various user
views. It usually uses the same data model as the conceptual level.
Data Independence
Logical Data Independence:
• Refers to the immunity of external schemas to changes in the conceptual
schema.
• Conceptual schema changes, e.g. addition/removal of entities,
should not require changes to the external schema or rewrites of
application programs.
• The capacity to change the conceptual schema without having to
change the external schemas and their application programs.
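A minimal illustration, assuming SQLite via Python's sqlite3 and hypothetical names: an external schema is defined as a view, and the conceptual schema then gains a new attribute. The view, and the program written against it, keep working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("INSERT INTO student VALUES (1, 'Hana', 3.6)")

# External schema for one user group: only id and name are visible.
conn.execute("CREATE VIEW student_directory AS SELECT id, name FROM student")

def directory_listing(connection):
    """Application program written against the external schema only."""
    return connection.execute("SELECT id, name FROM student_directory").fetchall()

before = directory_listing(conn)

# Conceptual-schema change: a new attribute is added to the base relation.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

after = directory_listing(conn)
print(before == after)  # True: the external view and its program are unaffected
```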
Database Languages
Data Definition Language (DDL)
• Allows the DBA or user to describe and name the entities, attributes and
relationships required for the application.
• Specification notation for defining the database schema
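As a hedged sketch, the SQL DDL statements below (run through SQLite from Python; the table names are hypothetical) name two entities, their attributes and a relationship between them. The DBMS records the relationship as a foreign key, which can be read back from the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: define entities (tables), attributes (columns) and a relationship (foreign key).
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department (dept_id)
    );
""")

# PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match).
fks = [(r[2], r[3], r[4]) for r in conn.execute("PRAGMA foreign_key_list(employee)")]
print(fks)  # [('department', 'dept_id', 'dept_id')]
```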
1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or segment
• The top node is the root node
• Nodes are arranged in a hierarchical structure, as a sort of upside-down
tree
• A parent node can have more than one child node
• A child node can only have one parent node
• The relationship between parent and child is one-to-many
• Relationships are established by creating physical links between stored
records (each record is stored with a predefined access path to other
records)
• To add a new record type or relationship, the database must be
redefined and then stored in a new form.
[Figure: hierarchical model example with Department as the root node and Employee and Job as its child nodes]
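A hierarchical database can be pictured as a tree of records. The sketch below is a hypothetical in-memory model, not any real hierarchical DBMS: each segment stores exactly one parent link, so a child can never gain a second parent, and a record's access path is the chain of segments from the root down to it.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Segment:
    """A record (node/segment) in a hypothetical hierarchical database."""
    name: str
    parent: Optional[Segment] = field(default=None, repr=False)
    children: list[Segment] = field(default_factory=list)

    def add_child(self, child: Segment) -> None:
        # A child may have only one parent: refuse to re-attach it.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)  # one parent, many children (1:N)

    def path(self) -> list[str]:
        """Access path from the root down to this segment."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


root = Segment("Department")
employee, job = Segment("Employee"), Segment("Job")
root.add_child(employee)
root.add_child(job)
print(employee.path())  # ['Department', 'Employee']
```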
2. Network Model
• Allows record types to have more than one parent, unlike the
hierarchical model
• A network data model sees records as set members
• Each set has an owner and one or more members
• Does not allow many-to-many relationships between entities directly
• Like the hierarchical model, the network model is a collection of
physically linked records.
• Allows member records to have more than one owner
[Figure: network model example with the record types Department, Job, Employee, Activity and Time Card]
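For contrast, a network (CODASYL-style) database organizes records into owner-member sets. In this hypothetical sketch, with made-up set names and records, one member record belongs to sets owned by two different record types (a Department and a Job), which a strict hierarchy could not express.

```python
from __future__ import annotations
from collections import defaultdict


class NetworkDB:
    """Hypothetical network model: named sets, each with one owner and many members."""

    def __init__(self) -> None:
        # (set_name, owner) -> list of member records
        self.sets: dict[tuple[str, str], list[str]] = defaultdict(list)

    def connect(self, set_name: str, owner: str, member: str) -> None:
        """Link a member record to an owner record through a named set type."""
        self.sets[(set_name, owner)].append(member)

    def owners_of(self, member: str) -> list[tuple[str, str]]:
        """All (set_name, owner) pairs a member belongs to; more than one is allowed."""
        return [key for key, members in self.sets.items() if member in members]

    def members(self, set_name: str, owner: str) -> list[str]:
        return list(self.sets.get((set_name, owner), []))


db = NetworkDB()
db.connect("works_in", "Dept:Sales", "Emp:Abebe")
db.connect("holds", "Job:Clerk", "Emp:Abebe")   # same member, second owner
db.connect("works_in", "Dept:Sales", "Emp:Sara")
print(db.owners_of("Emp:Abebe"))
# [('works_in', 'Dept:Sales'), ('holds', 'Job:Clerk')]
```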
Alternative terminologies
Formal term    Alternative 1    Alternative 2
Relation       Table            File
Tuple          Row              Record
Attribute      Column           Field
All values in a column represent the same attribute and have the same
data format.
1. The ENTITIES (persons, places, things, etc.) which the organization has to
deal with. Relations can also describe relationships.
Every relation has a schema, which describes its columns, or fields; the
relation itself corresponds to our familiar notion of a table:
A relation is a collection of tuples, each of which contains values for a
fixed number of attributes.
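A hedged Python sketch of the same idea, with a hypothetical Relation class and made-up data: a relation is a schema (an ordered list of attribute names) plus a set of tuples, and each tuple must supply exactly one value per attribute. Because the tuples are kept in a set, a duplicate tuple collapses into one.

```python
from __future__ import annotations


class Relation:
    """Minimal relation: a fixed schema and a set of tuples of matching arity."""

    def __init__(self, name: str, attributes: list[str]) -> None:
        self.name = name
        self.attributes = tuple(attributes)   # the relation schema
        self.tuples: set[tuple] = set()       # the tuples (rows)

    def insert(self, *values) -> None:
        # Every tuple must have a value for each attribute, no more, no less.
        if len(values) != len(self.attributes):
            raise ValueError(
                f"{self.name} expects {len(self.attributes)} values, got {len(values)}")
        self.tuples.add(tuple(values))

    def project(self, *attrs: str) -> set[tuple]:
        """Return only the named columns of every tuple."""
        idx = [self.attributes.index(a) for a in attrs]
        return {tuple(t[i] for i in idx) for t in self.tuples}


student = Relation("STUDENT", ["Id", "FName", "Dept"])
student.insert(1, "Hana", "CS")
student.insert(2, "Dawit", "IT")
student.insert(1, "Hana", "CS")          # duplicate tuple: the set keeps one copy
print(len(student.tuples))               # 2
print(sorted(student.project("Dept")))   # [('CS',), ('IT',)]
```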
Existence Dependency: the dependence of an entity on the existence
of one or more other entities.
Weak entity: an entity that cannot exist without the entity with
which it has a relationship. It is indicated by a double rectangle.
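One common way to map a weak entity to tables, sketched here with SQLite via Python and a hypothetical EMPLOYEE/DEPENDENT pair: the weak entity's primary key includes the owner's key, and ON DELETE CASCADE removes dependents when their owner disappears. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Weak entity: identified only together with its owner employee.
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee (emp_id) ON DELETE CASCADE,
        dep_name TEXT    NOT NULL,
        kinship  TEXT,
        PRIMARY KEY (emp_id, dep_name)
    );
    INSERT INTO employee VALUES (1, 'Abebe');
    INSERT INTO dependent VALUES (1, 'Meron', 'daughter');
""")

# A dependent cannot exist without its owner: an orphan insert is rejected.
try:
    conn.execute("INSERT INTO dependent VALUES (99, 'Nobody', 'son')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner removes its dependents as well (existence dependency).
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```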
Types of Attributes
4. The RELATIONSHIPS between entities which exist and must be taken into
account when processing information. In any business processing one object
may be associated with another object due to some event. Such kind of
association is what we call a RELATIONSHIP between entity objects.
Degree of a Relationship
• An important point about a relationship is how many entities
participate in it. The number of entities participating in a
relationship is called the DEGREE of the relationship.
Cardinality of a Relationship
• Another important concept about a relationship is the number of
instances/tuples that can be associated with a single instance of
one entity in a single relationship. The number of instances
participating in, or associated with, a single instance of an entity in a
relationship is called the CARDINALITY of the relationship. The
major cardinalities of a relationship are:
o ONE-TO-ONE: one tuple is associated with only one other
tuple.
E.g. Building – Location: a single building is
located in a single location, and a single location
accommodates only a single building.
o ONE-TO-MANY: one tuple can be associated with many
other tuples, but not the reverse.
E.g. Department – Student: one department can
have multiple students.
o MANY-TO-ONE: many tuples are associated with one tuple,
but not the reverse.
E.g. Employee – Department: many employees
belong to a single department.
o MANY-TO-MANY: one tuple is associated with many other
tuples, and from the other side, in a different role, one
tuple is likewise associated with many tuples.
E.g. Student – Course: a student can take many
courses, and a single course can be attended by many
students.
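As a hedged sketch (SQLite through Python, hypothetical tables and data), the cardinalities above are commonly realized in a relational schema as follows: one-to-one with a UNIQUE foreign key, one-to-many with a plain foreign key on the "many" side, and many-to-many with a separate linking table whose key combines both sides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- ONE-TO-ONE: each building has at most one location row, and vice versa.
    CREATE TABLE building (b_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE location (
        loc_id INTEGER PRIMARY KEY,
        b_id   INTEGER UNIQUE REFERENCES building (b_id)
    );
    -- ONE-TO-MANY: many students reference one department.
    CREATE TABLE department (d_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (
        s_id INTEGER PRIMARY KEY,
        d_id INTEGER REFERENCES department (d_id)
    );
    -- MANY-TO-MANY: a linking table pairs students with courses.
    CREATE TABLE course (c_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrolled_in (
        s_id INTEGER REFERENCES student (s_id),
        c_id INTEGER REFERENCES course (c_id),
        PRIMARY KEY (s_id, c_id)
    );
    INSERT INTO building VALUES (1, 'Main');
    INSERT INTO location VALUES (10, 1);
    INSERT INTO department VALUES (1, 'CS');
    INSERT INTO student VALUES (100, 1), (101, 1);
    INSERT INTO course VALUES (7, 'Databases'), (8, 'Networks');
    INSERT INTO enrolled_in VALUES (100, 7), (100, 8), (101, 7);
""")

# The UNIQUE constraint blocks a second location for the same building.
try:
    conn.execute("INSERT INTO location VALUES (11, 1)")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

students_in_db = conn.execute(
    "SELECT COUNT(*) FROM enrolled_in WHERE c_id = 7").fetchone()[0]
print(one_to_one_enforced, students_in_db)  # True 2
```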
• Relational Integrity
o Domain Integrity: no value of an attribute should be
beyond the allowable limits.
o Entity Integrity: in a base relation, no attribute of a
primary key can assume a value of NULL.
o Referential Integrity: if a foreign key exists in a
relation, either the foreign key value must match a
candidate key value in its home relation or the
foreign key value must be NULL.
o Enterprise Integrity: additional rules specified by the
users or database administrators of a database are
incorporated.
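The first three integrity rules map directly onto SQL constraints. A minimal sketch, assuming SQLite via Python and a hypothetical schema: a CHECK clause enforces a domain, NOT NULL on the primary key enforces entity integrity (SQLite, unlike the SQL standard, otherwise tolerates NULL in most non-INTEGER primary keys), and a foreign key enforces referential integrity while still allowing NULL. The duplicate-key case also shows the key constraint at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_code TEXT NOT NULL PRIMARY KEY                -- entity integrity
    );
    CREATE TABLE employee (
        emp_id    TEXT NOT NULL PRIMARY KEY,               -- entity integrity
        age       INTEGER CHECK (age BETWEEN 18 AND 70),   -- domain integrity
        dept_code TEXT REFERENCES department (dept_code)   -- referential integrity
    );
    INSERT INTO department VALUES ('CS');
""")

def try_insert(sql: str) -> str:
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "valid row":          try_insert("INSERT INTO employee VALUES ('E1', 30, 'CS')"),
    "age out of domain":  try_insert("INSERT INTO employee VALUES ('E2', 150, 'CS')"),
    "NULL primary key":   try_insert("INSERT INTO employee VALUES (NULL, 25, 'CS')"),
    "unknown department": try_insert("INSERT INTO employee VALUES ('E3', 25, 'XX')"),
    "NULL foreign key":   try_insert("INSERT INTO employee VALUES ('E4', 25, NULL)"),
    "duplicate key":      try_insert("INSERT INTO employee VALUES ('E1', 40, 'CS')"),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```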
• Key constraints
If tuples need to be unique in the database, then we need to make
each tuple distinct. To do this we need relational keys that
uniquely identify each tuple in a relation.
• Relational Views
Relations are perceived as tables from the users’ perspective. Actually,
there are two kinds of relation in a relational database: named and
unnamed relations. The basic difference lies in how the relation is
created, used and updated:
1. Base Relation
A Named Relation corresponding to an entity in the conceptual
schema, whose tuples are physically stored in the database.
2. View (Unnamed Relation)
A view is the dynamic result of one or more relational operations
on the base relations, producing another, virtual relation
that does not actually exist as presented. So a view is a virtually
derived relation that does not necessarily exist in the database but
can be produced upon request by a particular user at the time of the
request. The virtual table, or relation, can be created from a single
relation or several relations by extracting some attributes and records,
with or without conditions.
Purpose of a view
• Hides unnecessary information from users: only part of
the base relation (some collection of attributes, not necessarily
all) is included in the virtual table.
• Provides powerful flexibility and security: since unnecessary
information is hidden from the user, there is some degree of
data security.
• Provides a customized view of the database for users: each user
interacts with their own preferred data set and format by making
use of views.
• A view of one base relation can be updated.
• Updates on views derived from several relations are not
allowed, since they may violate the integrity of the database.
• Updates on views with aggregation and summary are not
allowed, since aggregation and summary results are
computed from a base relation and do not actually exist.
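A minimal sketch of views, assuming SQLite via Python and hypothetical table and column names: one view hides an attribute and some rows of a base relation, another computes summary values. Both are recomputed from the base relation whenever they are queried. One caveat specific to SQLite: its views are read-only (updating through a view needs INSTEAD OF triggers), so the updatability rules above describe the general relational model rather than this particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT,
        dept   TEXT,
        salary REAL            -- sensitive attribute
    );
    INSERT INTO employee VALUES (1, 'Abebe', 'IT', 9000), (2, 'Sara', 'HR', 8000);

    -- Virtual relation: hides salary and shows only IT staff.
    CREATE VIEW it_staff AS
        SELECT emp_id, name FROM employee WHERE dept = 'IT';

    -- Summary view: its values exist in no base relation.
    CREATE VIEW dept_size AS
        SELECT dept, COUNT(*) AS n FROM employee GROUP BY dept;
""")

print(conn.execute("SELECT * FROM it_staff").fetchall())  # [(1, 'Abebe')]

# The view stores no tuples of its own: new base rows appear on the next query.
conn.execute("INSERT INTO employee VALUES (3, 'Lulit', 'IT', 8500)")
print(conn.execute("SELECT * FROM it_staff ORDER BY emp_id").fetchall())
# [(1, 'Abebe'), (3, 'Lulit')]
print(conn.execute("SELECT * FROM dept_size ORDER BY dept").fetchall())
# [('HR', 1), ('IT', 2)]
```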
When a database is designed using the relational data model, all the data is
represented in the form of tables. In such a representation, there are two
basic components of the database: the definition of each relation, or table,
and the actual data stored in each table. The data definition is what we call
the schema, or the skeleton, of the database, and the relations with their
contents at some point in time form the instance, or the flesh, of the
database.
Schemas
Schema describes how data is to be structured; it is defined at setup/design
time (it is also called "metadata").
Since it is defined during the database development phase, the schema rarely
changes unless system maintenance demands a change to the definition of a
relation.
• Schema Diagrams
o a convention to display some aspect of a schema visually
• Schema Construct
o refers to each object in the schema (e.g. STUDENT)
E.g.: STUDENT (FName,LName,Id,Year,Dept,Sex)
Instances
Instance: the collection of data in the database at a particular point in
time (a snapshot).
• Also called the state, snapshot or extension of the database
• Refers to the actual data in the database at a specific point in time
• The state of the database changes any time we add, delete or update an
item.
• Valid state: a state that satisfies the structure and constraints
specified in the schema, as enforced by the DBMS
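A short sketch of the distinction, assuming SQLite via Python: the CREATE TABLE statement fixes the schema once, while each insert or update produces a new instance (state). A change that would violate a constraint declared in the schema is refused, so the database stays in a valid state. The STUDENT attributes follow the schema example above; the constraints and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema (intension): defined once at design time.
conn.execute("""
    CREATE TABLE student (
        Id    INTEGER PRIMARY KEY,
        FName TEXT NOT NULL,
        LName TEXT NOT NULL,
        Year  INTEGER CHECK (Year BETWEEN 1 AND 5),
        Dept  TEXT,
        Sex   TEXT CHECK (Sex IN ('M', 'F'))
    )""")

def snapshot():
    """Current instance (extension) of the STUDENT relation."""
    return conn.execute("SELECT Id, FName, Year FROM student ORDER BY Id").fetchall()

state0 = snapshot()                                   # empty state
conn.execute("INSERT INTO student VALUES (1, 'Hana', 'Bekele', 2, 'CS', 'F')")
state1 = snapshot()                                   # new state after an insert

# An update that would violate the schema is rejected; the state is unchanged.
try:
    conn.execute("UPDATE student SET Year = 9 WHERE Id = 1")
except sqlite3.IntegrityError:
    pass
state2 = snapshot()
print(state0, state1, state2)
# [] [(1, 'Hana', 2)] [(1, 'Hana', 2)]
```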
Database Design
Database design is the process of producing different kinds of
specification for the data to be stored in the database. It is one of the
middle phases of information systems development when the system uses a
database approach. Design is the part in which we describe how the data
should be perceived at different levels and, finally, how it is going to be
stored in a computer system.
Among these phases, the prime interest of a database system is the design
phase, which is again subdivided into three sub-phases.
These sub-phases are:
1. Conceptual Design
2. Logical Design, and
3. Physical Design
• In general, one has to go back and forth between these tasks to refine
a database design, and decisions in one task can influence the
choices in another.
• In developing a good design, one should answer questions such as:
o What are the relevant entities for the organization?
o What are the important features of each entity?
o What are the important relationships?
o What are the important queries from the users?
o What are the other requirements of the organization
and the users?
Logical Design
Physical Design
The basic E-R model is graphically depicted and presented for review.
The process is repeated until the end users and designers agree that the E-R
diagram is a fair representation of the organization’s activities and
functions.
Checking for Redundant Relationships in the ER Diagram: relationships
between entities indicate access from one entity to another. It is therefore
possible to access one entity occurrence from another entity occurrence,
even if there are other entities and relationships separating them; this is
often referred to as 'navigation' of the ER diagram.
The last phase in ER modeling is validating the ER model against the
requirements of the users.
[Figure: example ER diagram with the entities Students and Course, the relationship Enrolled_In (drawn as a diamond), and the attributes Id, Gpa, Age, Semester, Academic Year and Grade]
One-to-one relationship:
• A customer is associated with at most one loan via the relationship borrower.
• A loan is associated with at most one customer via borrower.
[Figure: one-to-one relationship Manages between Employee and Branch, with multiplicities 1..1 and 0..1]
One-To-Many Relationships
• In a one-to-many relationship, a loan is associated with at most one
customer via borrower, while a customer is associated with several
(including 0) loans via borrower.
[Figure: one-to-many relationship Leads between Employee and Project, with multiplicities 1..1 and 0..*]
Many-To-Many Relationship
• A customer is associated with several (possibly 0) loans via borrower.
• A loan is associated with several (possibly 0) customers via borrower.
[Figure: many-to-many relationship Teaches between Instructor and Course, with multiplicities 0..* and 1..*]
Problems in ER Modeling
The Entity-Relationship Model is a conceptual data model that views the real
world as consisting of entities and relationships. The model visually represents
these concepts by the Entity-Relationship diagram. The basic constructs of the ER
model are entities, relationships, and attributes. Entities are concepts, real or
abstract, about which information is collected. Relationships are associations
between the entities. Attributes are properties which describe the entities.
While designing the ER model, one may face design problems called connection
traps. Connection traps are problems arising from misinterpreting certain
relationships.
Example:
Problem: Which car (Car1, Car3 or Car5) is used by Employee 6 (Emp6), who works
in Branch 1 (Bra1)? From this ER model one cannot tell which car is used by
which staff member, since a branch can have more than one car and is also
staffed by more than one employee. We therefore need to restructure the model
to avoid the connection trap.
To avoid the fan trap problem, we can restructure the E-R model.
This results in the following E-R model.
[Figure: restructured model showing occurrences of branches Bra1 to Bra4, employees Emp1 to Emp7 and cars Car1 to Car7]
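The fan trap can be reproduced with a few tables. In this hypothetical sketch (SQLite via Python, with data loosely following the Bra1 example; Emp7 and the Car3 allocation are made up), both Employee and Car hang off Branch, so the only path from an employee to a car goes through the branch and returns every car there. Recording directly which employee uses a car is one possible restructuring, assumed here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fan-trap structure: Employee and Car both relate only to Branch.
    CREATE TABLE branch   (branch_id TEXT PRIMARY KEY);
    CREATE TABLE employee (emp_id TEXT PRIMARY KEY, branch_id TEXT REFERENCES branch);
    CREATE TABLE car      (car_id TEXT PRIMARY KEY, branch_id TEXT REFERENCES branch);
    INSERT INTO branch   VALUES ('Bra1');
    INSERT INTO employee VALUES ('Emp6', 'Bra1'), ('Emp7', 'Bra1');
    INSERT INTO car      VALUES ('Car1', 'Bra1'), ('Car3', 'Bra1'), ('Car5', 'Bra1');
""")

# Which car does Emp6 use? The path through Branch returns every car of Bra1.
fan = conn.execute("""
    SELECT c.car_id FROM employee e JOIN car c ON c.branch_id = e.branch_id
    WHERE e.emp_id = 'Emp6' ORDER BY c.car_id""").fetchall()
print(fan)  # [('Car1',), ('Car3',), ('Car5',)]: ambiguous

# Assumed restructuring: record directly which employee uses each car.
conn.executescript("""
    ALTER TABLE car ADD COLUMN used_by TEXT REFERENCES employee (emp_id);
    UPDATE car SET used_by = 'Emp6' WHERE car_id = 'Car3';
""")
fixed = conn.execute("SELECT car_id FROM car WHERE used_by = 'Emp6'").fetchall()
print(fixed)  # [('Car3',)]
```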
2. Chasm Trap:
Occurs where a model suggests the existence of a relationship between
entity types, but the pathway does not exist between certain entity
occurrences.
May exist when there are one or more relationships with a minimum
multiplicity (cardinality) of zero forming part of the pathway between
related entities.
Example:
If some projects are not currently active, we cannot assign a project
manager to them. So there are projects with no project manager, which gives
the participation a minimum value of zero.
Problem:
How can we identify which BRANCH is responsible for which PROJECT?
We know that, whether a PROJECT is active or not, there is a responsible
BRANCH. But which branch it is remains a question to be answered, and since
we have a minimum participation of zero between EMPLOYEE and PROJECT,
we cannot identify the BRANCH responsible for each PROJECT.
The solution to this chasm trap problem is to add another relationship
between the extreme entities (BRANCH and PROJECT).
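A hypothetical SQLite sketch of the chasm trap, with made-up table, column and key names: BRANCH reaches PROJECT only through the employee who manages it, so a project with no manager (minimum multiplicity zero) has no path back to its branch. Adding a direct BRANCH-PROJECT relationship, as the solution above suggests, closes the gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch   (branch_id TEXT PRIMARY KEY);
    CREATE TABLE employee (emp_id TEXT PRIMARY KEY, branch_id TEXT REFERENCES branch);
    -- A project may have no manager (participation 0..1): manager_id can be NULL.
    CREATE TABLE project  (proj_id TEXT PRIMARY KEY, manager_id TEXT REFERENCES employee);
    INSERT INTO branch   VALUES ('B1'), ('B2');
    INSERT INTO employee VALUES ('E1', 'B1');
    INSERT INTO project  VALUES ('P1', 'E1'), ('P2', NULL);   -- P2 is inactive
""")

via_employee = conn.execute("""
    SELECT p.proj_id, e.branch_id
    FROM project p LEFT JOIN employee e ON e.emp_id = p.manager_id
    ORDER BY p.proj_id""").fetchall()
print(via_employee)  # [('P1', 'B1'), ('P2', None)]: P2's branch is lost

# Fix: relate BRANCH and PROJECT directly.
conn.executescript("""
    ALTER TABLE project ADD COLUMN branch_id TEXT REFERENCES branch;
    UPDATE project SET branch_id = 'B1' WHERE proj_id = 'P1';
    UPDATE project SET branch_id = 'B2' WHERE proj_id = 'P2';
""")
direct = conn.execute(
    "SELECT proj_id, branch_id FROM project ORDER BY proj_id").fetchall()
print(direct)  # [('P1', 'B1'), ('P2', 'B2')]
```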
EER Concepts
Generalization
Specialization
Sub classes
Super classes
Attribute Inheritance
Constraints on specialization and generalization
Generalization
• Generalization occurs when two or more entities represent categories
of the same real-world object.
• Generalization is the process of defining a more general entity type
from a set of more specialized entity types.
• A generalization hierarchy is a form of abstraction that specifies that
two or more entities that share common attributes can be generalized
into a higher-level entity type.
• It is considered a bottom-up definition of entities.
• A generalization hierarchy depicts the relationship between a higher-level
superclass and lower-level subclasses.
Generalization hierarchies can be nested. That is, a subtype of one
hierarchy can be a supertype of another. The level of nesting is limited
only by the constraint of simplicity.
Example: Account is a generalized form of Saving and Current
Accounts
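Generalization closely resembles inheritance in object-oriented languages. A hedged Python sketch of the Account example: the shared attributes live in the general class, and each specialized class adds only its distinguishing attribute. The attribute names (interest_rate, overdraft_limit) and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Generalized entity (superclass): attributes shared by all accounts."""
    account_no: str      # common identifier
    owner: str
    balance: float = 0.0


@dataclass
class SavingAccount(Account):
    """Specialized entity: adds a distinguishing attribute."""
    interest_rate: float = 0.0


@dataclass
class CurrentAccount(Account):
    """Specialized entity: adds a different distinguishing attribute."""
    overdraft_limit: float = 0.0


accounts = [
    SavingAccount("S-01", "Hana", 500.0, interest_rate=0.07),
    CurrentAccount("C-01", "Dawit", 1200.0, overdraft_limit=300.0),
]
# Every specialized account is also an Account and has the shared attributes.
for acc in accounts:
    print(type(acc).__name__, acc.account_no, isinstance(acc, Account))
```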
Specialization
• Is the result of taking a subset of a higher-level entity set to form a
lower-level entity set.
• The specialized entities have an additional set of attributes
(distinguishing characteristics) that distinguish them from the
generalized entity.
• It is considered a top-down definition of entities.
• The specialization process is the inverse of the generalization process:
identify the distinguishing features of some entity occurrences, and
specialize them into different subclasses.
• Reasons for Specialization
o Attributes that apply to only some occurrences of the superclass
o Relationship types that apply to only some occurrences of the superclass
• In many cases, an entity type has numerous sub-groupings of its
entities that are meaningful and need to be represented explicitly. This
requires representing each subgroup in the ER model.
The generalized entity is a superclass, and the specialized entities
are subclasses of that specific superclass.
o Example: Saving Accounts and Current Accounts are
specialized entities of the generalized entity Accounts.
Manager, Sales and Secretary are specialized employees.
Subclass/Subtype
• An entity type whose tuples have attributes that distinguish its
members from the tuples of the generalized, or superclass, entity.
• When one generalized superclass has various subgroups with
distinguishing features and these subgroups are represented in
specialized form, the groups are called subclasses.
• Subclasses can be either mutually exclusive (disjoint) or overlapping
(inclusive).
• A single subclass may inherit attributes from two distinct superclasses.
• A mutually exclusive category/subclass is one where an entity instance can
be in only one of the subclasses.
E.g.: An EMPLOYEE can be either SALARIED or PART-TIMER but
not both.
• An overlapping category/subclass is one where an entity instance may be
in two or more subclasses.
E.g.: A PERSON who works for a university can be both an
EMPLOYEE and a STUDENT at the same time.
Superclass /Supertype
• An entity type whose tuples share common attributes. Attributes that
are shared by all entity occurrences (including the identifier) are
associated with the supertype.
• Is the generalized entity
Attribute Inheritance
• An entity that is a member of a subclass inherits all the
attributes it has as a member of the superclass.
• The entity also inherits all the relationships in which the
superclass participates.
• An entity may have more than one subclass category.
• All entities/subclasses of a generalized entity or superclass
share a common unique identifier attribute (primary key), i.e.
the primary key of the superclass and of its subclasses is always
the same.
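The shared-identifier rule is also how a specialization is commonly mapped to tables. A minimal sketch, assuming SQLite via Python and hypothetical EMPLOYEE, SALARIED and PART_TIMER tables: each subclass table reuses the superclass primary key as its own primary key and as a foreign key, and a join on that key recovers the inherited attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Superclass: shared attributes, including the identifier.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Subclasses: same primary key as the superclass, plus their own attributes.
    CREATE TABLE salaried (
        emp_id         INTEGER PRIMARY KEY REFERENCES employee (emp_id),
        monthly_salary REAL
    );
    CREATE TABLE part_timer (
        emp_id      INTEGER PRIMARY KEY REFERENCES employee (emp_id),
        hourly_rate REAL
    );
    INSERT INTO employee VALUES (1, 'Hana'), (2, 'Dawit');
    INSERT INTO salaried VALUES (1, 12000);
    INSERT INTO part_timer VALUES (2, 150);
""")

# A subclass row "inherits" the superclass attributes through the shared key.
row = conn.execute("""
    SELECT e.emp_id, e.name, s.monthly_salary
    FROM salaried s JOIN employee e ON e.emp_id = s.emp_id""").fetchone()
print(row)  # (1, 'Hana', 12000.0)
```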
• The Partial Specialization Rule specifies that it is not necessary for all
entity occurrences in the superclass to be a member of one of the
subclasses. Here we have an optional participation on the specialization.
Partial Participation of superclass instances on subclasses is diagrammed
with a single line from the Supertype to the circle.
Disjointness Constraints.
• Specifies whether one entity occurrence can be a member of
more than one subclass, i.e. it is a type of business rule that deals
with the situation where an entity occurrence of a superclass may
also have more than one subclass occurrence.
• The Disjoint Rule restricts one entity occurrence of a superclass to
be a member of only one of the subclasses. Example: an EMPLOYEE
can be either SALARIED or PART-TIMER, but not both at the
same time.
• The Overlap Rule allows one entity occurrence to be a member of
more than one subclass. Example: a PERSON working at the
university can be both a STUDENT and an EMPLOYEE at the same
time.
• This is diagrammed by placing either the letter "d" for disjoint or "o"
for overlapping inside the circle on the generalization hierarchy
portion of the E-R diagram.
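To make the participation and disjointness rules concrete, here is a hypothetical Python checker (not part of any DBMS). It takes the set of superclass occurrences and the members of each subclass, and reports violations of the disjoint rule and, when total (mandatory) participation is required instead of the partial rule, of completeness. The employee identifiers are made up.

```python
from __future__ import annotations


def check_specialization(superclass: set[str],
                         subclasses: dict[str, set[str]],
                         disjoint: bool,
                         total: bool) -> list[str]:
    """Return human-readable violations of the specialization constraints."""
    problems = []
    for name, members in subclasses.items():
        # Every subclass member must also be a superclass occurrence.
        for m in sorted(members - superclass):
            problems.append(f"{m} is in {name} but not in the superclass")
    if disjoint:
        # Disjoint rule: no occurrence may belong to two subclasses.
        names = list(subclasses)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                for m in sorted(subclasses[a] & subclasses[b]):
                    problems.append(f"{m} is in both {a} and {b}")
    if total:
        # Total participation: every occurrence must be in some subclass.
        covered = set().union(*subclasses.values())
        for m in sorted(superclass - covered):
            problems.append(f"{m} belongs to no subclass")
    return problems


employees = {"E1", "E2", "E3"}
kinds = {"SALARIED": {"E1", "E2"}, "PART_TIMER": {"E2"}}
print(check_specialization(employees, kinds, disjoint=True, total=False))
# ['E2 is in both SALARIED and PART_TIMER']
print(check_specialization(employees, kinds, disjoint=False, total=True))
# ['E3 belongs to no subclass']
```

With disjoint=False and total=False (overlapping, partial), the same data produces no violations.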
From the two types of constraints we can have four possible constraints