DBMS Unit-1-1
Unit-1
Introduction to Database Systems:
As the name suggests, the database management system consists of two parts. They are:
1. Database and
2. Management System
Data: Any fact that can be recorded or stored, such as facts, figures, or statistics that have no
particular meaning on their own (e.g. 1, ABC, 19).
Database: A collection of related data.
We now have a collection of 4 tables. They can be called a “related collection” because we can
clearly find out that there are some common attributes existing in a selected pair of tables. Because
of these common attributes we may combine the data of two or more tables together to find out
the complete details of a student. Questions like “Which hostel does the youngest student live in?”
can now be answered, even though the Age and Hostel attributes are in different tables.
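The idea of combining tables through a common attribute can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and rows are hypothetical, since the original four tables are not reproduced here.

```python
import sqlite3

# Hypothetical tables and rows, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE residence (roll_no INTEGER, hostel TEXT);
    INSERT INTO student VALUES (1, 'Asha', 19), (2, 'Ravi', 17), (3, 'Meena', 21);
    INSERT INTO residence VALUES (1, 'Ganga'), (2, 'Kaveri'), (3, 'Ganga');
""")

# roll_no is the common attribute that lets the two tables be combined.
row = conn.execute("""
    SELECT s.name, r.hostel
    FROM student AS s
    JOIN residence AS r ON r.roll_no = s.roll_no
    ORDER BY s.age ASC
    LIMIT 1
""").fetchone()
youngest_name, youngest_hostel = row
print(youngest_name, youngest_hostel)
```

Age lives in one table and hostel in the other; the join on roll_no is what makes the question answerable.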
File System:
The file system is basically a way of arranging the files in a storage medium like a hard disk. The
file system organizes the files and helps in the retrieval of files when they are required. File systems
consist of different files which are grouped into directories. The directories further contain other
folders and files. The file system performs basic operations like file management, file naming, and
setting access rules.
DBMS (Database Management System):
Database Management System is basically software that manages the collection of related data. It
is used for storing data and retrieving the data effectively when it is needed. It also provides proper
security measures for protecting the data from unauthorized access. In a relational DBMS, data
is fetched with SQL queries, which are grounded in relational algebra. The DBMS also provides
mechanisms for data recovery and data backup.
Advantages of DBMS:
• Data independence: Application programs should be as independent as possible from
details of data representation and storage. The DBMS can provide an abstract view of the
data to insulate application code from such details.
• Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
• Data integrity and security: If data is always accessed through the DBMS, the DBMS
can enforce integrity constraints on the data. For example, before inserting salary
information for an employee, the DBMS can check that the department budget is not
exceeded. Also, the DBMS can enforce access controls that govern what data is visible to
different classes of users.
• Data administration: When several users share the data, centralizing the administration
of data can offer significant improvements. Experienced professionals who understand the
nature of the data being managed, and how different groups of users use it, can be
responsible for organizing the data representation to minimize redundancy and for fine-
tuning the storage of the data to make retrieval efficient.
• Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the
data in such a manner that users can think of the data as being accessed by only one user
at a time. Further, the DBMS protects users from the effects of system failures.
• Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS. This,
in conjunction with the high-level interface to the data, facilitates quick development of
applications. Such applications are also likely to be more robust than applications
developed from scratch because many important tasks are handled by the DBMS instead
of being implemented by the application.
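The data integrity point above can be sketched with a small example. It assumes a hypothetical employee table and uses a simpler constraint than the department-budget check mentioned above (salary must not be negative), enforced by the DBMS itself rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK clause is the integrity constraint.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000)")

try:
    # This row violates the constraint, so the DBMS refuses to store it.
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', -100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, stored)
```

Because the rule is declared once in the schema, every application that inserts into the table is subject to it.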
Disadvantages of DBMS:
• It is complex. Because a DBMS supports a wide range of functionality, the underlying
software is complex. Designers and developers need thorough knowledge of the software
to get the most out of it.
• Because of this complexity and functionality, a DBMS occupies a large amount of storage
and needs a large amount of memory to run efficiently.
• A DBMS typically works as a centralized system, i.e., all users access the same database.
Hence any failure of the DBMS affects all of its users.
• A DBMS is general-purpose software, i.e., it is written to work across many kinds of systems
rather than for one specific application. Hence some applications may run more slowly than
they would on a purpose-built system.
Structure of DBMS:
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the
storage manager and the query processor components. The storage manager is important
because databases typically require a large amount of storage space. The query processor is
important because it helps the database system simplify and facilitate access to data.
Query Processor:
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all
give the same result. The DML compiler also performs query optimization, that is, it picks
the lowest cost evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
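The statement that one query can have several evaluation plans with the same result but different costs can be sketched as below. This is a toy model, not how any particular DBMS works: the relations are hypothetical Python lists, cost is counted as the number of row comparisons, and both plans are actually executed to measure cost, whereas a real optimizer estimates cost before running anything.

```python
# Hypothetical relations: (emp_id, name, dept_id) and (dept_id, dept_name).
employees = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 10), (4, "John", 30)]
departments = [(10, "Sales"), (20, "HR"), (30, "IT")]

def plan_join_then_filter(target):
    """Join every employee with every department, then keep the target department."""
    work = 0
    joined = []
    for e in employees:
        for d in departments:
            work += 1                      # one comparison per (employee, department) pair
            if e[2] == d[0]:
                joined.append((e[1], d[1]))
    return sorted(n for n, dn in joined if dn == target), work

def plan_filter_then_join(target):
    """Filter departments first, then join only the surviving rows."""
    work = 0
    kept = []
    for d in departments:
        work += 1                          # one comparison per department
        if d[1] == target:
            kept.append(d)
    names = []
    for e in employees:
        for d in kept:
            work += 1                      # one comparison per (employee, kept department) pair
            if e[2] == d[0]:
                names.append(e[1])
    return sorted(names), work

result_a, cost_a = plan_join_then_filter("Sales")
result_b, cost_b = plan_filter_then_join("Sales")

# Pick the lowest-cost plan among the alternatives.
best_plan = min([("join_then_filter", cost_a), ("filter_then_join", cost_b)],
                key=lambda p: p[1])[0]
print(result_a == result_b, cost_a, cost_b, best_plan)
```

Both plans return the same names, but filtering before joining touches fewer rows, which is the kind of difference query optimization exploits.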
Storage Manager:
A storage manager is a program module that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system. The
storage manager is responsible for the interaction with the file manager. The raw data are stored
on the disk using the file system, which is usually provided by a conventional operating system.
The storage manager translates the various DML statements into low-level file-system commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
• File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is a critical
part of the database system, since it enables the database to handle data sizes that are much
larger than the size of main memory.
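The buffer manager's role can be sketched with a toy page cache. The class below is a hypothetical illustration: a dictionary stands in for disk storage, and least-recently-used (LRU) eviction is an assumed policy chosen for simplicity, since the text above does not name a replacement policy.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches at most `capacity` pages, evicting the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk                 # dict page_id -> contents, standing in for disk storage
        self.capacity = capacity
        self.frames = OrderedDict()      # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)     # cache hit: mark as most recently used
            return self.frames[page_id]
        self.disk_reads += 1                     # cache miss: fetch from "disk"
        page = self.disk[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)      # evict the least recently used page
        self.frames[page_id] = page
        return page

disk = {i: f"page-{i}" for i in range(5)}
pool = BufferPool(disk, capacity=2)
for pid in [0, 1, 0, 2, 0, 1]:
    pool.get_page(pid)
print(pool.disk_reads, list(pool.frames))
```

Five pages exist on "disk" but only two fit in memory at a time, which mirrors how a database can be much larger than main memory.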
Transaction Manager:
A transaction is a collection of operations that performs a single logical function in a database
application. Each transaction is a unit of both atomicity and consistency. Thus, we require that
transactions do not violate any database-consistency constraints. That is, if the database was
consistent when a transaction started, the database must be consistent when the transaction
successfully terminates. The transaction manager ensures that the database remains in a consistent
(correct) state despite system failures (e.g., power failures and operating system crashes) and
transaction failures.
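Atomicity and consistency can be sketched with a funds transfer. The account table, the non-negative balance constraint, and the transfer function are all hypothetical; the example relies on the documented behaviour of Python's sqlite3 connection used as a context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; returns True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)        # both updates succeed and are committed together
failed = transfer(conn, 1, 2, 500)   # the debit violates the constraint, so the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected; the rollback removes it, so the database never shows a half-finished transfer.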
Disk Storage: It contains the following components:
• Data Files: These store the data itself.
• Data Dictionary: It contains information about the structure of every database object. It
is the repository of metadata, that is, data about data.
• Indices: These provide faster retrieval of data items.
Data Abstraction:
For the system to be usable, it must retrieve data efficiently. The need
for efficiency has led designers to use complex data structures to represent data in the
database. Since many database-system users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users’ interactions
with the system. These levels are described by three kinds of schema: external, conceptual,
and physical.
The conceptual schema (sometimes called the logical schema) describes the stored data in terms
of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all
relations that are stored in the database.
The physical schema specifies additional storage details. Essentially, the physical schema
summarizes how the relations described in the conceptual schema are actually stored on secondary
storage devices such as disks and tapes.
Decisions about the physical schema are based on an understanding of how the data is typically
accessed. The process of arriving at a good physical schema is called physical database design.
External schemas, which usually are also in terms of the data model of the DBMS, allow data
access to be customized (and authorized) at the level of individual users or groups of users. Any
given database has exactly one conceptual schema and one physical schema because it has just one
set of stored relations, but it may have several external schemas, each tailored to a particular group
of users. Each external schema consists of a collection of one or more views and relations from
the conceptual schema.
The external schema design is guided by end user requirements. For example, we might want to
allow students to find out the names of faculty members teaching courses, as well as course
enrollments. This can be done by defining a view, called Courseinfo in the discussion below.
A user can treat a view just like a relation and ask questions about the records in the view. Even
though the records in the view are not stored explicitly, they are computed as needed.
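A view of this kind can be sketched as follows. The view definition itself is not reproduced in these notes, so everything here is an assumption made for illustration: the Teaches and Enrolled tables, all column names, and the sample rows. Only the names Faculty and Courseinfo come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Faculty  (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
    CREATE TABLE Teaches  (fid TEXT, cid TEXT);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT);
    INSERT INTO Faculty  VALUES ('f1', 'Rao', 90000), ('f2', 'Iyer', 85000);
    INSERT INTO Teaches  VALUES ('f1', 'DB101'), ('f2', 'OS201');
    INSERT INTO Enrolled VALUES ('s1', 'DB101'), ('s2', 'DB101'), ('s1', 'OS201');

    -- The view exposes course, teacher name, and enrollment, but not salaries.
    CREATE VIEW Courseinfo AS
        SELECT T.cid AS cid, F.fname AS fname, COUNT(*) AS enrollment
        FROM Faculty F
        JOIN Teaches T  ON T.fid = F.fid
        JOIN Enrolled E ON E.cid = T.cid
        GROUP BY T.cid, F.fname;
""")

# The view is queried exactly like a stored relation; its rows are computed on demand.
rows = conn.execute("SELECT cid, fname, enrollment FROM Courseinfo ORDER BY cid").fetchall()
print(rows)
```

A student querying Courseinfo sees who teaches each course and how many students are enrolled, without access to the sal column of Faculty.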
Data Independence:
A very important advantage of using a DBMS is that it offers data independence. That is,
application programs are insulated from changes in the way the data is structured and stored. Data
independence is achieved through use of the three levels of data abstraction; in particular, the
conceptual schema and the external schema provide distinct benefits in this area.
Relations in the external schema (view relations) are in principle generated on demand from the
relations corresponding to the conceptual schema. If the underlying data is reorganized, that is, the
conceptual schema is changed, the definition of a view relation can be modified so that the same
relation is computed as before. For example, suppose that the Faculty relation in our university
database is replaced by two relations, Faculty_public and Faculty_private.
Intuitively, some confidential information about faculty has been placed in a separate relation and
information about offices has been added. The Courseinfo view relation can be redefined in terms
of Faculty_public and Faculty_private, which together contain all the information in Faculty, so
that a user who queries Courseinfo will get the same answers as before. Thus users can be
shielded from changes in the logical structure of the data, or changes in the choice of relations
to be stored. This property is called logical data independence.
In turn, the conceptual schema insulates users from changes in the physical storage of the data.
This property is referred to as physical data independence. The conceptual schema hides details
such as how the data is actually laid out on disk, the file structure, and the choice of indexes. As
long as the conceptual schema remains the same, we can change these storage details without
altering applications.
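Logical data independence can be sketched by running the same query against two versions of the schema. The column layouts of Faculty, Faculty_public, and Faculty_private are assumptions made for illustration, and the view is simplified to course and faculty name only.

```python
import sqlite3

# The application's query never changes.
QUERY = "SELECT cid, fname FROM Courseinfo ORDER BY cid"

def answers(schema_sql):
    """Build a fresh in-memory database from schema_sql and run the fixed query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    return conn.execute(QUERY).fetchall()

# Original conceptual schema: a single Faculty relation.
before = answers("""
    CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL);
    CREATE TABLE Teaches (fid TEXT, cid TEXT);
    INSERT INTO Faculty VALUES ('f1', 'Rao', 90000), ('f2', 'Iyer', 85000);
    INSERT INTO Teaches VALUES ('f1', 'DB101'), ('f2', 'OS201');
    CREATE VIEW Courseinfo AS
        SELECT T.cid AS cid, F.fname AS fname
        FROM Faculty F JOIN Teaches T ON T.fid = F.fid;
""")

# Reorganized schema: Faculty is split in two, and only the view definition is rewritten.
after = answers("""
    CREATE TABLE Faculty_public  (fid TEXT, fname TEXT, office INTEGER);
    CREATE TABLE Faculty_private (fid TEXT, sal REAL);
    CREATE TABLE Teaches (fid TEXT, cid TEXT);
    INSERT INTO Faculty_public  VALUES ('f1', 'Rao', 101), ('f2', 'Iyer', 102);
    INSERT INTO Faculty_private VALUES ('f1', 90000), ('f2', 85000);
    INSERT INTO Teaches VALUES ('f1', 'DB101'), ('f2', 'OS201');
    CREATE VIEW Courseinfo AS
        SELECT T.cid AS cid, P.fname AS fname
        FROM Faculty_public P JOIN Teaches T ON T.fid = P.fid;
""")
print(before == after)
```

The query text is identical in both runs; only the view definition absorbed the change in the stored relations.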
Database Users:
• Naive users: The typical user interface for naive users is a forms interface, where the user
can fill in appropriate fields of the form. Naive users may also simply read reports
generated from the database.
• Application programmers: Application programmers are computer professionals who
write application programs. Application programmers can choose from many tools to
develop user interfaces. Rapid application development (RAD) tools are tools that enable
an application programmer to construct forms and reports without writing a program.
• Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query
processor, whose function is to break down DML statements into instructions that the
storage manager understands. Analysts who submit queries to explore data in the database
fall in this category.
• Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework.
Database Administrator:
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called a
database administrator (DBA).
Database Administrator Functions/Roles:
The functions of a DBA include:
• Schema definition: The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification: The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access: By granting different types of authorization,
the database administrator can regulate which parts of the database various users can
access. The authorization information is kept in a special system structure that the database
system consults whenever someone attempts to access the data in the system.
• Routine maintenance: Examples of the database administrator’s routine maintenance
activities are:
1) Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
2) Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.
3) Monitoring jobs running on the database and ensuring that performance is not
degraded by very expensive tasks submitted by some users.
What is ER Modeling?
ER modeling is a graphical technique for understanding and organizing data independently of the
actual database implementation.
Entity:
Anything that has an independent existence and about which we collect data. It is also known as
an entity type. In ER modeling, an entity is drawn as a rectangle.
Entity instance:
An entity instance is a particular member of the entity type. Example for entity instance: a
particular employee.
Weak entity
An entity that depends on another entity for its existence and does not have a key attribute of its
own is a weak entity.
Example for a weak entity: In a parent/child relationship, the parent is considered a strong entity
and the child a weak entity.
In ER modeling, a weak entity is drawn as a double rectangle.
Attributes:
Domain of Attributes:
The set of possible values that an attribute can take is called the domain of the attribute. For
example, the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence
this set can be termed as the domain of the attribute day.
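A domain can be sketched as a set of allowed values that every assignment is checked against. The helper name set_day and the dictionary record are hypothetical, and the weekday names between Tuesday and Friday are filled in as the obvious reading of the set above.

```python
# Domain of the attribute day: the working days of the week.
DAY_DOMAIN = frozenset({"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"})

def set_day(record, value):
    """Assign `value` to the day attribute only if it lies in the domain."""
    if value not in DAY_DOMAIN:
        raise ValueError(f"{value!r} is not in the domain of day")
    record["day"] = value
    return record

schedule = set_day({}, "Tuesday")

try:
    set_day({}, "Sunday")
    sunday_accepted = True
except ValueError:
    sunday_accepted = False

print(schedule, sunday_accepted)
```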
Key attribute:
The attribute (or combination of attributes) which is unique for every entity instance is called key
attribute.
E.g., the employee_id of an employee or the pan_card_number of a person. If the key consists
of two or more attributes in combination, it is called a composite key.
In ER modeling, a key attribute is shown with its name underlined.
Simple attribute:
If an attribute cannot be divided into simpler components, it is a simple attribute. Example for
simple attribute: employee_id of an employee.
Composite attribute:
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute: Name of the employee which can be split into First_name,
Middle_name, and Last_name.
Single valued Attributes:
If an attribute can take only a single value for each entity instance, it is a single valued attribute.
Example for single valued attribute: age of a student. It can take only one value for a particular
student.
Multi-valued Attributes:
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute.
Example for multi-valued attribute: telephone number of an employee. A particular employee may
have multiple telephone numbers.
In ER modeling, a multi-valued attribute is drawn as a double oval.
Stored Attribute:
An attribute which needs to be stored permanently is a stored attribute.
Example for stored attribute: name of a student
Derived Attribute:
An attribute which can be calculated or derived based on other attributes is a derived attribute.
Example for derived attribute: age of an employee, which can be calculated from date of birth and
the current date. In ER modeling, a derived attribute is drawn as a dashed oval.
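The attribute kinds described above can be sketched together in one small data structure. The class and field names are hypothetical; the point is only to show where each kind of attribute ends up when an entity is represented in code.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:
    """Composite attribute: can be split into simpler components."""
    first_name: str
    middle_name: str
    last_name: str

@dataclass
class Employee:
    employee_id: int                                     # simple, single-valued key attribute
    name: Name                                           # composite attribute
    date_of_birth: date                                  # stored attribute
    phone_numbers: list = field(default_factory=list)    # multi-valued attribute

    def age(self, today):
        """Derived attribute: computed from date_of_birth and the given date, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if this year's birthday has not yet occurred.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years

emp = Employee(101, Name("Asha", "K", "Rao"), date(1990, 6, 15), ["555-0100", "555-0101"])
print(emp.age(date(2024, 6, 14)), emp.age(date(2024, 6, 15)), len(emp.phone_numbers))
```

Storing date_of_birth and deriving age avoids having to update a stored age every year.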
Relationships:
Associations between entities are called relationships.
Example: An employee works for an organization. Here "works for" is a relationship between the
entities employee and organization.
In ER modeling, a relationship is drawn as a diamond.
To connect a weak entity with other entities, a weak (identifying) relationship is used, which is
drawn as a double diamond.
Degree of a Relationship:
Degree of a relationship is the number of entity types involved. The n-ary relationship is the
general form for degree n. Special cases are unary, binary, and ternary, where the degree is 1, 2,
and 3, respectively.
Example for unary relationship: An employee is the manager of another employee.
Example for binary relationship: An employee works for a department.
Example for ternary relationship: A customer purchases an item from a shopkeeper.
Cardinality of a Relationship:
Relationship cardinalities specify how many instances of each entity type are allowed. Relationships
can have four possible connectivities as given below.
1. One to one (1:1) relationship
2. One to many (1:N) relationship
3. Many to one (M:1) relationship
4. Many to many (M:N) relationship
The minimum and maximum values of this connectivity are called the cardinality of the relationship.
Example for Cardinality – One-to-One (1:1):
An employee is assigned a parking space.
One employee is assigned only one parking space and one parking space is assigned to only
one employee. Hence it is a 1:1 relationship and cardinality is One-To-One (1:1).
Example for Cardinality – One-to-Many (1:N):
One organization can have many employees, but one employee works in only one organization.
Hence it is a 1:N relationship and cardinality is One-To-Many (1:N).
Example for Cardinality – Many-to-One (M:1):
One employee works in only one organization but one organization can have many employees.
Hence it is a M:1 relationship and cardinality is Many-to-One (M:1).
Example for Cardinality – Many-to-Many (M:N):
One student can enroll for many courses and one course can be enrolled by many students. Hence
it is a M:N relationship and cardinality is Many-to-Many (M:N).
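The many-to-many case can be sketched in a relational database, where an M:N relationship is commonly stored as a separate table holding one row per related pair. The table names, column names, and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (cid INTEGER PRIMARY KEY, title TEXT);
    -- One row per (student, course) pair represents the M:N relationship.
    CREATE TABLE enrolls (
        sid INTEGER REFERENCES student(sid),
        cid INTEGER REFERENCES course(cid),
        PRIMARY KEY (sid, cid)
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO enrolls VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_of_asha = [r[0] for r in conn.execute(
    "SELECT c.title FROM enrolls e JOIN course c ON c.cid = e.cid "
    "WHERE e.sid = 1 ORDER BY c.title")]

# One course, many students.
students_in_dbms = [r[0] for r in conn.execute(
    "SELECT s.name FROM enrolls e JOIN student s ON s.sid = e.sid "
    "WHERE e.cid = 10 ORDER BY s.name")]

print(courses_of_asha, students_in_dbms)
```

A 1:N relationship would not need the extra table: a foreign key column on the "many" side is enough.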
Relationship Participation:
1. Total: In total participation, every entity instance is connected through the
relationship to an instance of the other participating entity type.
2. Partial: In partial participation, only some entity instances take part in the
relationship.
Example for relationship participation:
Consider the relationship: Employee is head of the department.
Not every employee is the head of a department; each department has only one employee as
its head. In other words, only a few instances of the employee entity participate
in this relationship, so the employee entity's participation is partial.
However, each department is headed by some employee, so the department
entity's participation is total.
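Total and partial participation can be checked mechanically on sample data. In this sketch the entity sets, the "is head of" pairs, and the helper name participation are all hypothetical.

```python
employees = {"e1", "e2", "e3"}
departments = {"d1", "d2"}

# (employee, department) pairs for the relationship "is head of".
heads = {("e1", "d1"), ("e3", "d2")}

def participation(entity_set, participating):
    """Return 'total' if every instance appears in the relationship, else 'partial'."""
    return "total" if entity_set <= participating else "partial"

employee_part = participation(employees, {e for e, _ in heads})
department_part = participation(departments, {d for _, d in heads})
print(employee_part, department_part)
```

Employee e2 heads no department, so employee participation is partial, while every department appears in some pair, so department participation is total.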
Disadvantages of the E-R Model:
• A physical design derived from an E-R model may contain some ambiguities or
inconsistencies.
• Sometimes diagrams may lead to misinterpretations.
Entity Set: A collection of similar entities is called an entity set.
Attribute: A property of an entity is called an attribute. An entity is described using a set
of attributes. An attribute is represented with an oval. Example: Employee Number, Name, Salary, etc.
Domain: The set of possible values of an attribute is called its domain. For each attribute
associated with an entity set, we must identify a domain of possible values. For example, if the
company rates employees on a scale of 1 to 10 and stores ratings in a field called rating, the
associated domain consists of the integers 1 through 10.
Key: A key is a minimal set of attributes whose values uniquely identify an entity in the set.
There could be more than one candidate key, in which case we select one of them as the primary
key. So, for each entity set we choose a key. The key attribute is represented by underlining its name.
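The two conditions in the definition of a key, uniqueness and minimality, can be sketched as a search over attribute combinations. The sample rows and function names are hypothetical. Note also that a key is a property of the entity set as designed, so uniqueness in one sample of data only suggests candidate keys and does not prove them.

```python
from itertools import combinations

# Hypothetical employee entities.
rows = [
    {"emp_no": 1, "name": "Asha", "dept": "Sales"},
    {"emp_no": 2, "name": "Ravi", "dept": "Sales"},
    {"emp_no": 3, "name": "Asha", "dept": "HR"},
]

def is_unique(attrs, rows):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return all minimal attribute sets that are unique in this sample, smallest first."""
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Minimality: skip any combination that contains an already-found key.
            if is_unique(combo, rows) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

keys = candidate_keys(rows)
print(keys)
```

Here emp_no alone identifies an employee, and so does the pair (dept, name); either could be chosen as the primary key, with emp_no the more natural choice.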