DBMS Unit 1 Notes
UNIT-1
Introduction: An overview of database management systems, database system vs. file system, database system
concepts and architecture, data models, schema and instances, data independence, database languages and
interfaces, data definition language, DML, overall database structure.
ER model concepts, notation for ER diagram, mapping constraints, keys, Concepts of Super Key, candidate key, primary
key, Generalization, aggregation, reduction of an ER diagrams to tables, extended ER model, relationship of higher
degree.
Database
A database is a collection of related data.
By data, we mean known facts that can be recorded and that have implicit meaning.
It is a collection of data kept in an organised manner on a persistent medium, so that storing and retrieving data is
easier.
The size of the data is not fixed; it can grow and shrink over time.
A database can be generated and maintained manually or by computer.
Examples: a dictionary, a telephone directory, a library card catalogue, etc.
Database Applications
Telecommunication: keeping records of calls made, generating monthly bills, storing information about the
communication network, etc.
Finance: storing information about sales and purchases of shares, stocks, bonds, etc.
Human Resources: storing information about employees, salaries, benefits, etc.
Internet: storing user IDs, passwords, etc. in a mailing system.
Disadvantages of the File System
Data Inconsistency: Data redundancy may lead to data inconsistency. For example, a change made to a record in one
file will not be reflected in copies of that record in other files. In the database-oriented approach there is no
redundancy of data, so there is no chance of inconsistency.
Difficulty in Accessing Data: A conventional file-processing environment does not allow needed data to be retrieved in
a convenient and efficient manner. For example, to find the list of customers who live in a particular area, we have
to search multiple files separately.
A database system can handle such queries and produce the required information conveniently.
Data Isolation: Data are scattered across various files, and the formats of these files may differ, so retrieving the
appropriate data is difficult.
In a database system the data are stored in one place, so they can be accessed faster.
Data Integrity Problems: Data stored in different places in different formats create integrity (uniformity) problems
when they are combined. In a database system, relationships and constraints are enforced, so there is little chance of
data integrity problems.
Atomicity Problems: In case of a system failure, the data should be restored to the consistent state that existed prior
to the failure; this property is called atomicity. It is difficult to ensure atomicity in the file-oriented approach,
whereas a database system can ensure it easily through its transaction mechanism.
Concurrent-Access Anomalies: When multiple users access the same data simultaneously, there is a chance of
data inconsistency. A database system guards against this possibility through some form of supervision (concurrency control).
Security: There is no security in the file-oriented approach; any user can access all the data.
In the database-oriented approach, access can be restricted so that certain data are available only to certain groups of users.
View of Data
To retrieve data efficiently, complex data structures are used to represent the data in the database. Since most users
do not understand these details, developers hide the complexity from users through the following levels of
abstraction.
Physical Level / Internal Level: The lowest level of abstraction, which describes how the data are actually
stored (the physical storage structures) and the data structures and access methods used by the
database.
Logical Level / Conceptual Level: The next level of abstraction, which specifies what data are stored in the
database and what relationships exist among those data. The DBA uses the logical level of abstraction to
decide what information to keep in the database.
View Level / External Level: The highest level of abstraction, which describes only the part of the entire database
that is of interest to a particular user group. The view level of abstraction exists to simplify users' interaction with
the system. The system may provide many views for the same database.
Physical Level / Internal Level: At this level an employee record can be described as a block of consecutive storage
locations occupying some number of bytes.
Logical Level / Conceptual Level: At this level each record is described by a type definition, and the interrelationships
among these record types are also defined.
View Level / External Level: At this level users see a set of application programs that hide the details of the data types.
A user is allowed to access only certain parts of the database, not the entire database.
Instances
Databases change over time as information is inserted and deleted.
The collection of information stored in the database at a particular moment is called an instance of the
database.
Instances of database are changed frequently with time.
Schemas
The overall design or description of the database is called the database schema.
Schemas are changed infrequently with time.
The concept of database schemas and instances can be understood by analogy to a program written in a programming
language: the schema corresponds to the variable declarations, and an instance corresponds to the values of those
variables at a particular point in time.
Data Independence
Data independence can be defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level. We can define two types of data independence:
1. Logical data independence is the capacity to change the conceptual schema without having to change external
schemas or application programs.
Example: addition of records or data items, changing constraints, removing records or data items etc.
2. Physical data independence is the capacity to change the internal schema without having to change the conceptual
schema; hence the external schemas need not be changed either. Changes to the internal schema may be needed because
some physical files were reorganized.
Example: by creating additional access structures—to improve the performance of retrieval or update.
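As a minimal sketch of physical data independence (using Python's built-in sqlite3 module, with hypothetical table and column names): adding an index is a change to the internal schema only, so the query at the conceptual level, and its result, are unaffected.

```python
import sqlite3

# In-memory database with a hypothetical EMPLOYEE table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "Sales"), (2, "Ravi", "HR"), (3, "Meena", "Sales")])

query = "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY name"
before = conn.execute(query).fetchall()

# Physical change: add an access structure (an index) on dept.
conn.execute("CREATE INDEX idx_dept ON employee(dept)")

after = conn.execute(query).fetchall()
# The conceptual schema and the query text are unchanged; only retrieval speed may differ.
same = (before == after)
```

Here `same` is true: the index changes how the rows are located on storage, not what the query returns.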
Data Models
A collection of conceptual tools for describing data, data relationships, data semantics, and consistency
constraints.
A data model provides a way to describe the design of a database at the physical, logical, and view levels.
The data models can be classified into four different categories:
1. Relational Model.
The relational model uses a collection of tables to represent both data and the relationships among those data.
Each table has multiple columns, and each column has a unique name.
Tables are also known as relations.
Each table contains records of a particular type.
Each record type defines a fixed number of fields, or attributes (column).
The columns of the table correspond to the attributes of the record type.
The relational data model is the most widely used data model, and a vast majority of current database systems
are based on the relational model.
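The correspondence between tables, columns and record types can be sketched as follows (using Python's built-in sqlite3 module; the STUDENT table and its attributes are hypothetical examples).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation (table): each column has a unique name; each row is a record.
conn.execute("CREATE TABLE student (student_id INTEGER, student_name TEXT, city TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(101, "Asha", "Pune"), (102, "Ravi", "Delhi")])

# The columns of the table correspond to the attributes of the record type.
cursor = conn.execute("SELECT * FROM student ORDER BY student_id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Every record in the table has the same fixed set of attributes (`student_id`, `student_name`, `city`), which is exactly what "records of a particular type" means above.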
2. Entity-Relationship Model.
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships
among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects.
The entity-relationship model is widely used in database design.
3. Object-Based Data Model.
Object-oriented programming (especially in Java, C++, or C#) has become the dominant software-development
methodology.
This led to the development of an object-oriented data model that can be seen as extending the E-R model with
notions of encapsulation, methods (functions), and object identity.
The object-relational data model combines features of the object-oriented data model and relational data model.
4. Semistructured Data Model.
The Semistructured data model permits the specification of data where individual data items of the same
type may have different sets of attributes.
The Extensible Markup Language (XML) is widely used to represent Semistructured data.
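A minimal sketch of the semistructured idea, using Python dictionaries (the "book" items and their attributes are hypothetical): two items of the same type carry different sets of attributes, which a fixed relational schema would not allow.

```python
# Semistructured data: two items of the same type with different attribute sets.
books = [
    {"title": "Database Notes", "authors": ["A", "B"]},     # has 'authors'
    {"title": "Another Text", "edition": 2},                # has 'edition' instead
]

# Unlike a relational table, no fixed set of columns is enforced here.
attribute_sets = [set(b) for b in books]
```

In XML the same flexibility appears as elements of the same tag name containing different child elements.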
Database Languages
A database system provides a data-definition language to specify the database schema and
A data-manipulation language to express database queries and updates.
In practice, the data-definition and data-manipulation languages are not two separate languages; instead they
simply form parts of a single database language, such as the widely used SQL language.
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model.
The types of access are:
o Retrieval of information stored in the database
o Insertion of new information into the database
o Deletion of information from the database
o Modification of information stored in the database
There are basically two types:
Procedural DMLs require a user to specify what data are needed and how to get those data.
Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed
without specifying how to get those data. Declarative DMLs are usually easier to learn and use than are
procedural DMLs.
A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is
called a query language.
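The four types of access listed above can be sketched in SQL, a declarative DML (run here through Python's built-in sqlite3 module; the ACCOUNT table and its values are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

# Insertion of new information
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.execute("INSERT INTO account VALUES ('A-102', 700)")

# Retrieval of stored information (a query): we say WHAT we want, not HOW to get it.
rows = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'").fetchall()

# Modification of stored information
conn.execute("UPDATE account SET balance = balance + 100 "
             "WHERE account_number = 'A-101'")

# Deletion of information
conn.execute("DELETE FROM account WHERE account_number = 'A-102'")

remaining = conn.execute(
    "SELECT account_number, balance FROM account").fetchall()
```

Note that none of the statements mention indexes, files or scan order; deciding how to get the data is left to the system, which is what makes the language declarative.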
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a data-definition
language (DDL).
The DDL is also used to specify additional properties of the data.
We specify the storage structure and access methods used by the database system by a set of statements in a
special type of DDL called a data storage and definition language.
The data values stored in the database must satisfy certain consistency constraints.
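A small sketch of DDL with consistency constraints (again via sqlite3; the DEPARTMENT table, its columns, and the constraint are hypothetical examples): the DDL states the constraint once, and the system then rejects any data value that violates it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: the schema together with consistency constraints (NOT NULL, CHECK).
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        budget    INTEGER NOT NULL CHECK (budget >= 0)
    )
""")

conn.execute("INSERT INTO department VALUES ('Physics', 100000)")  # satisfies constraints

try:
    conn.execute("INSERT INTO department VALUES ('History', -5)")  # violates CHECK
    violated = False
except sqlite3.IntegrityError:
    violated = True  # the DBMS rejected the inconsistent value
```
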
Database Users
There are four different types of database-system users, differentiated by the way they expect to interact with the system.
1. Naive users are unsophisticated users who interact with the system by invoking one of the application programs that
have been written previously. For example, a clerk in the university who needs to add a new instructor to a department.
2. Application programmers are computer professionals who write application programs. Application programmers
can choose from many tools to develop user interfaces. Rapid application development (RAD) tools are tools that
enable an application programmer to construct forms and reports with minimal programming effort.
3. Sophisticated users interact with the system without writing programs. Instead, they form their requests either using
a database query language or by using tools such as data analysis software. Analysts who submit queries to explore
data in the database fall in this category.
4. Specialized users are sophisticated users who write specialized database applications that do not fit into the
traditional data-processing framework. Among these applications are computer-aided design systems,
knowledge-base and expert systems, and systems that store data with complex data types (for example, graphics data
and audio data).
DBMS Interfaces
These interfaces accept requests written in English or some other language and attempt to understand them. A
natural language interface usually has its own schema, which is similar to the database conceptual schema, as
well as a dictionary of important words.
The natural language interface refers to the words in its schema, as well as to the set of standard words in its
dictionary, that are used to interpret the request.
Interfaces for the DBA: Most database systems contain privileged commands that can be used only by the DBA staff.
These include commands for creating accounts, setting system parameters, granting account authorization,
changing a schema, and reorganizing the storage structures of a database.
Overall Database Structure
A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional
components of a database system can be broadly divided into the storage manager and the query processor components:
Storage Manager
Query Processor
Storage Manager
The storage manager is the component of a database system that provides the interface between the low-level
data stored in the database and the application programs.
The storage manager is responsible for the interaction with the file manager.
The raw data are stored on the disk using the file system provided by the operating system.
The storage manager translates the various DML statements into low-level file-system commands.
The storage manager components include:
Authorization and Integrity Manager, which tests for the satisfaction of integrity constraints and checks the
authority of users to access data.
Transaction Manager, which ensures that the database remains in a consistent (correct) state despite system
failures, and that concurrent transaction executions proceed without conflicting.
File Manager, which manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
Buffer Manager, which is responsible for fetching data from disk storage into main memory, and deciding
what data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system implementation:
Data files, which store the database itself.
Data Dictionary, which stores metadata about the structure of the database, in particular the schema of the
database.
Indices, which can provide fast access to data items. Like the index in this textbook, a database index provides
pointers to those data items that hold a particular value.
The Query Processor
The query processor components include:
DDL Interpreter, which interprets DDL statements and records the definitions
in the data dictionary.
DML Compiler, which translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query evaluation engine understands.
Query Optimization; that is, it picks the lowest cost evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level instructions generated by the DML compiler.
Modeling
A database can be modeled as:
a collection of entities,
relationship among entities.
An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an
entity set.
Example: people have names and addresses
An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays
Relationship Sets
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type.
Example: the relationship set depositor associates customers with their accounts.
Attributes
An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an
entity set.
Example:
customer = (customer_id, customer_name,
customer_street, customer_city )
loan = (loan_number, amount )
Mapping Cardinalities
Mapping cardinalities express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
One to one
One to many
Many to one
Many to many
Keys
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key.
customer_id is a candidate key of customer.
account_number is a candidate key of account.
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
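The definitions above can be sketched in pure Python (the customer rows are hypothetical, and note that being a key is really a property of the schema; checking one instance, as here, can only refute key-ness, not prove it).

```python
from itertools import combinations

# A small instance of a hypothetical 'customer' entity set.
customers = [
    {"customer_id": 1, "customer_name": "Asha", "customer_city": "Pune"},
    {"customer_id": 2, "customer_name": "Ravi", "customer_city": "Pune"},
    {"customer_id": 3, "customer_name": "Asha", "customer_city": "Delhi"},
]

def is_superkey(attrs, rows):
    """The attribute values uniquely determine each entity on this instance."""
    projections = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(projections)) == len(projections)

def is_candidate_key(attrs, rows):
    """A minimal super key: no proper subset is itself a super key."""
    if not is_superkey(attrs, rows):
        return False
    return all(not is_superkey(list(sub), rows)
               for n in range(1, len(attrs))
               for sub in combinations(attrs, n))
```

For example, `["customer_id", "customer_name"]` is a super key here but not a candidate key, because its subset `["customer_id"]` is already a super key on its own.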
Keys for Relationship Sets
The combination of the primary keys of the participating entity sets forms a super key of a relationship set.
(customer_id, account_number) is a super key of depositor.
NOTE: this means a pair of entity sets can have at most one relationship in a particular relationship
set.
Example: if we wish to track all access_dates of each account by each customer, we cannot
create a relationship for each access. We can use a multivalued attribute, though.
Must consider the mapping cardinality of the relationship set when deciding what are the candidate keys
Need to consider semantics of relationship set in selecting the primary key in case of more than one candidate
key
E-R Diagrams
Roles
Cardinality Constraints
We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line
(—), signifying “many,” between the relationship set and the entity set.
One-to-one relationship:
A customer is associated with at most one loan via the relationship borrower
A loan is associated with at most one customer via borrower
One-To-Many Relationship
In the one-to-many relationship a loan is associated with at most one customer via borrower, a customer is
associated with several (including 0) loans via borrower
Many-To-One Relationships
In a many-to-one relationship a loan is associated with several (including 0) customers via borrower, a
customer is associated with at most one loan via borrower
Many-To-Many Relationship
In a many-to-many relationship a customer is associated with several (including 0) loans via borrower, and a loan is
associated with several (including 0) customers via borrower.
Weak Entity Sets
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set:
it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying
to the weak entity set.
Identifying relationship depicted using a double diamond
The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the
entities of a weak entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak
entity set is existence dependent, plus the weak entity set’s discriminator.
We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
payment_number – discriminator of the payment entity set
Primary key for payment – (loan_number, payment_number)
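The payment example above can be sketched as tables (via sqlite3): the weak entity's primary key is the owner's key plus the discriminator, so two payments on the same loan must differ in payment_number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER)")

# Weak entity set: primary key = identifying entity's key + discriminator.
conn.execute("""
    CREATE TABLE payment (
        loan_number    TEXT,
        payment_number INTEGER,        -- discriminator (partial key)
        payment_amount INTEGER,
        PRIMARY KEY (loan_number, payment_number),
        FOREIGN KEY (loan_number) REFERENCES loan(loan_number)
    )
""")

conn.execute("INSERT INTO loan VALUES ('L-17', 1000)")
conn.execute("INSERT INTO payment VALUES ('L-17', 1, 100)")
conn.execute("INSERT INTO payment VALUES ('L-17', 2, 200)")  # same loan, new discriminator

try:
    conn.execute("INSERT INTO payment VALUES ('L-17', 1, 300)")  # duplicate composite key
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```
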
Specialization
A top-down design process; we designate subgroupings within an entity set that are distinctive from other
entities in the set.
These subgroupings become lower-level entity sets that have attributes or participate in relationships that do
not apply to the higher-level entity set.
Depicted by a triangle component labeled ISA (E.g. customer “is a” person).
Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the
higher-level entity set to which it is linked.
Generalization
A bottom-up design process – combine a number of entity sets that share the same features into a higher-
level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram
in the same way.
The terms specialization and generalization are used interchangeably.
Can have multiple specializations of an entity set based on different features.
E.g. permanent_employee vs. temporary_employee, in addition to officer vs. secretary vs. teller
Each particular employee would be a member of one of permanent_employee or temporary_employee, and also a
member of one of officer, secretary, or teller.
Aggregation
Consider the ternary relationship works_on, which we saw earlier
Suppose we want to record managers for tasks performed by an
employee at a branch
Relationship sets works_on and manages represent overlapping information:
Every manages relationship corresponds to a works_on relationship.
However, some works_on relationships may not correspond to any manages relationship, so we
cannot discard the works_on relationship set.
Aggregation eliminates this redundancy by treating the relationship set works_on as an abstract entity, which can
then participate in the manages relationship.
Reduction of ER Diagrams to Tables
Since an ER diagram gives us good knowledge of the requirements and of the mappings among the entities in it, we
can easily convert it into tables and columns; i.e., using ER diagrams one can easily create the relational data
model, which is nothing but the logical view of the database.
There are various steps involved in converting an ER diagram into tables and columns. Consider the ER diagram below
and see how it is converted into tables, columns and mappings.
The basic rules for converting an ER diagram into tables are:
Convert all the entities in the diagram to tables.
All the entities represented by rectangular boxes in the ER diagram become independent tables in the database. In the
diagram below, STUDENT, COURSE, LECTURER and SUBJECTS form individual tables.
All single-valued attributes of an entity are converted to columns of the table.
All attributes that hold a single value at any instant of time become columns of that table. In the
STUDENT entity, STUDENT_ID and STUDENT_NAME form the columns of the STUDENT table. Similarly,
LECTURER_ID and LECTURER_NAME form the columns of the LECTURER table. And so on.
The key attribute in the ER diagram becomes the primary key of the table.
In the diagram above, STUDENT_ID, LECTURER_ID, COURSE_ID and SUB_ID are the key attributes of the entities.
Hence we consider them as the primary keys of the respective tables.
Declare the foreign key columns, if applicable.
In the diagram, the attribute COURSE_ID in the STUDENT entity is from the COURSE entity. Hence add COURSE_ID to the
STUDENT table and assign it a foreign key constraint. COURSE_ID and SUBJECT_ID in the LECTURER table form the
foreign key columns. By declaring the foreign key constraints, the mapping between the tables is established.
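This step can be sketched as DDL (run through sqlite3; the table and column names follow the diagram described above): the foreign key constraint both records the mapping and rejects rows that refer to a non-existent course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this pragma to enforce FKs
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, course_name TEXT)")
conn.execute("""
    CREATE TABLE student (
        student_id   TEXT PRIMARY KEY,
        student_name TEXT,
        course_id    TEXT REFERENCES course(course_id)  -- foreign key mapping
    )
""")

conn.execute("INSERT INTO course VALUES ('C1', 'BSc')")
conn.execute("INSERT INTO student VALUES ('S1', 'Asha', 'C1')")  # valid mapping

try:
    conn.execute("INSERT INTO student VALUES ('S2', 'Ravi', 'C9')")  # no such course
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```
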
Any multi-valued attribute is converted into a new table.
HOBBY in the STUDENT entity is a multivalued attribute: any student can have any number of hobbies, so we cannot
represent multiple values in a single column of the STUDENT table. We store it separately, so that any number of
hobbies can be kept, and adding/removing hobbies does not create any redundancy or anomalies in the
system. Hence we create a separate table STUD_HOBBY with STUDENT_ID and HOBBY as its columns, and create
a composite key using both columns.
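The STUD_HOBBY table above can be sketched as follows (via sqlite3): the composite key lets a student have many hobbies while ruling out duplicate (student, hobby) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Multivalued attribute HOBBY moved to its own table; composite key bars duplicates.
conn.execute("""
    CREATE TABLE stud_hobby (
        student_id TEXT,
        hobby      TEXT,
        PRIMARY KEY (student_id, hobby)
    )
""")
conn.executemany("INSERT INTO stud_hobby VALUES (?, ?)",
                 [("S1", "chess"), ("S1", "music"), ("S2", "chess")])

hobbies_of_s1 = [h for (h,) in conn.execute(
    "SELECT hobby FROM stud_hobby WHERE student_id = 'S1' ORDER BY hobby")]
```
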
Any composite attributes are merged into same table as different columns.
In the diagram above, Student Address is a composite attribute. It has Door#, Street, City, State and Pin. These attributes
are merged into STUDENT table as individual columns.
One can ignore a derived attribute, since it can be calculated at any time.
In the STUDENT table, Age can be derived at any point in time by calculating the difference between DateOfBirth and
the current date. Hence we need not create a column for this attribute; this reduces duplication in the database.
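Deriving Age from DateOfBirth can be sketched in pure Python (the function name and its parameters are illustrative):

```python
from datetime import date

def age(date_of_birth: date, on: date) -> int:
    """Derived attribute: Age computed from DateOfBirth instead of being stored."""
    years = on.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred this year.
    if (on.month, on.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

Because the value is computed on demand, it can never go stale the way a stored Age column would.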
These are the very basic rules for converting an ER diagram into tables and columns, and for establishing the mappings
between the tables. The table structure at this point would be as below.
Representing 1:N relationship
Consider the SUBJECT and LECTURER relationship, where each lecturer teaches multiple subjects. This is a 1:N relationship. In
this case, the primary key of the LECTURER table is added to the SUBJECT table; i.e., the primary key of the entity at the '1'
side is added as a foreign key to the entity at the 'N' side.
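The 1:N rule above can be sketched as tables (via sqlite3, with the LECTURER/SUBJECT names from the example): LECTURER_ID, the key of the '1' side, appears as a foreign key in SUBJECT, the 'N' side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1:N — the key of the '1' side (LECTURER) is added to the 'N' side (SUBJECT).
conn.execute("CREATE TABLE lecturer (lecturer_id TEXT PRIMARY KEY, lecturer_name TEXT)")
conn.execute("""
    CREATE TABLE subject (
        sub_id      TEXT PRIMARY KEY,
        sub_name    TEXT,
        lecturer_id TEXT REFERENCES lecturer(lecturer_id)
    )
""")
conn.execute("INSERT INTO lecturer VALUES ('L1', 'Dr. Rao')")
conn.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                 [("SB1", "DBMS", "L1"),
                  ("SB2", "Networks", "L1")])  # one lecturer, many subjects

taught = [s for (s,) in conn.execute(
    "SELECT sub_name FROM subject WHERE lecturer_id = 'L1' ORDER BY sub_name")]
```
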