Unit - 1 Notes

The document provides an overview of database management systems with 4 main sections: 1. It describes the purpose of a database system as efficiently storing, managing and retrieving structured data for users and applications. 2. It explains that a database schema defines the structure and organization of data, while a database instance refers to the actual data stored at a point in time. 3. It outlines the physical, logical and view levels that provide abstraction of how data is stored and accessed. 4. It introduces the main database languages used to define the database structure with DDL, interact with data using DML, and retrieve data with DQL.

Database Management Systems

Unit – I Introduction To DBMS


Purpose of Database System - Database Schema and Instances - Views of data - Database
Languages - Database System Architecture - Database users and Administrator -
Distributed Databases - DDB System Architecture - Database models - Entity Relationship
model - E-R Diagrams - Introduction to relational databases - Structure of relational
databases - Relational model - Basics - From E/R diagrams to Database Design -
Hierarchical model - Tree-structure diagrams - data retrieval - update facility - virtual
records - mapping of hierarchies to files - Network Model - Data-structure diagrams -
DBTG model - data retrieval - Find - Get - Set processing - update facility - Mapping network
to files. Case study: Conference Management system.

1. Purpose of Database System


The purpose of a database system is to efficiently and securely store, manage, and retrieve data
in an organized and structured manner. It serves as a central repository for data, allowing users
and applications to access, manipulate, and analyze information as needed. Database systems
are essential for various industries and applications, including business, science, government,
education, healthcare, and more.

Key purposes of a database system include:

Data Storage: A database system provides a structured and organized method for storing vast
amounts of data, ranging from simple text and numbers to multimedia content like images,
audio, and video.

Data Retrieval: Users can easily search and retrieve specific data from the database, using
queries and filters. This makes it efficient to extract information needed for various tasks and
analyses.

Data Integrity and Security: Database systems employ mechanisms to ensure data integrity,
preventing data duplication and maintaining consistency. Additionally, access controls and
authentication mechanisms are implemented to secure sensitive data from unauthorized access.

Data Manipulation: Database systems support various operations like inserting new data,
updating existing records, and deleting unwanted information, enabling users to manage the
data effectively.

Data Sharing and Collaboration: Multiple users can access the same database
simultaneously, facilitating collaboration and information sharing within an organization.

Data Concurrency Control: Database systems ensure that multiple users can access and
manipulate data concurrently without causing inconsistencies or conflicts.

Data Recovery: Database systems often include backup and recovery mechanisms to protect
against data loss due to system failures, human errors, or disasters.

Data Analysis and Reporting: By organizing data in a structured manner, database systems
enable the use of analytical tools and techniques to derive insights, generate reports, and make
informed decisions.

Scalability: Database systems can handle increasing amounts of data and user demands by
scaling up or out, depending on the requirements.

Reducing Data Redundancy: With a centralized database, redundant data storage is
minimized, leading to efficient data management and reduced storage costs.

Overall, database systems play a critical role in managing and leveraging data effectively,
making them an integral part of modern information systems and applications.

2. Database Schema and Instances

In the context of a database system, the terms "database schema" and "database instances" refer
to fundamental concepts related to the organization and content of data.

Database Schema:
A database schema defines the logical and structural layout of a database. It represents the
overall blueprint of how the data is organized, the relationships between different data
elements, and the constraints imposed on the data. A schema is essential for ensuring data
integrity, consistency, and efficient data retrieval.

Key components of a database schema include:

Tables: Tables are the primary structures within a database and represent entities, such as
customers, products, or orders. Each table consists of rows and columns where data is stored.

Columns: Columns, also known as attributes, represent individual data fields within a table.
Each column has a specific data type, defining the kind of data it can store (e.g., text, numbers,
dates).

Primary Key: A primary key is a unique identifier for each row in a table. It ensures that each
record can be uniquely identified and helps establish relationships between tables.

Foreign Key: A foreign key is a column in one table that refers to the primary key of another
table. It establishes relationships between tables, enabling the creation of links between related
data.

Constraints: Constraints define rules and conditions that the data in the database must follow,
ensuring data consistency and accuracy. Examples include unique constraints, check
constraints, and not-null constraints.

Database Instances:
A database instance, also known as a database state, refers to the actual data stored in a database
at a particular point in time. It represents the current snapshot of the data contained within the
database schema. When data is inserted, updated, or deleted in the database, the instance
changes to reflect the modifications.

For example, consider a database schema that includes two tables: "Customers" and "Orders."
The schema defines the structure of these tables, the relationships between them (e.g., the
"Orders" table may have a foreign key referring to the "Customers" table), and any constraints
on the data.

A database instance would be the specific set of data currently present in the "Customers" and
"Orders" tables, including all the rows and their values. As new customers are added or orders
are placed, the database instance is updated to reflect these changes.
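
The distinction between schema and instance can be sketched in a few lines of Python using the built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the column names (CustomerID, Name, OrderID, Amount), the sample values, and the helper function snapshot are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Schema: the structure, keys, and constraints. This changes rarely.
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        Amount     REAL CHECK (Amount > 0)
    );
""")

def snapshot(connection):
    """Return the current instance: every row of both tables."""
    return {
        "Customers": connection.execute(
            "SELECT * FROM Customers ORDER BY CustomerID").fetchall(),
        "Orders": connection.execute(
            "SELECT * FROM Orders ORDER BY OrderID").fetchall(),
    }

instance_before = snapshot(conn)   # an empty instance of the schema

conn.execute("INSERT INTO Customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO Orders VALUES (10, 1, 250.0)")
instance_after = snapshot(conn)    # a different instance of the same schema

# The schema is recorded in the catalog and is unchanged by the inserts.
schema_tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

print(instance_before)
print(instance_after)
print(schema_tables)
```

The two snapshots differ while the list of tables stays the same, which is the sense in which the instance changes over time and the schema does not.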

3. Views of data

A database system is a collection of interrelated data and a set of programs that allows users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. That is, the system hides certain details of how data are stored and
maintained.

In the traditional database management system architecture, the data abstraction is organized
into three levels:

Physical Level: This level is concerned with the physical storage details of data on the storage
medium. It deals with how the data is stored in terms of bytes, blocks, pages, and how it is
physically organized on disk or other storage devices. At this level, the database management
system interacts directly with the storage system, and it is aware of hardware-specific details.

Logical Level: The logical level sits above the physical level and is concerned with the logical
representation of data, independent of the physical storage details. At this level, the data is
organized into tables, views, and relationships, and the database management system provides
an abstract view of the data. Users and application programs interact with the database at the
logical level using high-level query languages like SQL.

View Level: The view level is the highest level of data abstraction. It involves creating
customized views of the data for specific users or applications. Views allow users to see a
subset of the data or present the data in a format that is most relevant to their needs. The view
level abstracts away the underlying complexity of the data model and provides a simplified
view tailored to the specific requirements of different users.

The concept of data abstraction and the three levels are crucial for managing complex databases
efficiently. They provide a way to hide the implementation details from users and application
developers, enabling them to work with data at higher levels of abstraction without needing to
understand the intricacies of data storage and management.

By separating the physical, logical, and view levels, database management systems can achieve
better data independence, making it easier to modify the database structure or storage without
affecting the applications that use the data. This modular approach simplifies database
development, maintenance, and scalability. Additionally, it enhances security by controlling
access to the data at different levels based on user privileges.

Fig: The three levels of data abstraction: the view level (View 1, View 2, ...) at the top, the
logical level below it, and the physical level at the bottom.
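
The three levels can be illustrated with a small Python sketch using sqlite3. It is only an approximation of the idea: the table contents, the Salary column, the view name EmployeeDirectory, and the index name are assumptions chosen for the example. The view hides a column from its users (view level), and adding an index changes how rows are located on storage (physical level) without changing what a query returns (logical level).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full table as the designer defined it.
conn.execute("""
    CREATE TABLE Employees (
        ID INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName TEXT,
        Salary REAL
    )
""")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "John", "Doe", 52000.0), (2, "Mary", "Major", 61000.0)],
)

# View level: a directory view that hides the Salary column from its users.
conn.execute("""
    CREATE VIEW EmployeeDirectory AS
    SELECT ID, FirstName, LastName FROM Employees
""")
cursor = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY ID")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()

# Physical level: an index changes the access path, not the query result.
QUERY = "SELECT ID, FirstName FROM Employees WHERE LastName = 'Doe'"
rows_before_index = conn.execute(QUERY).fetchall()
conn.execute("CREATE INDEX idx_employees_lastname ON Employees(LastName)")
rows_after_index = conn.execute(QUERY).fetchall()

print(directory_columns, directory_rows)
print(rows_before_index == rows_after_index)
```

The unchanged query result after the index is created is a small instance of physical data independence.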

4. Database Languages

Database languages are programming languages specifically designed for interacting with
databases. These languages provide a standardized and efficient way for users, applications,
and database administrators to perform various operations on the data stored in a database.
There are primarily three types of database languages:

Data Definition Language (DDL):


The Data Definition Language is used to define the structure and organization of the database.
It allows users and administrators to create, modify, and manage the database schema,
including tables, views, indexes, and other database objects. Common DDL commands
include:
CREATE: Used to create database objects like tables, views, indexes, and procedures.
ALTER: Used to modify the structure of existing database objects.
DROP: Used to delete database objects from the database.
TRUNCATE: Used to remove all data from a table while keeping the table structure intact.
Examples of DDL statements in SQL (Structured Query Language):

CREATE TABLE Employees (
    ID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Age INT
);

ALTER TABLE Employees ADD COLUMN Department VARCHAR(100);

DROP TABLE Employees;

Data Manipulation Language (DML):


The Data Manipulation Language is used to interact with the data stored in the database. It
allows users and applications to insert, retrieve, update, and delete data from the database
tables. Common DML commands include:
SELECT: Used to retrieve data from one or more tables.
INSERT: Used to add new records to a table.
UPDATE: Used to modify existing records in a table.
DELETE: Used to remove records from a table.
Examples of DML statements in SQL:
SELECT FirstName, LastName FROM Employees WHERE Age > 30;

INSERT INTO Employees (ID, FirstName, LastName, Age) VALUES (1, 'John', 'Doe', 35);

UPDATE Employees SET Age = 40 WHERE ID = 1;

DELETE FROM Employees WHERE Age > 65;

Data Control Language (DCL):


The Data Control Language is used to manage the access and permissions of users and roles
within the database. It allows database administrators to grant or revoke privileges to perform
specific actions on the database objects. Common DCL commands include:
GRANT: Used to give specific privileges to users or roles.
REVOKE: Used to take away previously granted privileges.
Examples of DCL statements in SQL:
GRANT SELECT, INSERT ON Employees TO MarketingTeam;

REVOKE UPDATE ON Employees FROM TemporaryUser;

SQL (Structured Query Language) is the most widely used database language and incorporates
all three types: DDL, DML, and DCL. Individual database systems often add their own
languages or SQL extensions, but the division into DDL, DML, and DCL generally applies
across them.
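
The DDL and DML statements shown in this section can be tried out from Python with the built-in sqlite3 module, as in the sketch below. SQLite is used here only because it needs no server; it is an assumption of this example, not something the notes prescribe. SQLite has no GRANT or REVOKE, so the DCL statements are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, then alter it.
conn.execute("""
    CREATE TABLE Employees (
        ID INT PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        Age INT
    )
""")
conn.execute("ALTER TABLE Employees ADD COLUMN Department VARCHAR(100)")

# DML: add, read, change, and remove rows.
conn.execute(
    "INSERT INTO Employees (ID, FirstName, LastName, Age) "
    "VALUES (1, 'John', 'Doe', 35)")
over_30 = conn.execute(
    "SELECT FirstName, LastName FROM Employees WHERE Age > 30").fetchall()
conn.execute("UPDATE Employees SET Age = 40 WHERE ID = 1")
deleted = conn.execute("DELETE FROM Employees WHERE Age > 65").rowcount
remaining = conn.execute("SELECT ID, Age, Department FROM Employees").fetchall()

# DDL again: remove the table and everything stored in it.
conn.execute("DROP TABLE Employees")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(over_30, deleted, remaining, tables_left)
```

The Department column added by ALTER TABLE is empty (NULL) for the existing row, and the DELETE removes nothing because no employee is older than 65.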

5. Database System Architecture

Database system architecture refers to the overall design and structure of a database
management system (DBMS). It encompasses the components, modules, and interactions that
make up the database system, facilitating data storage, retrieval, manipulation, and
management. A well-designed architecture ensures the efficient and reliable functioning of the
database system, optimizing performance, scalability, and security. The architecture of a
database system typically consists of the following key components:

Data Storage:
This component involves the physical storage of data on disk or other storage media. The data
is organized into data files or tables, each containing records with attributes or fields. Various
data structures and techniques are used to optimize data storage and access, such as indexing,
partitioning, and clustering.

Database Management System (DBMS):


The DBMS acts as the core of the database system, responsible for managing and controlling
data access, storage, and retrieval. It provides a set of software programs and services that allow
users and applications to interact with the database. The DBMS ensures data integrity, security,
and concurrency control, handling multiple users and applications concurrently.

Query Processor:
The query processor is a vital part of the database system architecture responsible for parsing,
optimizing, and executing user queries. When a user submits a query (e.g., SQL statement), the
query processor analyzes the query, determines the most efficient way to retrieve the data, and
optimizes the query execution plan. This involves selecting the appropriate indexes and access
paths to minimize the query's response time.
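
Many systems let the user inspect the plan chosen by the query processor. The sketch below uses SQLite's EXPLAIN QUERY PLAN statement from Python as one concrete example; the table contents, the index name idx_employees_lastname, and the helper function plan are assumptions for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Employees (FirstName, LastName, Age) VALUES (?, ?, ?)",
    [("John", "Doe", 35), ("Mary", "Major", 41), ("Ravi", "Kumar", 29)],
)

QUERY = "SELECT FirstName, Age FROM Employees WHERE LastName = 'Doe'"

def plan(connection, sql):
    """Return the optimizer's plan as one string (last column of each plan row)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)

# Without an index, the only access path is to scan the whole table.
plan_without_index = plan(conn, QUERY)
result_without_index = conn.execute(QUERY).fetchall()

# With an index on LastName, the optimizer can search the index instead.
conn.execute("CREATE INDEX idx_employees_lastname ON Employees(LastName)")
plan_with_index = plan(conn, QUERY)
result_with_index = conn.execute(QUERY).fetchall()

print(plan_without_index)
print(plan_with_index)
```

The query text and its result are identical in both runs; only the access path selected by the optimizer changes.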

Transaction Manager:
The transaction manager handles transactions, which are sequences of one or more database
operations that are treated as a single unit of work. The transaction manager ensures the
atomicity, consistency, isolation, and durability (ACID properties) of transactions, preventing
data inconsistencies and ensuring data integrity.
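
Atomicity can be demonstrated with a small money-transfer sketch in Python and sqlite3. The Accounts table, its CHECK constraint, and the transfer function are assumptions made for this example. The transfer credits one account and then debits the other; if the debit fails, the credit is rolled back as well, so the transaction takes effect completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Accounts (
        ID INTEGER PRIMARY KEY,
        Balance INTEGER NOT NULL CHECK (Balance >= 0)
    )
""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move `amount` between accounts as one transaction; return True on commit."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE Accounts SET Balance = Balance + ? WHERE ID = ?",
                (amount, target))
            connection.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE ID = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    """Return {account ID: balance} for all accounts."""
    return dict(connection.execute("SELECT ID, Balance FROM Accounts"))

ok = transfer(conn, 1, 2, 30)        # valid: both updates are committed together
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)   # the debit violates the CHECK constraint
after_failed = balances(conn)

print(ok, after_ok)
print(failed, after_failed)
```

After the failed transfer the balances are the same as before it, even though the credit to account 2 had already been executed inside the transaction.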

Buffer Manager:
The buffer manager is responsible for managing the buffer pool, which is an area of memory
used to cache frequently accessed data pages from the disk. The buffer manager reduces disk
I/O by keeping commonly used data in memory, improving overall database performance.
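
The caching behaviour of a buffer manager can be sketched as a tiny buffer pool in Python. This sketch assumes a least-recently-used (LRU) replacement policy, which is one common choice and not necessarily what any particular DBMS uses; the class name BufferPool and the fake_disk_read function are hypothetical. Real buffer managers also track modified ("dirty") pages and pages currently in use, which is omitted here.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with least-recently-used (LRU) replacement."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # called only on a miss
        self.pages = OrderedDict()                       # page_id -> page contents
        self.hits = 0
        self.misses = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)              # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)               # evict least recently used
        page = self.read_page_from_disk(page_id)
        self.pages[page_id] = page
        return page

disk_reads = []

def fake_disk_read(page_id):
    """Stand-in for a slow disk read; records which pages were fetched."""
    disk_reads.append(page_id)
    return f"contents of page {page_id}"

pool = BufferPool(capacity=2, read_page_from_disk=fake_disk_read)
for page_id in [1, 2, 1, 3, 1, 2]:
    pool.get_page(page_id)

print(pool.hits, pool.misses, disk_reads)
```

Six page requests cause only four disk reads, because page 1 is found in memory twice. Page 2 has to be read again at the end because it was evicted to make room for page 3.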

Concurrency Control Manager:


Concurrency control is essential to manage multiple users accessing the database
simultaneously. The concurrency control manager ensures that transactions execute in a way
that maintains data consistency, preventing issues like lost updates and dirty reads.
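
A lost update, and one way to prevent it, can be sketched in Python with sqlite3. The example assumes an optimistic scheme based on a version number, which is only one of several concurrency control techniques (locking is another); the Stock table and the functions read_stock and write_stock are hypothetical. Two users read the same row, and a write is accepted only if the row has not changed since it was read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Stock (Item TEXT PRIMARY KEY, Quantity INTEGER, Version INTEGER)")
conn.execute("INSERT INTO Stock VALUES ('widget', 10, 1)")
conn.commit()

def read_stock(connection, item):
    """Return (quantity, version) for the item."""
    return connection.execute(
        "SELECT Quantity, Version FROM Stock WHERE Item = ?", (item,)).fetchone()

def write_stock(connection, item, new_quantity, version_read):
    """Write only if the row is unchanged since `version_read`; return True on success."""
    cursor = connection.execute(
        "UPDATE Stock SET Quantity = ?, Version = Version + 1 "
        "WHERE Item = ? AND Version = ?",
        (new_quantity, item, version_read))
    connection.commit()
    return cursor.rowcount == 1

# Two users read the same state ...
qty_a, ver_a = read_stock(conn, "widget")
qty_b, ver_b = read_stock(conn, "widget")

# ... and each tries to sell some units based on what it read.
a_committed = write_stock(conn, "widget", qty_a - 3, ver_a)   # accepted
b_committed = write_stock(conn, "widget", qty_b - 4, ver_b)   # stale version, rejected

# User B retries from fresh data, so neither sale is lost.
qty_b, ver_b = read_stock(conn, "widget")
b_retry_committed = write_stock(conn, "widget", qty_b - 4, ver_b)
final_quantity, final_version = read_stock(conn, "widget")

print(a_committed, b_committed, b_retry_committed, final_quantity)
```

Without the version check, user B's first write would have set the quantity to 6 and silently discarded user A's sale of 3 units.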

Recovery Manager:
The recovery manager is responsible for ensuring the database's recoverability in the event of
a system failure or crash. It manages the logging and undo/redo mechanisms to support data
recovery to a consistent state after failure.
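
The logging and undo/redo idea can be sketched with a simplified log in Python. The record layout, the function name recover, and the sample values are assumptions for illustration. The sketch also assumes that an uncommitted write is never overwritten by another transaction before it is resolved (as under strict locking), and it leaves out checkpoints, which real recovery managers use to shorten the log.

```python
def recover(disk_state, log):
    """Rebuild a consistent state from the on-disk data and the log.

    Log records are tuples:
      ("update", txn, key, old_value, new_value)
      ("commit", txn)
    """
    committed = {record[1] for record in log if record[0] == "commit"}
    state = dict(disk_state)

    # Redo phase: reapply updates of committed transactions in log order.
    for record in log:
        if record[0] == "update" and record[1] in committed:
            _, _, key, _, new_value = record
            state[key] = new_value

    # Undo phase: roll back updates of uncommitted transactions, newest first.
    for record in reversed(log):
        if record[0] == "update" and record[1] not in committed:
            _, _, key, old_value, _ = record
            state[key] = old_value
    return state

log = [
    ("update", "T1", "A", 100, 70),
    ("update", "T1", "B", 50, 80),
    ("commit", "T1"),
    ("update", "T2", "A", 70, 0),   # T2 had not committed when the crash happened
]

# At the crash only some changes had reached disk: T1's write to B was still in
# memory, while T2's uncommitted write to A had already been flushed.
disk_at_crash = {"A": 0, "B": 50}
recovered = recover(disk_at_crash, log)

print(recovered)
```

After recovery, both of T1's committed changes are present and T2's uncommitted change has been removed.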

Security and Authorization:


Database systems implement security measures to control access to data and ensure that only
authorized users have permission to perform specific operations on the database objects. The
security and authorization component manages user authentication and access privileges.

Front-End and Application Interface:


The front-end component provides the user interface and tools for interacting with the database
system. Applications and users communicate with the database system through this interface,
which can be a command-line interface, graphical user interface (GUI), or web-based interface.

Overall, the architecture of a database system is designed to provide a robust and reliable
platform for storing, managing, and accessing data efficiently, catering to the needs of various
users and applications while maintaining data integrity and security.

6. Database Users and Administrator

In a database system, there are different types of users who interact with the database, each
having specific roles and responsibilities. The two primary categories of users are regular
database users and the database administrator.

Regular Database Users:


Regular database users are individuals or applications that use the database to perform specific
tasks or operations. They interact with the database to retrieve, insert, update, and delete data
based on their access privileges and permissions granted by the database administrator. Regular
users can be further categorized into end-users and application users:
End-Users: End-users are individuals who directly interact with the database through front-
end applications or user interfaces. They use the database to perform tasks related to their job
roles, such as data entry, generating reports, querying data, and analyzing information.

Application Users: Application users are users who interact with the database indirectly
through applications or software systems. These applications are developed to perform specific
tasks, and they use database queries to access and manipulate data on behalf of the application
users.

Regular database users typically have limited access to the database, only able to perform
specific operations based on the permissions granted by the database administrator. They are
not involved in the administration or management of the database itself.

Database Administrator (DBA):


The database administrator is a crucial role responsible for managing, maintaining, and
securing the entire database system. Their primary focus is on ensuring the overall health,
performance, and security of the database. The DBA's responsibilities include:
Database Installation and Configuration: Installing the database software and configuring it
based on the system requirements and organizational needs.

Database Security: Setting up and managing user accounts, access permissions, and
authentication to ensure data security and prevent unauthorized access.

Data Backup and Recovery: Implementing regular data backups to protect against data loss
due to hardware failures, software errors, or disasters. The DBA is also responsible for
developing strategies for data recovery.

Performance Tuning: Monitoring database performance, identifying bottlenecks, and
optimizing the database configuration and queries to improve overall performance.

Schema Management: Designing, creating, and modifying the database schema to ensure data
integrity, efficiency, and ease of use.

User Support: Assisting end-users and application developers in using the database
effectively, troubleshooting issues, and providing support as needed.

Database Upgrades and Patches: Managing database software upgrades and applying
patches to address security vulnerabilities and bug fixes.

Database Monitoring and Maintenance: Regularly monitoring the database system to
identify and resolve issues proactively, ensuring the system operates smoothly.

The database administrator has access to higher privileges and capabilities to manage the entire
database system, making sure it meets the organization's needs and adheres to industry best
practices. Their role is critical in maintaining data integrity, security, and availability while
optimizing database performance for smooth and efficient operations.

7. Distributed Database

A distributed database is a type of database system that stores data across multiple physical
locations or nodes, connected through a computer network. In a distributed database, data is
distributed and replicated across different servers or sites, allowing for more efficient data
management, improved scalability, fault tolerance, and enhanced performance. Distributed
databases are commonly used in large-scale systems where a centralized database would
become a bottleneck or a single point of failure.

Key characteristics and concepts of distributed databases include:

Distribution Transparency: A well-designed distributed database system provides
distribution transparency, meaning that users and applications are shielded from the
complexities of data distribution. They can interact with the database as if it were a centralized
system, without needing to know where the data is physically stored or how it is distributed.

Data Replication: Data may be replicated across multiple nodes to improve data availability
and fault tolerance. Replication ensures that copies of data exist in different locations, reducing
the risk of data loss in case of server failures or network outages.

Data Fragmentation: Data fragmentation involves dividing data into smaller parts or
fragments and storing each fragment on different nodes. This approach can improve query
performance by allowing parallel processing and reducing data access latency.

Transaction Management: Distributed databases need to handle distributed transactions,
which involve multiple operations that span multiple nodes. Ensuring the consistency and
atomicity of distributed transactions is a critical aspect of distributed database design.

Distributed Query Processing: Query processing in distributed databases involves optimizing
and executing queries that may require data retrieval from multiple nodes. Efficient query
processing is essential to minimize data transfer over the network and reduce response times.

Data Consistency and Synchronization: Maintaining data consistency across distributed
nodes can be challenging. Distributed databases use various techniques like distributed locking,
two-phase commit, and timestamp-based protocols to ensure data consistency and
synchronization.

Network Communication: Communication between nodes is fundamental to the functioning
of a distributed database. An efficient and reliable network infrastructure is crucial to support
data transfer, replication, and coordination between nodes.

Load Balancing: Distributed databases aim to distribute data and processing load evenly
across nodes to avoid overloading any specific server. Load balancing helps ensure optimal
performance and resource utilization.
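
Fragmentation, replication, and distribution transparency from the list above can be sketched together in Python. The node names, the replica count, and the functions home_nodes, put, and get are hypothetical, and placing rows by a hash of their key is just one possible fragmentation scheme. Each row is stored on two nodes, and a reader asks for a key without ever naming a node.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical site names
REPLICAS = 2                              # each row is stored on two nodes

def home_nodes(key, nodes=NODES, replicas=REPLICAS):
    """Pick the nodes that store `key`: a primary chosen by hash, plus its successors."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas)]

# Horizontal fragmentation: each node holds only the rows assigned to it.
storage = {node: {} for node in NODES}

def put(key, row):
    """Write the row to every replica responsible for the key."""
    for node in home_nodes(key):
        storage[node][key] = row

def get(key, offline=()):
    """Read from the first reachable replica; the caller never names a node."""
    for node in home_nodes(key):
        if node not in offline:
            return storage[node].get(key)
    raise RuntimeError("no replica available")

for customer_id in range(1, 7):
    put(customer_id, {"CustomerID": customer_id})

# Simulate a failure of the primary node for customer 3.
primary_of_3 = home_nodes(3)[0]
row_when_primary_down = get(3, offline={primary_of_3})
total_copies = sum(len(rows) for rows in storage.values())

print(row_when_primary_down, total_copies)
```

The read still succeeds when the primary node is offline because the second replica holds a copy, which is the availability benefit of replication described above. A real system would also need to keep the replicas consistent when writes fail part-way.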

Advantages of Distributed Databases:

Improved performance and scalability: Distributing data across multiple nodes allows for
parallel processing and better performance for large-scale applications with high data volumes
and user concurrency.
Increased fault tolerance: Data replication and distribution reduce the risk of data loss and
system failures. If one node fails, the data is still available from other replicas.
Enhanced availability: Distributed databases can remain operational even if some nodes are
offline or experiencing issues, ensuring continuous access to data.
Geographic distribution: Distributed databases enable data to be stored close to the end-users,
reducing data access latency and improving user experience.

Challenges of Distributed Databases:

Complexity: Designing, managing, and maintaining a distributed database system can be more
complex than a centralized database.
Data consistency: Ensuring data consistency across distributed nodes while supporting
concurrent access can be challenging and may require careful planning and coordination.
Network latency: Data transfer over the network can introduce latency, impacting response
times and overall performance.
Data security: Securing data across multiple nodes and network communication requires
robust security measures and encryption protocols.

Despite the challenges, distributed databases offer significant advantages for large-scale and
geographically dispersed systems, making them a vital component of modern data management
strategies.

8. Distributed Database System Architecture


A distributed database system architecture is a design that allows data to be stored and
managed across multiple physical locations or nodes, while presenting a single, unified
database to users and applications. This architecture is commonly used where data must
be geographically distributed, where high availability, scalability, and fault tolerance
are required, or where the load on a single centralized database must be reduced.

The key components of a distributed database system architecture include:

Distributed Data: The database is divided into smaller fragments or partitions, and each
fragment is distributed across multiple nodes or servers. These nodes can be located in
different geographic locations or data centers.

Distributed Transaction Management: To maintain data consistency and integrity, a
distributed database system needs to manage distributed transactions. Distributed
transaction management ensures that multiple operations across different nodes are
coordinated and either all succeed or all fail, adhering to the principles of ACID (Atomicity,
Consistency, Isolation, Durability).

Data Replication: To improve data availability and fault tolerance, some distributed
databases use data replication. Data is duplicated across multiple nodes so that if one node
fails, the data can still be accessed from other nodes.

Distributed Query Processing: Distributed database systems need to handle queries that
span multiple nodes and retrieve data from various locations. Query optimizers determine
the most efficient way to execute queries, considering data distribution, network latency,
and other factors.

Data Distribution Transparency: The distributed database system should present a
transparent view of data to users and applications. Users should be able to access and
manipulate data as if it were stored in a single, centralized database, without needing to
know the physical distribution of data.

Data Consistency and Replication Management: Maintaining data consistency across
distributed nodes can be challenging. Distributed database systems use various protocols
and algorithms, such as two-phase commit, quorum-based systems, and conflict resolution
techniques to ensure consistency.

Load Balancing: To evenly distribute the workload among nodes and avoid bottlenecks,
load balancing mechanisms are employed. Load balancing helps optimize resource
utilization and improves the overall performance of the distributed database system.

Security and Access Control: Distributed database systems must implement robust
security measures to protect data privacy and prevent unauthorized access. Access control
mechanisms are used to define user permissions and restrict data access based on roles and
privileges.

Overall, a well-designed distributed database system architecture provides numerous
advantages, including improved data availability, fault tolerance, scalability, and better
performance. However, it also introduces complexities and challenges related to data
consistency, distributed transaction management, and network latency, which must be
carefully addressed to ensure the system's reliability and effectiveness.
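
The two-phase commit protocol mentioned above can be sketched in Python as follows. This is a simplified illustration: the class Participant, its can_commit flag, and the site names are hypothetical, and the sketch ignores timeouts, logging, and coordinator failure, all of which a real implementation must handle.

```python
class Participant:
    """One node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # whether this node is able to commit locally
        self.state = "initial"

    def prepare(self):
        # Phase 1: vote yes only if the local changes can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed on every node, else False."""
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


healthy_sites = [Participant("site-1"), Participant("site-2")]
committed = two_phase_commit(healthy_sites)

mixed_sites = [Participant("site-1"), Participant("site-2", can_commit=False)]
committed_with_failure = two_phase_commit(mixed_sites)

print(committed, [p.state for p in healthy_sites])
print(committed_with_failure, [p.state for p in mixed_sites])
```

A single "no" vote causes every site to abort, including the site that was ready to commit, which is how the protocol gives the all-or-nothing behaviour required for atomicity across nodes.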

Distributed Database Architecture

A distributed database is a set of logically interrelated databases that are stored on computers
at several geographically different sites and are linked by means of a computer network.
These interrelated databases work together to perform specific tasks.
Distributed computer systems work by splitting a large task into a number of smaller ones that
can then be solved in a coordinated fashion. Each processing element can be managed
independently and can develop its own applications.

Fig: Distributed database system


Advantages:
· The database can be expanded easily, since the data is already spread across different
physical locations.
· The distributed database can be accessed easily from different networks.
· The database can be more secure than a centralized database, because a failure or breach
at one site need not expose or disable the data held at other sites.
Disadvantages:
· The database is costly and difficult to maintain because of its complexity.
· It is difficult to provide a uniform view to users, since the data is spread
across different physical locations.

9. Database Model
Database models are conceptual frameworks that define the structure, organization, and
relationships of data within a database. These models provide a way to represent and
understand how data is stored and accessed in a database system. Different types of
database models have been developed over time, each with its own characteristics and
use cases. Some of the common database models are:

Hierarchical Model:
In the hierarchical model, data is organized in a tree-like structure, where each record
(or data element) has a parent-child relationship with other records. The top-level record
represents the root, and child records can have only one parent. This model was
prevalent in early database systems and is now less commonly used.

Network Model:
The network model extends the hierarchical model by allowing records to have multiple
parent records, forming a more complex interconnected network of relationships. This
model was also widely used in the early days of databases but has been largely replaced
by other models like the relational model.

Relational Model:
The relational model is the most widely used database model today. It organizes data
into tables, where each table represents an entity (e.g., customers, products) and each
row in the table represents a specific record or instance of that entity. The relationships
between entities are established through foreign keys, which reference the primary keys
of related tables.

Entity-Relationship Model (ER Model):


The ER model is a high-level data model used to represent the conceptual view of a
database. It represents entities, attributes, and relationships between entities. This
model is often used during the initial design phase to define the structure of a database
before implementing it using a specific database management system.

Object-Oriented Model:
The object-oriented model represents data as objects, similar to the concepts in object-
oriented programming. Each object encapsulates data and behaviors (methods) related
to that data. This model is often used in object-oriented databases, which combine the
benefits of object-oriented programming with data storage and retrieval.

Object-Relational Model:
The object-relational model combines elements of both the relational and object-
oriented models. It extends the relational model to support complex data types, user-
defined data types, and object-oriented features like inheritance and polymorphism.

NoSQL Models:
NoSQL (Not Only SQL) databases do not adhere to the traditional relational model and
instead use various non-relational data models to store and manage data. Common
NoSQL models include:

Document Model: Data is stored as documents (e.g., JSON, XML) that can have nested
structures and different attributes.
Key-Value Model: Data is stored as key-value pairs, enabling fast and simple data
access.
Columnar Model: Data is stored in columns rather than rows, allowing for efficient
data retrieval and analytics.

Each database model has its strengths and weaknesses, and the choice of model depends
on the specific requirements of the application and the nature of the data being stored
and accessed.
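
To make the differences between the relational model and the NoSQL models more concrete, the sketch below stores the same small data set in four ways using plain Python data structures. The customer and order values and all variable names are assumptions for illustration; real database products add storage, indexing, and query languages on top of these shapes.

```python
import json

# Relational: flat rows in separate tables, linked by a foreign key.
customers = [(1, "Asha")]                       # (CustomerID, Name)
orders = [(10, 1, 250.0), (11, 1, 99.5)]        # (OrderID, CustomerID, Amount)

# Document: one nested document per customer.
document = {
    "CustomerID": 1,
    "Name": "Asha",
    "Orders": [{"OrderID": oid, "Amount": amt}
               for oid, cid, amt in orders if cid == 1],
}

# Key-value: an opaque value looked up by a single key.
key_value_store = {"customer:1": json.dumps(document)}

# Columnar: one array per column, convenient for scanning a single attribute.
orders_columnar = {
    "OrderID": [o[0] for o in orders],
    "CustomerID": [o[1] for o in orders],
    "Amount": [o[2] for o in orders],
}

# The same question, "total ordered by customer 1", in each representation.
total_relational = sum(amt for _, cid, amt in orders if cid == 1)
total_document = sum(o["Amount"] for o in document["Orders"])
total_key_value = sum(
    o["Amount"] for o in json.loads(key_value_store["customer:1"])["Orders"])
total_columnar = sum(
    amt for cid, amt in zip(orders_columnar["CustomerID"], orders_columnar["Amount"])
    if cid == 1)

print(total_relational, total_document, total_key_value, total_columnar)
```

All four totals agree. What differs is how much work each layout needs for a given question: the document and key-value forms return one customer with all orders in a single lookup, while the columnar form reads only the columns a calculation needs.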

10. Entity Relationship Model

The Entity-Relationship (ER) model is a high-level data model used to represent the conceptual
view of a database. It provides a graphical and intuitive representation of the structure of a
database, focusing on the entities (objects or things) within the system and the relationships
between them. The ER model is widely used during the initial phases of database design to
capture the data requirements and to create a blueprint for the database schema before
implementation.

Key components of the Entity-Relationship model include:

Entity:
An entity is a real-world object or concept with a distinct identity, represented as a rectangle in
the ER diagram. Each entity is uniquely identifiable and can have attributes that describe its
properties.

Attributes:
Attributes are the characteristics or properties of an entity, represented as ovals or ellipses in
the ER diagram. Attributes define the specific details or information associated with the entity.
For example, in a "Customer" entity, attributes could include "CustomerID," "Name," "Email,"
and "Address."

Relationships:
Relationships depict the associations between two or more entities, represented as diamonds
joined by lines to the related entities. Relationships describe how entities are related to each
other and can have cardinality and optionality constraints.

Cardinality: Cardinality specifies how many instances of one entity can be related to instances
of another entity through the relationship. Common cardinalities include one-to-one (1:1),
one-to-many (1:N), and many-to-many (N:M).
Optionality: Optionality refers to whether participation in a relationship is mandatory or
optional for an entity. It is denoted as either total (mandatory) or partial (optional)
participation.

Primary Key:
A primary key is an attribute or a combination of attributes that uniquely identifies each
instance (row) of an entity. It is represented by an underline in the ER diagram.

Weak Entity:
A weak entity is an entity that cannot be uniquely identified by its attributes alone. It relies on
a relationship with a strong entity (owner entity) for its identification. Weak entities are
represented with a double rectangle in the ER diagram.

Identifying Relationship:
An identifying relationship is a type of relationship where a weak entity's existence is
dependent on its association with a strong entity. In Chen notation the identifying relationship
is drawn as a double diamond between the weak entity and the strong entity; in crow's foot
notation it is drawn as a solid line.

The ER model is typically represented graphically using ER diagrams, which use various
symbols and notations to represent entities, attributes, and relationships. ER diagrams are
helpful for visualizing the database schema and understanding the data requirements and
relationships between entities in a clear and concise manner.

Overall, the Entity-Relationship model provides a foundation for database design and helps
database designers, developers, and stakeholders to conceptualize and communicate the
structure and organization of data in a database system.

E-R Diagram

An Entity-Relationship (E-R) diagram is a graphical representation of the Entity-Relationship
model. It illustrates the structure of a database system by depicting entities, attributes, and
relationships between entities. E-R diagrams are used to visualize and communicate the
database schema during the database design phase. They provide an easy-to-understand and
concise representation of the data requirements and the relationships between different data
components.

Symbols used in E-R diagrams:

Entity:
Entities are represented as rectangles in the diagram. Each entity has a name that describes the
real-world object it represents.
Example: a rectangle labeled "Employee".

Attribute:
Attributes are represented as ovals or ellipses connected to the entity they belong to. They
describe the properties or characteristics of the entity.
Example: "EmployeeID" is an attribute of the "Employee" entity, and the marker (PK) denotes
that it is the primary key attribute.

Relationship:
Relationships are represented as diamond shapes connecting two or more entities. They
describe the associations between entities.
Example: the relationship "Works_On" connects the "Employee" and "Project" entities,
indicating that employees work on projects.

Cardinality Notation:
Cardinality notation is used to represent how many instances of one entity can be associated
with instances of another entity through a relationship. Common cardinalities include
one-to-one (1:1), one-to-many (1:N), and many-to-many (N:M).
Example: the cardinality "(1:N)" on "Works_On" indicates that one employee can work on
multiple projects, but each project is associated with one employee.

E-R diagrams provide a visual representation of the database structure, making it easier for
designers, developers, and stakeholders to understand the data model and relationships within
the database system. They serve as a valuable tool in the database design process.
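
The "Employee", "Project", and "Works_On" (1:N) example can also be written down as data structures. This is only a sketch: the field names, the `assign` helper, and the sample values are assumptions made for illustration, not part of any E-R notation.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    project_id: int          # primary key attribute
    title: str


@dataclass
class Employee:
    employee_id: int         # primary key attribute
    name: str
    works_on: list = field(default_factory=list)  # the "N" side of Works_On


def assign(employee, project, assigned_to):
    """Record Works_On while enforcing that a project has at most one employee."""
    if project.project_id in assigned_to:
        raise ValueError("project is already assigned to an employee")
    assigned_to[project.project_id] = employee.employee_id
    employee.works_on.append(project)


assigned_to = {}             # project_id -> employee_id
akash = Employee(1, "Akash")
payroll = Project(10, "Payroll")
assign(akash, payroll, assigned_to)
assign(akash, Project(11, "Audit"), assigned_to)
titles = [p.title for p in akash.works_on]
```

One employee ends up with two projects, while a second attempt to assign "Payroll" to another employee raises an error, which is what the 1:N cardinality requires.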

Hierarchical Model :

The hierarchical model is one of the oldest data models; it was developed by IBM in the 1960s.
In a hierarchical model, data are viewed as a collection of segments (record types) that form
a hierarchical relation. The data is organized into a tree-like structure in which each record
has exactly one parent (except the root) and may have many children. Even if the segments are
connected as a chain-like structure by logical associations, the instance structure can be a
fan structure with multiple branches. The logical associations are called directional
associations.
In the hierarchical model, the segment pointed to by a logical association is called the child
segment and the other segment is called the parent segment. A segment without a parent is
called the root, and segments that have no children are called leaves. The main disadvantage
of the hierarchical model is that it can represent only one-to-one and one-to-many
relationships between the nodes.
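
The parent, child, root, and leaf terminology can be sketched as a small tree. The `Segment` class and the college/department names are hypothetical; a real hierarchical DBMS such as IMS stores and navigates segments quite differently.

```python
class Segment:
    """A node in a hierarchy: at most one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_root(self):
        return self.parent is None

    def is_leaf(self):
        return not self.children

    def path(self):
        """Names from the root down to this segment."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def delete(segment):
    """Remove a segment; every segment under it is removed as well."""
    removed = [segment.name]
    for child in list(segment.children):
        removed.extend(delete(child))
    if segment.parent is not None:
        segment.parent.children.remove(segment)
    return removed


college = Segment("College")
cse = Segment("CSE", college)
dbms = Segment("DBMS", cse)
se = Segment("SE", cse)

dbms_path = dbms.path()
removed = delete(cse)
```

Every access follows the single path from the root, and deleting a parent segment takes its whole subtree with it.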
Applications of hierarchical model :
• Hierarchical models are generally used as semantic models in practice, as many real-world
occurrences of events are hierarchical in nature, such as biological, political, or social
structures.
• Hierarchical models are also commonly used as physical models because of the inherent
hierarchical structure of the disk storage system (tracks, cylinders, etc.). Examples include
Information Management System (IMS) by IBM and NOMAD by NCSS.
Example 1: Consider the below Student database system hierarchical model.

In this model, each course-enroll record belongs to a single student, but a student can enroll
in any number of courses, so the relationship is one-to-many. We can represent the given
hierarchical model as the following relational tables:

FACULTY Table
Name                    Dept   Course-taught
Mr. K. Kishore Kumar    CSE    CA
Dr. Sureshkumar         CSE    SE
Dr. N. Rajkumar         CSE    DBMS

STUDENT Table
Name           Course-enroll   Grade
Akash Reddy    CA              B
Dhivya         SE              A
Mani Reddy     SE              B
Ravi Teja      DBMS            A

Example 2: Consider the below cricket database system hierarchical model scheme.

In this example, each player has a set of positions he plays (P_POSITION), a place (P_PLACE),
and a birthdate (P_BDATE). In the figure, each node represents a logical record type and is
displayed as a list of its fields. A child node represents the set of records connected to each
record of the parent type, because the relationship from parent to child is one-to-many. The
root node PLAYER states that every player has a set of positions, a set of places (containing
only one place), and a set of birthdates (containing only one birthdate).
Advantages of the hierarchical model :
• Because the database is based on this architecture, the relationships between the various
layers are logically simple, so the database structure is very simple.
• Data sharing is practical because all data are held in a common database.
• It offers data security; it was the first database model to do so.
• It supports data integrity because it is based on the parent-child relationship, and there
is always a link between the parent and child segments.

Disadvantages of the hierarchical model :

• Although this model is conceptually simple and easy to design, it is quite complex to
implement.
• It lacks flexibility, as changes to tables or segments often lead to very complex system
management tasks. Deleting one segment can lead to the involuntary deletion of all segments
under it.
• It has no standards, as the implementation of this model does not follow any specific
standard.
• It is limited, as many common relationships do not conform to the 1-to-N format required
by the hierarchical model.

Network Model :

This model was formalized by the Database Task Group (DBTG) in the 1960s. It is a
generalization of the hierarchical model. A segment can have multiple parent segments, and the
segments are grouped into levels, but a logical association can exist between segments
belonging to any level. Often there is a many-to-many logical association between two
segments. The logical associations between the segments form a graph. Therefore, this model
replaces the hierarchical tree with a graph-like structure, which allows more general
connections among different nodes. It can have M:N (many-to-many) relations, which allow a
record to have more than one parent segment.
Here, a relationship is called a set, and each set is made up of at least two types of record:
• An owner record, which corresponds to the parent in the hierarchical model.
• A member record, which corresponds to the child in the hierarchical model.
Structure of a Network Model :

A Network data model


In the above figure, member TWO has only one owner, ONE, whereas member FIVE has two
owners, TWO and THREE. Each link between two record types represents a 1:M relationship
between them. This model has both lateral and top-down connections between the nodes.
Therefore, it allows 1:1, 1:M, and M:N relationships among the given entities, which helps
avoid data redundancy because it supports multiple paths to the same record. Examples include
TOTAL by Cincom Systems Inc. and EDMS by Xerox Corp.
Example : Network model for a Finance Department.
Below we have designed the network model for a Finance Department :

Network model of Finance Department.

In a network model, a one-to-many (1:N) relationship is a link between two record types. In
the above figure, SALES-MAN, CUSTOMER, PRODUCT, INVOICE, PAYMENT, and INVOICE-LINE are the
record types for the sales of a company. INVOICE-LINE is owned by both PRODUCT and INVOICE,
and INVOICE also has two owners, SALES-MAN and CUSTOMER.
Consider another example with two segments, Faculty and Student. Suppose student John takes
courses in both the CS and EE departments. How many parent instances does John's record have?
Here, a student instance can have at least two parent instances, so relations exist between
the instances of the Student and Faculty segments. The model can become very complex if we
add other segments, such as Courses, and logical associations such as Student-Enroll and
Faculty-Course. In this model, a student can be logically associated with various instances
of Faculty and Courses.
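
The owner/member idea, including a member with two owners such as INVOICE-LINE, can be sketched as follows. The `NetworkDB` class, the set names, and the record labels are hypothetical and only illustrate the structure.

```python
from collections import defaultdict


class NetworkDB:
    """Toy network store: each set type maps an owner to its member records."""

    def __init__(self):
        # set name -> owner record -> list of member records
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        self.sets[set_name][owner].append(member)

    def members(self, set_name, owner):
        return list(self.sets[set_name][owner])

    def owners(self, member):
        """All owners of a member, across every set type."""
        return sorted(
            owner
            for by_owner in self.sets.values()
            for owner, members in by_owner.items()
            if member in members
        )


db = NetworkDB()
db.connect("INVOICE-LINES", "INVOICE-1", "LINE-1")
db.connect("INVOICE-LINES", "INVOICE-1", "LINE-2")
db.connect("PRODUCT-LINES", "PRODUCT-A", "LINE-1")

line_owners = db.owners("LINE-1")
invoice_lines = db.members("INVOICE-LINES", "INVOICE-1")
```

LINE-1 is reachable from both its invoice and its product, which a strict tree cannot express without duplicating the record.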
Advantages of Network Model :
• This model is simple and easy to design, like the hierarchical data model.
• It can handle multiple types of relationships (1:1, 1:M, and M:N), which helps in
modeling real-life applications.
• Data access is easy: an application can reach both the owner's and the members' records
within a set.
• It does not allow a member to exist without an owner, which enforces data integrity.
• Unlike the hierarchical model, it was standardized through the CODASYL DBTG specifications.

Disadvantages of Network Model :

• The schema or structure of this database is very complex, as all the records are
maintained by the use of pointers.
• Operational anomalies can occur because pointers are used for navigation, which also
leads to a complex implementation.
• The design or structure of this model is not user-friendly.
• This model has no scope for automated query optimization.
• This model fails to achieve structural independence, even though the network database
model is capable of achieving data independence.

Relational Model
E.F. Codd proposed the relational model to represent data in the form of relations, or tables.
After designing the conceptual model of the database using an ER diagram, we need to convert
the conceptual model into a relational model, which can be implemented using any RDBMS such
as Oracle or MySQL.
The relational model represents how data is stored in relational databases. A relational
database consists of a collection of tables, each of which is assigned a unique name. Consider
a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE, shown in
the table.
Table Student
ROLL_NO   NAME           ADDRESS   PHONE        AGE
1         RAMKUMAR       DELHI     9455123451   18
2         RAMESHKUMAR    CHENNAI   9652431543   18
3         SUJIT SARMA    MADURAI   9156253131   20
4         SURESHKUMAR    DELHI     9577321941   18

Important Terminologies
Attribute: Attributes are the properties that define an entity, e.g., ROLL_NO, NAME,
ADDRESS.
Relation Schema: A relation schema defines the structure of the relation and represents the
name of the relation with its attributes, e.g., STUDENT (ROLL_NO, NAME, ADDRESS, PHONE,
AGE) is the relation schema for STUDENT. A schema that contains more than one relation is
called a relational schema.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one
of which is shown as:

1   RAMKUMAR   DELHI   9455123451   18

Relation Instance: The set of tuples of a relation at a particular instant of time is called a
relation instance. The Student table above shows the relation instance of STUDENT at a
particular time. It can change whenever there is an insertion, deletion, or update in the
database.
Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.
Column: The column represents the set of values for a particular attribute. The
column ROLL_NO extracted from the relation STUDENT is:
ROLL_NO
1
2
3
4

NULL Values: A value that is not known or is unavailable is called a NULL value. It is
usually shown as a blank, e.g., if the PHONE of the STUDENT with ROLL_NO 4 had not been
recorded, it would be NULL.
Relation Key: Keys are attributes, or sets of attributes, that are used to identify rows
uniquely or to relate one table to another. They are of the following types:
Primary Key
Candidate Key
Super Key
Foreign Key
Alternate Key
Composite Key
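
Degree and cardinality can be checked mechanically. The sketch below loads the STUDENT relation into an in-memory SQLite database using Python's standard `sqlite3` module; the column types are assumptions, since the notes give only attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    "ROLL_NO INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
rows = [
    (1, "RAMKUMAR", "DELHI", "9455123451", 18),
    (2, "RAMESHKUMAR", "CHENNAI", "9652431543", 18),
    (3, "SUJIT SARMA", "MADURAI", "9156253131", 20),
    (4, "SURESHKUMAR", "DELHI", "9577321941", 18),
]
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", rows)

cursor = conn.execute("SELECT * FROM STUDENT")
degree = len(cursor.description)        # one description entry per attribute
cardinality = len(cursor.fetchall())    # one fetched row per tuple
print(degree, cardinality)
```

The degree changes only when the schema changes, whereas the cardinality changes with every insertion or deletion.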
Constraints in Relational Model
While designing the relational model, we define conditions that must hold for the data present
in the database; these conditions are called constraints. Constraints are checked before
performing any operation (insertion, deletion, or update) on the database. If any constraint
is violated, the operation fails.
Domain Constraints
These are attribute-level constraints. An attribute can only take values that lie inside the domain
range. e.g.; If a constraint AGE>0 is applied to STUDENT relation, inserting a negative value
of AGE will result in failure.
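
The AGE>0 example can be tried with a CHECK constraint. This is a minimal sketch using SQLite through Python's `sqlite3` module; the reduced column list and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT, "
    "AGE INTEGER CHECK (AGE > 0))"
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'RAMKUMAR', 18)")

try:
    # A negative AGE lies outside the domain, so the insert is refused.
    conn.execute("INSERT INTO STUDENT VALUES (2, 'RAMESHKUMAR', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
```

Only the valid row remains in the table after the failed insert.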
Key Integrity
Every relation in the database should have at least one set of attributes that identifies a
tuple uniquely. Such a set of attributes is called a key, e.g., ROLL_NO in STUDENT is a key,
because no two students can have the same roll number. A key has two properties:
It should be unique for all tuples.
It can’t have NULL values.
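
The uniqueness property can be demonstrated with a PRIMARY KEY declaration. This sketch again uses SQLite via `sqlite3`; the `try_insert` helper is hypothetical. The NULL rule is not shown because SQLite treats NULL in an INTEGER PRIMARY KEY column in an engine-specific way (it assigns a value automatically).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (1, 'RAMKUMAR')")


def try_insert(roll_no, name):
    """Return True if the row was accepted, False if a constraint refused it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (roll_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(1, "SUJIT SARMA")   # ROLL_NO 1 is already in use
fresh_ok = try_insert(2, "RAMESHKUMAR")       # a new key value is accepted
```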
Referential Integrity
When an attribute of a relation can only take values from an attribute of the same relation or
of another relation, the constraint is called referential integrity. Suppose we have the
following two relations:
Table Student
ROLL_NO   NAME           ADDRESS   PHONE        AGE   BRANCH_CODE
1         RAMKUMAR       DELHI     9455123451   18    CS
2         RAMESHKUMAR    CHENNAI   9652431543   18    CS
3         SUJIT SARMA    MADURAI   9156253131   20    ECE
4         SURESHKUMAR    DELHI     9577321941   18    IT

Table Branch
BRANCH_CODE   BRANCH_NAME
CS            COMPUTER SCIENCE
IT            INFORMATION TECHNOLOGY
ECE           ELECTRONICS AND COMMUNICATION ENGINEERING
CV            CIVIL ENGINEERING

BRANCH_CODE of STUDENT can only take values that are present in BRANCH_CODE of BRANCH;
this is the referential integrity constraint. The relation that references another relation is
called the REFERENCING RELATION (STUDENT in this case), and the relation to which other
relations refer is called the REFERENCED RELATION (BRANCH in this case).
Anomalies in the Relational Model
An anomaly is an irregularity or something which deviates from the expected or normal state.
When designing databases, we identify three types of anomalies: Insert, Update, and Delete.
Insertion Anomaly in Referencing Relation
We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not
present in the referenced attribute value. e.g.; Insertion of a student with BRANCH_CODE
‘ME’ in STUDENT relation will result in an error because ‘ME’ is not present in
BRANCH_CODE of BRANCH.
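
The ‘ME’ insertion can be reproduced with a foreign key. In this sketch (SQLite through `sqlite3`) the tables are cut down to the columns that matter and the rejected student's details are invented. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT)")
conn.execute(
    "CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT, "
    "BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE))"
)
conn.executemany(
    "INSERT INTO BRANCH VALUES (?, ?)",
    [("CS", "COMPUTER SCIENCE"), ("IT", "INFORMATION TECHNOLOGY")],
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'RAMKUMAR', 'CS')")

try:
    # 'ME' does not exist in BRANCH, so the referencing row is refused.
    conn.execute("INSERT INTO STUDENT VALUES (5, 'NEW STUDENT', 'ME')")
    me_rejected = False
except sqlite3.IntegrityError:
    me_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
```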
Deletion/Updation Anomaly in Referenced Relation:
We can’t delete or update a row of the REFERENCED RELATION if the value of its
REFERENCED ATTRIBUTE is used as a value of the REFERENCING ATTRIBUTE. e.g., if
we try to delete the tuple with BRANCH_CODE ‘CS’ from BRANCH, it will result in an error
because ‘CS’ is referenced by BRANCH_CODE of STUDENT, but if we try to delete the row
with BRANCH_CODE ‘CV’ from BRANCH, it will be deleted because the value is not used by
the referencing relation. This can be handled by the following methods:
On Delete Cascade
It will delete the tuples from the REFERENCING RELATION if the value used by the
REFERENCING ATTRIBUTE is deleted from the REFERENCED RELATION. e.g., if we
delete the row with BRANCH_CODE ‘CS’ from BRANCH, the rows in the STUDENT relation
with BRANCH_CODE ‘CS’ (ROLL_NO 1 and 2 in this case) will be deleted.
On Update Cascade
It will update the REFERENCING ATTRIBUTE in the REFERENCING RELATION if the
attribute value used by the REFERENCING ATTRIBUTE is updated in the REFERENCED
RELATION. e.g., if we update the row with BRANCH_CODE ‘CS’ in BRANCH to ‘CSE’,
the rows in the STUDENT relation with BRANCH_CODE ‘CS’ (ROLL_NO 1 and 2 in this case)
will be updated with BRANCH_CODE ‘CSE’.
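
Both cascade options can be seen in one short run. The sketch uses SQLite through `sqlite3` with reduced tables; only the columns needed for the cascade are kept, and the update is applied before the delete so that both effects are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT)")
conn.execute(
    "CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT, "
    "BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE) "
    "ON DELETE CASCADE ON UPDATE CASCADE)"
)
conn.executemany(
    "INSERT INTO BRANCH VALUES (?, ?)",
    [("CS", "COMPUTER SCIENCE"),
     ("ECE", "ELECTRONICS AND COMMUNICATION ENGINEERING")],
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "RAMKUMAR", "CS"), (2, "RAMESHKUMAR", "CS"), (3, "SUJIT SARMA", "ECE")],
)

# ON UPDATE CASCADE: renaming CS to CSE rewrites the referencing rows.
conn.execute("UPDATE BRANCH SET BRANCH_CODE = 'CSE' WHERE BRANCH_CODE = 'CS'")
after_update = conn.execute(
    "SELECT ROLL_NO, BRANCH_CODE FROM STUDENT ORDER BY ROLL_NO"
).fetchall()

# ON DELETE CASCADE: deleting the branch deletes its students.
conn.execute("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CSE'")
after_delete = conn.execute("SELECT ROLL_NO FROM STUDENT ORDER BY ROLL_NO").fetchall()
```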
Super Keys
Any set of attributes that allows us to identify unique rows (tuples) in a given relation is
known as a super key. A super key from which no attribute can be removed without losing
uniqueness (a minimal super key) is called a candidate key, and one of the candidate keys is
chosen as the primary key. If a key consists of a combination of two or more attributes, it is
called a composite key.
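
The difference between super keys and candidate keys can be explored on the STUDENT sample data. This sketch tests uniqueness only for the four rows shown; a real key is a property of the schema and must hold for every possible instance, so NAME or PHONE being unique here is an accident of the sample. The function names are hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ("ROLL_NO", "NAME", "ADDRESS", "PHONE", "AGE")
ROWS = [
    (1, "RAMKUMAR", "DELHI", "9455123451", 18),
    (2, "RAMESHKUMAR", "CHENNAI", "9652431543", 18),
    (3, "SUJIT SARMA", "MADURAI", "9156253131", 20),
    (4, "SURESHKUMAR", "DELHI", "9577321941", 18),
]


def is_super_key(attrs, rows=ROWS):
    """True if no two rows agree on all of the given attributes."""
    idx = [ATTRIBUTES.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows=ROWS):
    """Minimal super keys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_super_key(attrs, rows) and not contains_smaller_key:
                keys.append(attrs)
    return keys


keys_in_sample = candidate_keys()
```

(ROLL_NO, NAME) is a super key but not a candidate key, because ROLL_NO alone already identifies every row.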
Codd Rules in Relational Model
Edgar F. Codd proposed the relational database model together with a set of rules, now known
as Codd’s Rules, that a database system should follow to be considered relational.

Advantages of the Relational Model

• Simple model: The relational model is simple and easy to use in comparison to other
models.
• Flexible: The relational model is more flexible than the hierarchical and network models.
• Secure: The relational model is more secure than the other models.
• Data Accuracy: Data is more accurate in the relational data model.
• Data Integrity: The integrity of the data is maintained in the relational model.
• Operations can be Applied Easily: Operations are easier to perform in the relational
model.

Disadvantages of the Relational Model

• The relational database model is not very good for very large databases.
• Sometimes, it becomes difficult to find the relation between tables.
• Because of the complex structure, the response time for queries can be high.

Characteristics of the Relational Model
• Data is represented in tables of rows and columns, called relations.
• Data is stored in tables that have relationships between them.
• The relational model supports operations such as data definition, data manipulation, and
transaction management.
• Each column has a distinct name and represents an attribute.
• Each row represents a single entity.

DBTG
The DBTG (Data Base Task Group) model, also known as the CODASYL (Conference on
Data Systems Languages) model, was a database management system (DBMS) model
proposed in the late 1960s and early 1970s. It was an early attempt to standardize database
systems and provided a blueprint for the development of network database systems.

Key characteristics of the DBTG/CODASYL model include:


Network Structure: The DBTG model represents data in a network-like structure, where
records are organized in sets called "networks." Each record can have multiple relationships
with other records, creating a network of interconnected data elements.

Records and Sets: Data in the DBTG model is organized into records, which are collections of
related data items. Records are grouped into sets, which represent entity types or
relationships between entity types.

Pointers: The network structure is facilitated by using pointers, which are references between
records. Pointers provide navigation paths to traverse the network from one record to another,
enabling efficient access to related data.

Hierarchical and Non-Hierarchical Relationships: The DBTG model allows both hierarchical
and non-hierarchical (network) relationships between records. This flexibility allows for
complex data relationships to be represented.

Data Manipulation Language (DML): The DBTG model defined a DML whose commands are
embedded in a host programming language, typically COBOL. The DML provided commands
for navigating through the network and for inserting, updating, and deleting records.

Data Definition Language (DDL): The DBTG model included a DDL for defining the schema
of the database, specifying record structures, sets, and relationships.

Record Type and Record Occurrence: The model distinguished between record types (entity
types) and record occurrences (actual data instances). Record types define the structure of
records, while record occurrences represent individual data instances.

The DBTG/CODASYL model was widely adopted in the 1970s and early 1980s, especially for
large-scale and complex database applications in areas such as government, finance, and
scientific research. However, as the demand for more flexible and user-friendly database
systems grew, and the relational database model emerged, the DBTG model lost popularity.

The relational model, with its simplicity and declarative querying language (SQL), eventually
became the dominant database model, and most modern databases are based on the relational
paradigm. Nonetheless, the DBTG/CODASYL model played a significant role in the evolution
of database management systems and laid the groundwork for subsequent database models and
technologies.

In the DBTG (CODASYL DBTG) model, data retrieval, processing, and update facilities are
based on a navigational approach over the network structure. The model includes features such
as "Find," "Get," and set processing for accessing and manipulating data. Furthermore, the
DBTG model allows for mapping the network to physical files on a storage medium. Each of
these aspects is described below:

Data Retrieval: Data retrieval in the DBTG model involves traversing the network by following
relationships between records using pointers or links. A program locates a starting record and
navigates through owner-member relationships to reach the desired data. Search criteria can be
specified to filter the records.

Find Processing: "Find" processing is used to locate a record that matches certain criteria,
such as particular field values. The "Find" operation positions the program on the first record
that satisfies the specified conditions by setting the currency pointers; it does not itself
transfer any data.

Get Processing: "Get" processing copies the record on which the program is currently
positioned (the record located by a preceding "Find") from the database into the program's
work area, where its fields can be read.

Set Processing: Set processing manipulates the membership of records in sets. It allows a
member record to be connected to, disconnected from, or moved between set occurrences.

Update Facility: The DBTG model provides an update facility to handle data modifications.
Users can store new records, modify existing records, or erase records using the data
manipulation language provided by the model.

Mapping Network to Files: In the DBTG model, the network is mapped to physical files on a
storage medium (e.g., disk). Each record type corresponds to a file, and each set within a
record type corresponds to a block or group of records within that file. The relationships
between records are maintained through pointers or links.

The physical file layout is essential for efficient data retrieval and storage. The DBTG model
incorporates a mapping mechanism to navigate from one record to another using physical file
addresses.
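
Navigational access of this kind can be imitated in a few lines. The sketch below is a toy, not the DBTG language itself: it assumes a reading in which "Find" positions the program on a record and "Get" copies that record into a work area. The class, its method names, and the customer/account data are all hypothetical.

```python
class MiniDBTG:
    """Toy navigational store; an illustration, not a CODASYL implementation."""

    def __init__(self):
        self.records = {}      # record id -> dict of fields
        self.sets = {}         # (set name, owner id) -> list of member ids
        self.current = None    # currency pointer: id of the current record
        self.work_area = None  # copy of a record handed to the program by get()

    def store(self, rec_id, **fields):
        self.records[rec_id] = dict(fields)
        self.current = rec_id

    def connect(self, set_name, owner_id, member_id):
        self.sets.setdefault((set_name, owner_id), []).append(member_id)

    def find(self, **criteria):
        """Position on the first record matching every criterion."""
        for rec_id, fields in self.records.items():
            if all(fields.get(k) == v for k, v in criteria.items()):
                self.current = rec_id
                return True
        return False

    def find_first_member(self, set_name):
        """Move from the current owner to its first member in the given set."""
        members = self.sets.get((set_name, self.current), [])
        if not members:
            return False
        self.current = members[0]
        return True

    def get(self):
        """Copy the current record into the work area and return the copy."""
        self.work_area = dict(self.records[self.current])
        return self.work_area


db = MiniDBTG()
db.store("c1", kind="customer", name="Hayes")
db.store("a1", kind="account", balance=500)
db.connect("depositor", "c1", "a1")

found = db.find(name="Hayes")          # position on the owner record
moved = db.find_first_member("depositor")
balance = db.get()["balance"]          # read the member's fields
```

The program reaches the account only by first locating its owner and then following the set, which is the record-at-a-time style that declarative SQL queries later replaced.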

It's important to note that while the DBTG model was an important milestone in database
management history, it has largely been superseded by the relational model and modern
database management systems, which offer more flexible data models and standardized query
languages like SQL. The relational model's ability to handle complex relationships and its
simplicity in data retrieval and manipulation led to its widespread adoption in modern database
systems.
A car rental company wants to create a database system to manage its car rental operations.
The company offers a variety of cars for rent to customers. Each car has unique attributes such
as the License Plate Number, Make, Model, Year, and Daily Rental Price. The company also
maintains customer information and keeps track of rental transactions. Design an ER model for
the car rental company's database system based on the following requirements:

Cars are identified by their unique License Plate Number. Each car has attributes such as Make,
Model, Year, and Daily Rental Price.
Customers are identified by their unique Customer ID, and their information includes Name,
Email, and Contact Number.
Each customer can rent multiple cars over time, and each car can be rented by multiple
customers.
Rental transactions need to be recorded, including the Rental ID, Rental Date, Return Date, and
Total Rental Cost.
Each rental transaction involves one or more cars rented by a single customer.
The company also wants to keep track of any damages or issues reported for each car during the
rental period.

Draw an ER diagram to represent the entities and their relationships for the car rental
company's database system.

ER Model:

Based on the scenario, the following entities and their relationships can be identified:

Entities:

Car
License Plate Number (Primary Key)
Make
Model
Year
Daily Rental Price

Customer
Customer ID (Primary Key)
Name
Email
Contact Number

Rental
Rental ID (Primary Key)
Rental Date
Return Date
Total Rental Cost

Rental Details
Rental Detail ID (Primary Key)
Rental ID (Foreign Key referencing Rental)
License Plate Number (Foreign Key referencing Car)
Issue Description (to record any damages or issues reported for the car)

Relationships:
Car and Rental Details: One-to-Many relationship (Each car can have multiple rental details
recorded for different rental transactions).

Customer and Rental: One-to-Many relationship (Each customer can have multiple rental
transactions).

Rental and Rental Details: One-to-Many relationship (Each rental transaction can have multiple
rental details for multiple cars).

With this ER model, the car rental company can efficiently manage its car inventory, customer
information, rental transactions, and any issues reported during the rental period in their
database system.
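
The ER model above could be translated into tables roughly as follows. This is a sketch in SQLite through Python's `sqlite3` module, and several choices are assumptions: the snake_case column names, the column types, the `customer_id` foreign key added to Rental to implement the Customer-Rental relationship, and all sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Car (
    license_plate TEXT PRIMARY KEY,
    make TEXT, model TEXT, year INTEGER, daily_price REAL
);
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT, email TEXT, contact_number TEXT
);
CREATE TABLE Rental (
    rental_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
    rental_date TEXT, return_date TEXT, total_cost REAL
);
CREATE TABLE RentalDetail (
    rental_detail_id INTEGER PRIMARY KEY,
    rental_id INTEGER NOT NULL REFERENCES Rental(rental_id),
    license_plate TEXT NOT NULL REFERENCES Car(license_plate),
    issue_description TEXT
);
""")

# Hypothetical sample data: one customer rents two cars in one transaction.
conn.execute("INSERT INTO Car VALUES ('TN01AB1234', 'Maruti', 'Swift', 2021, 1500.0)")
conn.execute("INSERT INTO Car VALUES ('TN02CD5678', 'Hyundai', 'i20', 2022, 1800.0)")
conn.execute(
    "INSERT INTO Customer VALUES (1, 'Dhivya', 'dhivya@example.com', '9000000000')"
)
conn.execute("INSERT INTO Rental VALUES (1, 1, '2024-01-10', '2024-01-12', 6600.0)")
conn.executemany(
    "INSERT INTO RentalDetail VALUES (?, ?, ?, ?)",
    [(1, 1, "TN01AB1234", None), (2, 1, "TN02CD5678", "Scratch on rear bumper")],
)

cars_rented = conn.execute(
    "SELECT d.license_plate FROM Rental r "
    "JOIN RentalDetail d ON d.rental_id = r.rental_id "
    "WHERE r.customer_id = 1 ORDER BY d.license_plate"
).fetchall()
issues = conn.execute(
    "SELECT license_plate FROM RentalDetail WHERE issue_description IS NOT NULL"
).fetchall()
```

Rental Details acts as the link between Rental and Car, which is how the many-to-many association between customers and cars is resolved into one-to-many relationships.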
