DBMS Module 1 (MMC103)
Databases and database technology have a major impact on the growing use of computers. It is
fair to say that databases play a critical role in almost all areas where computers are used,
including business, electronic commerce, engineering, medicine, genetics, law, education, and
library science.
Database
A database is a collection of related data. By data, we mean known facts that can be recorded and
that have implicit meaning. For example, consider the names, telephone numbers, and addresses
of the people you know.
It represents some aspect of the real world, sometimes called the mini world or the universe of
discourse (UOD). Changes to the mini world are reflected in the database.
It is designed, built, and populated with data for a specific purpose. It has an intended group of
users and some preconceived applications in which these users are interested.
To summarize: a database has some source (i.e., the mini world) from which data are derived,
some degree of interaction with events in the represented mini world and an audience that is
interested in using it.
Databases touch all aspects of our lives. Some of the major areas of application are as
follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources
Data Independence and Efficiency: DBMS ensures that changes in the database schema
don’t affect applications, enabling efficient data use and flexibility.
Uniform Data Administration: Centralized control over data ensures consistency and a
standardized approach to data management.
Data Integrity and Security: A DBMS maintains data accuracy through constraints and
protects data with security features like authentication.
Concurrent Data Access and Recovery: Supports multiple users accessing data
simultaneously and provides mechanisms for data recovery in case of crashes.
User-Friendly Query Language: DBMS uses simple, declarative languages like SQL for easy
data interaction and management.
Advantages of DBMS
Efficient Data Access: By utilizing advanced storage and retrieval techniques, a DBMS
ensures that data access is efficient, especially when stored on external storage systems,
improving overall performance.
Data Integrity and Security: DBMS enforces integrity constraints and ensures data security,
protecting the data from inconsistencies and unauthorized access.
Data Administration: For shared data, a DBMS enables efficient data administration.
Professionals manage data representation, reduce redundancy, and optimize data retrieval,
ensuring better organization and performance.
Definition: The view (external) level is the highest level of abstraction in a DBMS, and it defines how data
is seen by individual users or user applications. It provides a tailored view of the
database, often showing only a subset of the data or presenting it in a way that is
meaningful to specific users.
Example: A customer may have access to their own profile and transaction history but
not to other customers' information. Similarly, an administrator might have a
comprehensive view of the entire database.
Definition: The logical (conceptual) level describes the logical structure of the entire database, including
all the data entities, their relationships, and the constraints applied to them, but without
detailing how the data is stored physically.
Example: A logical view may describe a database that contains customers, orders, and
products, along with relationships such as "customers place orders" or "orders contain
products."
Definition: The physical level is the lowest level of abstraction and details how data is
stored physically in the database system. This includes the data's physical storage
locations, indexing methods, file structures, and how the DBMS manages data storage
and retrieval at the disk or hardware level.
Example: The internal view would specify that customer data is stored in a certain file,
indexed by customer ID, and that transactions are stored using a particular file format or
technique to speed up retrieval.
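The three levels can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database. The table, index, and view names are hypothetical and only mirror the examples above: the tables stand for the logical level, the index is a physical-level choice, and the view is what a single customer would see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Logical level: which data exists and how it is structured.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE transactions ("
    " txn_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " amount REAL)"
)

# Physical level: an index changes how rows are located, not what they mean.
conn.execute("CREATE INDEX idx_txn_customer ON transactions(customer_id)")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.5)],
)

# View level: customer 1 sees only their own transaction history.
conn.execute(
    "CREATE VIEW alice_history AS"
    " SELECT txn_id, amount FROM transactions WHERE customer_id = 1"
)
alice_rows = conn.execute("SELECT * FROM alice_history ORDER BY txn_id").fetchall()
print(alice_rows)
```

Dropping the index would leave the view and the queries unchanged, which is the point of separating the physical level from the levels above it.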
Data models in a Database Management System (DBMS) define how data is structured,
stored, and manipulated. They provide a framework for organizing and managing data and are
fundamental in database design. Different data models offer different ways to represent
relationships and organize data based on the requirements of the system.
1. Hierarchical Data Model: This model organizes data in a tree-like structure, where each
record (node) has a single parent and can have many children. It represents data using a
"parent-child" relationship.
2. Network Data Model: The network model organizes data using a graph structure, where
records (nodes) can have multiple parent and child relationships (many-to-many). Data is
represented as a collection of records connected by relationships.
Example: A university system where a student (node) can enroll in multiple courses and each
course can have multiple students (a many-to-many relationship).
3. Relational Data Model: Data is organized into tables (called relations), where each table
contains rows (tuples) and columns (attributes). Tables are related using keys (primary and
foreign keys). The relational model uses Structured Query Language (SQL) to query the data.
Example: A sales database with tables for Customers, Orders, and Products, where Orders
refers to Customers using a foreign key.
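As a rough illustration of the sales example, the sketch below builds Customers and Orders tables with Python's built-in sqlite3 module and follows the foreign key with a join. The column names and sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES Customers(customer_id),"
    " order_date TEXT)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(101, 1, "2024-12-01"), (102, 2, "2024-12-02"), (103, 1, "2024-12-03")],
)

# Follow the foreign key from each order (tuple) back to its customer.
orders_with_names = conn.execute(
    "SELECT o.order_id, c.name FROM Orders o"
    " JOIN Customers c ON c.customer_id = o.customer_id"
    " ORDER BY o.order_id"
).fetchall()
print(orders_with_names)
```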
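4. Object-Oriented Data Model: Data is represented as objects, as in object-oriented programming.
Each object combines attributes (its data) with methods (the operations on that data).
Example: In a university system, a Student object could have attributes like name and grade,
along with methods to calculate GPA or register for courses.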
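The Student example can be sketched as a small Python class. The field names, the grade-point scale, and the unweighted GPA formula are assumptions chosen for illustration, not part of any particular object database.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    name: str
    courses: list[str] = field(default_factory=list)        # registered courses
    grades: dict[str, float] = field(default_factory=dict)  # course -> grade points

    def register(self, course: str) -> None:
        """Register for a course if not already registered."""
        if course not in self.courses:
            self.courses.append(course)

    def record_grade(self, course: str, points: float) -> None:
        """Store the grade points earned in a course."""
        self.register(course)
        self.grades[course] = points

    def gpa(self) -> float:
        """Unweighted mean of all recorded grade points (0.0 if none)."""
        if not self.grades:
            return 0.0
        return sum(self.grades.values()) / len(self.grades)


s = Student("Asha")
s.register("DBMS")
s.record_grade("DBMS", 9.0)
s.record_grade("Networks", 7.0)
student_gpa = s.gpa()
print(student_gpa)
```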
5. Document Data Model: This model stores data as documents, usually in formats like
JSON, XML, or BSON. Each document can contain nested structures (arrays, sub-documents),
making it flexible for storing semi-structured data.
Example: A blog application where each post is stored as a document containing title, content,
and comments.
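A blog post of this kind might look like the following sketch, written as a Python dictionary and serialized to JSON with the standard json module. The field values are made up; only the title, content, and comments fields come from the example above.

```python
import json

# One document holds the post together with its nested comments.
post = {
    "title": "Intro to DBMS",
    "content": "A database is a collection of related data.",
    "comments": [
        {"author": "reader1", "text": "Helpful summary."},
        {"author": "reader2", "text": "Please cover joins next."},
    ],
}

serialized = json.dumps(post)      # what a document store would persist
restored = json.loads(serialized)  # reading the document back
comment_authors = [c["author"] for c in restored["comments"]]
print(comment_authors)
```

Because the comments are nested inside the post, reading one document returns everything needed to render it, with no join.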
6. Graph Data Model: Data is represented as a graph, with nodes (entities) and edges
(relationships between entities). This model is ideal for representing complex relationships
between data points.
Example: A social network where each user is a node, and relationships like friends or
followers are represented as edges between the nodes.
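The social network example can be sketched with a plain adjacency structure in Python. The user names and the two helper functions are hypothetical; a real graph database would offer its own query language for the same questions.

```python
# Each key is a user (node); each set holds the users they follow (directed edges).
follows = {
    "ana": {"ben", "cara"},
    "ben": {"cara"},
    "cara": {"ana"},
    "dev": set(),
}


def followers_of(user: str) -> set[str]:
    """Users that have an edge pointing at `user`."""
    return {u for u, targets in follows.items() if user in targets}


def mutual_follows(user: str) -> set[str]:
    """Users that `user` follows and that follow `user` back."""
    return follows[user] & followers_of(user)


print(followers_of("cara"))
print(mutual_follows("ana"))
```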
1. Database Engine
The Database Engine is the core component of the DBMS. It is responsible for managing how
data is stored, retrieved, and updated. The engine handles the low-level operations like reading
and writing data to physical storage (e.g., hard drives), executing queries, and ensuring data is
processed efficiently.
2. Database Schema
The Database Schema defines the structure of the database. It acts like a blueprint that describes
how the data is organized and how different parts of the database relate to each other. This
includes creating tables, defining relationships, and setting rules like constraints to maintain data
integrity.
3. Query Processor
The Query Processor is responsible for interpreting and executing the commands given by
users. It takes the queries written in languages like SQL, breaks them down, optimizes them for
performance, and then runs the query to fetch or modify data. It ensures that the database
responds to requests quickly and accurately.
4. Transaction Management
The Transaction Management component ensures that all operations on the database are
performed in a reliable and consistent manner. It ensures that transactions follow the ACID
properties (Atomicity, Consistency, Isolation, Durability). Atomicity, for example, means that
either all parts of a transaction are completed successfully or none are applied, which prevents
partial updates from corrupting the data.
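Atomicity can be observed with a minimal sketch that uses Python's built-in sqlite3 module. The accounts table, the CHECK rule, and the transfer function are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount: int) -> bool:
    """Move `amount` from account 1 to account 2 as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30)       # both updates are applied
failed = transfer(500)  # the debit violates the CHECK, so the credit is undone too
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(ok, failed, balances)
```

The second transfer leaves no trace: the credit to account 2 had already been executed, but it is rolled back together with the failed debit.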
5. Data Dictionary
The Data Dictionary stores important information about the database structure, such as table
names, column definitions, data types, and user permissions. It helps the DBMS understand the
database's layout and serves as a reference guide for the system to manage and interact with the
data properly.
6. Storage Management
The Storage Management component deals with how data is physically stored in the database.
It is responsible for managing disk space, organizing files, and making sure data is stored
efficiently. It also handles indexing, which makes searching and retrieving data faster.
7. Backup and Recovery
The Backup and Recovery component ensures that the data is protected against loss or
corruption. It regularly backs up data and provides recovery mechanisms in case of system
failures, ensuring that the database can be restored to a consistent and safe state after any issues.
8. Concurrency Control
The Concurrency Control component manages multiple users accessing the database at the
same time. It ensures that transactions are processed in isolation to prevent conflicts or
inconsistencies, such as two users trying to update the same data simultaneously. This ensures
data integrity even in multi-user environments.
9. Security Management
The Security Management component protects the database from unauthorized access. It
ensures that only authorized users can access or modify the data. This is achieved by managing
user authentication, defining access levels, and encrypting sensitive data to keep it safe.
10. User Interface
The User Interface is the way users interact with the DBMS. It can be a command-line interface
(CLI), a graphical user interface (GUI), or an application programming interface (API). The
interface allows users to perform tasks like querying data, entering new records, and managing
the database without needing to understand the technical details.
The E/R model was introduced to provide a clear and simple way of representing real-world
entities and their interrelationships. It serves as a blueprint for designing a database before it is
implemented in a specific database system.
2. Entities
An entity represents a distinct object or thing within the domain of interest that can be identified
uniquely. Entities are typically nouns like "Student," "Employee," or "Course," and they
correspond to real-world objects or concepts that need to be stored in a database.
3. Entity Types
An entity type is a collection of similar entities that share the same properties or attributes. Each
entity within an entity type has its own values for those attributes, but all entities of the same
type are described by the same set of attributes.
Example: "Student" is an entity type that includes many individual student entities, each
of which has attributes such as "StudentID," "Name," and "Age."
Kinds of Entity:
Tangible Entity: It is an entity in DBMS, which is a physical object that we can touch or see. In
simple words, an entity that has a physical existence in the real world is called a tangible entity.
For example, a car is a tangible entity because it is a physical object that we can see and touch
in the real world. Other examples include colleges, bank lockers, mobiles, watches, pens,
paintings, etc.
Intangible Entity: It is an entity in DBMS, which is a non-physical object that we cannot see or
touch. In simple words, an entity that does not have any physical existence in the real world is
known as an intangible entity. For example, a bank account logically exists, but we cannot see
or touch it.
Strong Entity: A strong entity is an entity that can be uniquely identified by its own attributes,
meaning it does not depend on any other entity for its identification. A strong entity typically has
a primary key that uniquely identifies each instance of that entity type.
Key Characteristics:
Unique Identification: A strong entity has a primary key that uniquely identifies each
instance of the entity.
No Dependence: Strong entities do not rely on any other entity to be identified. They are
self-sufficient.
Weak Entity: A weak entity is an entity that cannot be uniquely identified by its own
attributes alone. It depends on another entity (called the strong entity) for its identification. A
weak entity has a partial key, which can uniquely identify it only when combined with the
identifier of the strong entity.
Key Characteristics:
Dependence: A weak entity depends on a strong entity for its identification. It cannot be
uniquely identified without the strong entity’s key.
No Primary Key: A weak entity does not have a primary key. Instead, it uses a partial
key, which is a set of attributes that, when combined with the key of the strong entity,
uniquely identify an instance.
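A common way to realize a weak entity in tables is a composite key made of the owner's key plus the partial key. The sketch below uses Python's built-in sqlite3 module with hypothetical Employee (strong) and Dependent (weak) tables, which are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade below
conn.execute("CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Dependent ("
    " employee_id INTEGER NOT NULL REFERENCES Employee(employee_id) ON DELETE CASCADE,"
    " dependent_name TEXT NOT NULL,"  # partial key: unique only per employee
    " PRIMARY KEY (employee_id, dependent_name))"
)
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, "Ravi"), (2, "Meena")])
# The same partial key value "Anu" is allowed under two different owners.
conn.executemany("INSERT INTO Dependent VALUES (?, ?)", [(1, "Anu"), (2, "Anu")])

try:
    conn.execute("INSERT INTO Dependent VALUES (1, 'Anu')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # same owner and same partial key

# Removing the owner removes its dependents: they cannot exist on their own.
conn.execute("DELETE FROM Employee WHERE employee_id = 1")
remaining = conn.execute("SELECT employee_id, dependent_name FROM Dependent").fetchall()
print(duplicate_rejected, remaining)
```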
4. Attributes
Attributes are the properties or characteristics of an entity that help to describe it in more detail.
For example, a "Student" entity could have attributes like "Name," "StudentID," and "Email."
Types of Attributes:
5. Relationships
A relationship describes the association between two or more entities. It represents how entities
are related in the real world. For example, a "Student" may be related to a "Course" through the
relationship "EnrolledIn."
6. Relationship Types:
In the Entity-Relationship (E/R) Model, relationships describe how entities are related to one
another. Relationships are fundamental in defining how data is interconnected within a database.
The relationship type refers to the nature of the association between entities. Depending on the
number of entities involved and how they are related, relationships are classified into different
types.
Types of Relationships
2. Binary Relationship
3. Ternary Relationship
4. N-ary Relationship
An N-ary relationship involves n entities, where n is usually greater than three. These
relationships are less common and can involve multiple entities in a complex association.
Example: In a large university database, a "Course" is taught by a "Professor",
and multiple "Students" are enrolled in that "Course". This could be represented
as a single relationship involving "Course", "Professor", and "Student".
Relationship: "Course OfferedBy Professor EnrolledBy Student"
In this case, three entities participate, so this is the n = 3 (ternary) case; a
relationship with more participating entities is modeled in the same way.
Cardinality of Relationships
The cardinality of a relationship defines how many instances of one entity can be associated
with instances of another entity. It determines the number of occurrences in one entity that can or
must be associated with a single occurrence of the other entity.
1. One-to-One (1:1): One instance of Entity A is associated with exactly one instance of
Entity B, and vice versa.
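Example: A "Person" has one "Passport", and each "Passport" is assigned to one
"Person".
2. One-to-Many (1:N): One instance of Entity A is associated with many instances of
Entity B, but each instance of Entity B is associated with only one instance of Entity A.
Example: A "Department" can have many "Employees", and each "Employee" belongs to
one "Department".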
3. Many-to-One (N:1): Many instances of Entity A are associated with one instance of
Entity B, while each instance of Entity A is associated with only one instance of Entity B.
This is the reverse of the "One-to-Many" relationship.
Example: Many "Employees" work in a "Department", and each "Employee" works in only
one "Department".
4. Many-to-Many (M:N): Many instances of Entity A are associated with many instances
of Entity B.
Example: A "Student" can enroll in many "Courses", and a "Course" can have many
"Students".
Advantages
It is easily understood by non-technical specialists.
Disadvantages
Components of an ER Diagram:
Entities (tables): Represented by rectangles.
Attributes (columns): Represented by ovals.
Primary Keys: Represented by underlined attributes.
Foreign Keys: Represented by dashed lines connecting the attribute to the referenced
table.
Relationships: Represented by diamonds that connect tables/entities.
Cardinality: Indicates the type of relationship (e.g., one-to-one, one-to-many,
many-to-many).
Entity Sets:
An entity set is a collection of similar types of entities that share common properties or
attributes. Each entity in the set represents a real-world object or concept that can be distinctly
identified.
Example: In a database for a Library Management System, the Book entity set would
contain all the books in the library. Each individual book in the set represents a unique
object with attributes like Book_ID, Title, Author, and Publication Year.
Relationship Sets:
A relationship set is a collection of relationships of the same type. It includes all the instances of
a specific relationship between entities in the database.
Example: If there are multiple records of members borrowing books, each specific
instance of a borrowing event (e.g., Member1 borrows Book1) is a relationship instance.
All such instances form the relationship set for Borrows.
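One way to picture entity sets and a relationship set is as tables, with one row per entity or per relationship instance. The sketch below uses Python's built-in sqlite3 module; the column names and sample rows are assumptions, and the composite primary key is a simplification that lets a member borrow a given book only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity sets: one row per member, one row per book.
conn.execute("CREATE TABLE Member (member_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Book (book_id INTEGER PRIMARY KEY, title TEXT)")
# Relationship set: each row of Borrows is one relationship instance.
conn.execute(
    "CREATE TABLE Borrows ("
    " member_id INTEGER REFERENCES Member(member_id),"
    " book_id INTEGER REFERENCES Book(book_id),"
    " PRIMARY KEY (member_id, book_id))"
)
conn.executemany("INSERT INTO Member VALUES (?, ?)", [(1, "Member1"), (2, "Member2")])
conn.executemany("INSERT INTO Book VALUES (?, ?)", [(1, "Book1"), (2, "Book2")])
conn.executemany("INSERT INTO Borrows VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

relationship_set = conn.execute(
    "SELECT m.name, b.title FROM Borrows br"
    " JOIN Member m ON m.member_id = br.member_id"
    " JOIN Book b ON b.book_id = br.book_id"
    " ORDER BY m.name, b.title"
).fetchall()
print(relationship_set)
```

Member1 appears with two books and Book1 with two members, so this relationship set is many-to-many.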
ER Design Issues
Entity-Relationship (ER) design is a crucial step in database modeling, as it lays the foundation
for how data will be organized, represented, and interrelated. However, ER design comes with
several challenges and issues that need to be addressed to ensure that the final schema is
efficient, logical, and accurately represents real-world relationships.
2. Identifying Relationships
Weak Entities are entities that cannot be uniquely identified by their own attributes
alone. They depend on a strong entity (an entity that has a primary key) for their
identification. A weak entity must always participate in an identifying relationship with
a strong entity, and its partial key distinguishes it among the weak entities of the same owner.
o Key issues:
Identifying weak entities: Recognizing which entities depend on others
for identification can be tricky.
Defining appropriate relationships: A weak entity must be linked to a
strong entity via an identifying relationship.
4. Resolving Redundancy
Redundancy occurs when the same data is repeated in different parts of the database. In
ER design, redundancy leads to increased storage costs, decreased performance, and data
anomalies (e.g., inconsistent data updates).
o Key issues:
Data Duplication: Storing the same information multiple times across
different tables.
Normalization Problems: Failing to normalize data appropriately,
leading to redundant or inconsistent data.
Derived Attributes are attributes that can be calculated from other attributes. For
example, the Age of a Person can be derived from their Date_of_Birth. These should not
be stored in the database but calculated when needed.
o Key issues:
Unnecessary storage: Storing derived attributes unnecessarily wastes
space and creates inconsistencies if the original data changes.
Performance Concerns: In some cases, calculating derived attributes can
negatively affect performance, especially when done frequently.
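Computing Age from Date_of_Birth on demand can be sketched as a small Python function. The function name and its signature are hypothetical; the reference date is passed in explicitly so the result does not depend on when the code runs.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Whole years elapsed between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(2000, 6, 15), date(2024, 6, 14)))
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))
```

Only Date_of_Birth needs to be stored; the age stays correct as time passes without any update to the stored data.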
Integrity Constraints ensure the correctness and validity of data in the database. They
define the rules that data must follow, such as primary keys, foreign keys, unique
constraints, and check constraints.
o Key issues:
Enforcing Constraints: Ensuring that constraints are correctly
represented in the ER diagram and later in the schema.
Data Anomalies: Preventing violations of integrity constraints that could
lead to data anomalies (e.g., orphaned records due to improper foreign key
constraints).
2. Data Models
2.1 Introduction to the Relational Model
E.F. Codd proposed the relational model to represent data in the form of relations or tables. After
designing the conceptual model of the database using an ER diagram, we need to convert the
conceptual model into a relational model, which can be implemented using any RDBMS
such as Oracle or MySQL. So we will see what the relational model is.
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name. Tables
are also known as relations. The relational model is an example of a record-based model.
Record-based models are so named because the database is structured in fixed-format records of
several types. Each table contains records of a particular type. Each record type defines a fixed
number of fields, or attributes. The columns of the table correspond to the attributes of the record
type. The relational data model is the most widely used data model, and a vast majority of
current database systems are based on the relational model.
1. Domain Constraints
Definition: Domain constraints specify that the values of an attribute (column) must be
from a specific set or domain of possible values.
Example: The attribute "Age" in a "Student" table must be a positive integer.
Purpose: Ensure that the data inserted into a column conforms to the defined data type
and set of valid values (such as integer, string, date, etc.).
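A domain constraint such as "Age must be a positive integer" is often written as a CHECK constraint. The sketch below uses Python's built-in sqlite3 module; the typeof() test is SQLite-specific and is an assumption of this illustration, since SQLite would otherwise accept text in an INTEGER column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " StudentID INTEGER PRIMARY KEY,"
    " Age INTEGER NOT NULL CHECK (typeof(Age) = 'integer' AND Age > 0))"
)


def try_insert(student_id, age) -> bool:
    """Return True if the row satisfies the domain constraint and is stored."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?)", (student_id, age))
        return True
    except sqlite3.IntegrityError:
        return False


valid_ok = try_insert(1, 20)
negative_ok = try_insert(2, -5)     # outside the allowed range
text_ok = try_insert(3, "twenty")   # wrong type for the domain
print(valid_ok, negative_ok, text_ok)
```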
2. Entity Integrity Constraint
Definition: The entity integrity constraint ensures that each row in a table has a unique
identity. This is enforced by the primary key.
Key Points:
o Each table must have a primary key.
o The primary key cannot contain NULL values.
Example: In a "Student" table, the StudentID is the primary key, and no two students
can have the same StudentID. Additionally, StudentID cannot be NULL for any student
record.
Purpose: Guarantees that each record can be uniquely identified.
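Entity integrity can be checked with a short sketch using Python's built-in sqlite3 module. The sample IDs and names are made up, and the explicit NOT NULL is an assumption specific to SQLite, which does not imply it for every PRIMARY KEY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite does not imply it for all primary keys.
conn.execute("CREATE TABLE Student (StudentID TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Student VALUES ('S001', 'Asha')")


def rejected(student_id, name) -> bool:
    """Return True if the DBMS refuses the row."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?)", (student_id, name))
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_rejected = rejected("S001", "Vikram")  # StudentID already in use
null_rejected = rejected(None, "Neha")           # StudentID is NULL
new_id_rejected = rejected("S002", "Vikram")     # fresh StudentID is accepted
print(duplicate_rejected, null_rejected, new_id_rejected)
```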
3. Referential Integrity Constraint
Definition: The referential integrity constraint ensures that relationships between tables
are maintained. It is enforced using foreign keys.
Key Points:
o A foreign key in one table must refer to a primary key in another table.
o A foreign key can either point to an existing record or be NULL (in some cases).
o If a record in the referenced table is deleted or updated, referential integrity must
be maintained (using actions like cascade or restrict).
Example: In a "Course" table, the "StudentID" attribute might be a foreign key that refers
to the "StudentID" primary key in the "Student" table.
Purpose: Ensures that references between tables are consistent and valid.
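The key points above can be tried out with Python's built-in sqlite3 module. The sketch follows the Course and Student example; the CourseID column, the sample rows, and the choice of ON DELETE RESTRICT are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Course ("
    " CourseID INTEGER PRIMARY KEY,"
    " StudentID INTEGER REFERENCES Student(StudentID) ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO Student VALUES (1, 'Asha')")
conn.execute("INSERT INTO Course VALUES (100, 1)")     # points to an existing student
conn.execute("INSERT INTO Course VALUES (101, NULL)")  # a NULL foreign key is allowed


def rejected(sql: str) -> bool:
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# No student 99 exists, so this reference would dangle.
dangling_insert_rejected = rejected("INSERT INTO Course VALUES (102, 99)")
# Student 1 is still referenced by course 100, so RESTRICT blocks the delete.
delete_parent_rejected = rejected("DELETE FROM Student WHERE StudentID = 1")
print(dangling_insert_rejected, delete_parent_rejected)
```

With ON DELETE CASCADE instead of RESTRICT, the delete would succeed and remove the referencing Course row as well.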
4. Key Constraints
Definition: A key constraint specifies that each table must have a key (primary key) that
uniquely identifies each row. Additional keys are candidate keys or alternate keys,
which could also uniquely identify a record but are not chosen as the primary key.
Key Points:
o The primary key is a candidate key selected to uniquely identify records in a
table.
o Other minimal sets of attributes that can also uniquely identify records are the remaining
candidate keys, also called alternate keys.
Example: In the "Employee" table, "EmployeeID" could be the primary key, but a
combination of "FirstName" and "LastName" might also serve as a candidate key, provided
no two employees can share the same full name.
Purpose: Ensures that rows are uniquely identified by key attributes.
5. Not Null Constraint
Definition: The NOT NULL constraint ensures that a column cannot have NULL values.
Key Points:
o A NOT NULL constraint is applied to an attribute to guarantee that every record
must have a value for that column.
Example: The "EmployeeID" column in the "Employee" table may have a NOT NULL
constraint to ensure every employee record has a valid ID.
Purpose: Prevents NULL values in critical columns, ensuring that data is always
provided.
A database schema is a logical representation of the data that shows how the data in a
database is organized and how the tables relate to one another.
A database schema contains tables, fields, views, and the relations between tables expressed
through keys such as primary keys and foreign keys.
Data stored in plain files is unstructured, which makes it difficult to access. To resolve this
issue, the data is organized in a structured way with the help of a database schema.
A database schema provides the organization of the data and the relationships between the
stored data.
A database schema defines a set of rules that govern the database, and it provides
information about how the data can be accessed and modified.
A physical schema defines how the data is stored physically in the storage system, in
the form of files and indices. It corresponds to the actual code or syntax needed to
create the structure of a database. When we design a database at the physical level, the
result is called the physical schema.
The database administrator chooses where and how to store the data in the different
blocks of storage.
A logical database schema defines all the logical constraints that need to be applied to the
stored data, and also describes tables, views, entity relationships, and integrity
constraints.
The logical schema describes how the data is stored in the form of tables and how the
attributes of a table are connected.
ER modeling is used to capture the relationships between the components of the data.
In the logical schema, different integrity constraints are defined in order to maintain the
quality of the data when it is inserted and updated.
The view schema is a view-level design that defines the interaction between the end user and
the database.
The user is able to interact with the database through an interface without knowing
much about the storage mechanism of the data in the database.
Keys:
In a Database Management System (DBMS), keys are crucial for ensuring data integrity,
uniqueness, and the proper structure of relationships among tables. Here's a detailed explanation
of the primary key, foreign key, super key, and candidate key:
Example Tables:
Customers Table:
Customer_ID (Primary Key)   Name    Email
1                           Alice   [email protected]
2                           Bob     [email protected]
3                           Carol   [email protected]
Orders Table:
Order_ID   Customer_ID (Foreign Key)   Order_Date
101        1                           2024-12-01
102        2                           2024-12-02
103        1                           2024-12-03
1. Primary Key:
Definition: The primary key is a unique identifier for a record in a table. It cannot contain NULL
values, and each value must be unique within the table.
In our Example: In the Customers table, Customer_ID is the primary key because each
customer has a unique ID that uniquely identifies them in the table.
2. Foreign Key:
Definition: A foreign key is a field in a table that links to the primary key in another table. It is
used to establish relationships between tables and ensures referential integrity.
In our Example: In the Orders table, Customer_ID is a foreign key that refers to the
Customer_ID in the Customers table. It links each order to a customer.
3. Super Key:
Definition: A super key is any set of one or more attributes that can uniquely identify a record
in a table. It might contain extra, unnecessary attributes beyond what is required for uniqueness.
In our Example: In the Customers table, the combination of Customer_ID and Name is a
super key, but it is not minimal because Customer_ID alone is enough to uniquely
identify a record. Similarly, in the Orders table, the combination of Order_ID and Customer_ID
is a super key, but Order_ID alone is sufficient.
4. Candidate Key:
Definition: A candidate key is a minimal super key, meaning it’s a set of attributes that
can uniquely identify a record in a table, and no attribute can be removed without losing
uniqueness.
In our Example:
o In the Customers table, Customer_ID is a candidate key because it uniquely
identifies each customer and, being a single attribute, has nothing that could be removed.
o In the Orders table, Order_ID is a candidate key because it uniquely identifies
each order and is likewise minimal.
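The difference between a super key and a candidate key can be tested mechanically on sample rows. The sketch below is plain Python over the Orders rows from the example tables; the function names are hypothetical. A table instance can only show that a set of attributes is not a key, never prove that it is one for all future data.

```python
from itertools import combinations

columns = ("Order_ID", "Customer_ID", "Order_Date")
rows = [
    (101, 1, "2024-12-01"),
    (102, 2, "2024-12-02"),
    (103, 1, "2024-12-03"),
]


def is_super_key(attrs: tuple[str, ...]) -> bool:
    """True if no two rows agree on every attribute in attrs."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(rows)


def candidate_keys() -> list[tuple[str, ...]]:
    """Super keys that contain no smaller super key (checked by growing size)."""
    found: list[tuple[str, ...]] = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            minimal = not any(set(k) <= set(attrs) for k in found)
            if minimal and is_super_key(attrs):
                found.append(attrs)
    return found


print(is_super_key(("Order_ID", "Customer_ID")))
print(is_super_key(("Customer_ID",)))
print(candidate_keys())
```

Order_Date also comes out as a candidate key here, but only because the three sample dates happen to differ; two orders on the same day would rule it out, which is why keys are chosen from the meaning of the data rather than from one sample.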
Schema Diagrams
A database schema, along with primary key and foreign key dependencies, can be depicted by
schema diagrams. Figure 1.12 shows the schema diagram for our university organization. Each
relation appears as a box, with the relation name at the top in blue, and the attributes listed inside
the box. Primary key attributes are shown underlined. Foreign key dependencies appear as
arrows from the foreign key attributes of the referencing relation to the primary key of the
referenced relation.
Relational operations are the fundamental operations used to manipulate and retrieve data from
relational databases. They are based on set theory and work on relations (tables) to produce new
relations as results. These operations are central to querying and modifying relational databases.
The commonly used relational operations are:
1. Selection (σ)
Definition: The selection operation filters rows from a relation (table) based on a
specified condition or predicate.
Result: It returns a subset of the rows that satisfy the condition.
Syntax: SELECT * FROM table WHERE condition;
Set Notation: σ_condition(Relation)
2. Projection (π)
3. Union (∪)
Definition: The union operation combines two relations into one that contains every row
appearing in either of them, with duplicates eliminated. The relations must have the
same number of columns and compatible data types.
Syntax: SELECT * FROM table1 UNION SELECT * FROM table2;
Set Notation: Relation1∪Relation2
4. Difference (−)
Definition: The difference operation returns the rows from the first relation that are not
present in the second relation. Like the union, the relations must have the same number of
columns and compatible data types.
Syntax: SELECT * FROM table1 EXCEPT SELECT * FROM table2;
Set Notation: Relation1−Relation2
5. Cartesian Product (×)
Definition: The cartesian product operation combines each row from one relation with
each row from another relation. It results in all possible combinations of rows from the
two relations. It is the basis of the join operation.
Syntax: SELECT * FROM table1, table2;
Set Notation: Relation1×Relation2
6. Join (⨝)
Definition: The join operation is a combination of the cartesian product and selection. It
combines rows from two relations based on a condition (usually matching values in
related columns). There are different types of joins, including inner join, left join, right
join, and full outer join.
Syntax (Inner Join): SELECT * FROM table1 JOIN table2 ON table1.common_column =
table2.common_column;
Set Notation: Relation1⋈Relation2
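That a join is a cartesian product followed by a selection can be shown with a minimal sketch in plain Python, where a relation is a list of rows and each row is a dictionary. The function names are hypothetical, and the Orders column is called Cust here (an assumed renaming) so that the two relations share no attribute name.

```python
customers = [
    {"Customer_ID": 1, "Name": "Alice"},
    {"Customer_ID": 2, "Name": "Bob"},
    {"Customer_ID": 3, "Name": "Carol"},
]
orders = [
    {"Order_ID": 101, "Cust": 1},
    {"Order_ID": 102, "Cust": 2},
    {"Order_ID": 103, "Cust": 1},
]


def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def product(r, s):
    """Cartesian product: pair every row of r with every row of s.

    Assumes r and s have no attribute names in common.
    """
    return [{**a, **b} for a in r for b in s]


def join(r, s, predicate):
    """Join: a selection applied to the cartesian product."""
    return select(product(r, s), predicate)


pairs = product(customers, orders)
joined = join(customers, orders, lambda row: row["Customer_ID"] == row["Cust"])
matches = [(row["Name"], row["Order_ID"]) for row in joined]
print(len(pairs), matches)
```

Carol has no orders, so she appears in the product but not in the inner join. A real DBMS does not build the full product first; it uses indexes and other join algorithms to reach the same result.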
7. Division (÷)
Definition: The division operation is used when we want to find rows in one relation that
are associated with all rows in another relation. It's often used for queries that involve
"for all" conditions.
8. Renaming (ρ)