Database Management Systems
TERM 2008-09
Slide No:L1-1
What Is a DBMS?
Slide No:L1-2
Why Use a DBMS?
Slide No:L1-3
Why Study Databases?
Slide No:L1-4
Files vs. DBMS
Slide No:L1-5
Purpose of Database Systems
• In the early days, database applications were built
directly on top of file systems
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information in
different files
– Difficulty in accessing data
• Need to write a new program to carry out each new
task
– Data isolation — multiple files and formats
– Integrity problems
• Integrity constraints (e.g. account balance > 0)
become “buried” in program code rather than being
stated explicitly
• Hard to add new constraints or change existing
ones
Slide No:L1-6
Purpose of Database Systems (Cont.)
Slide No:L1-7
Levels of Abstraction
Slide No:L1-8
Summary
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs
and are well-paid!
• DBMS R&D is one of the broadest,
most exciting areas in CS.
Slide No:L1-9
View of Data
Slide No:L2-1
Instances and Schemas
Slide No:L2-3
Data Models
Slide No:L2-4
Data Models
Slide No:L2-5
Example: University Database
• Conceptual schema:
– Students(sid: string, name: string, login: string, …)
• Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
Slide No:L2-6
Data Independence
• Applications insulated from how data
is structured and stored.
• Logical data independence: Protection
from changes in logical structure of
data.
• Physical data independence:
Protection from changes in physical
structure of data.
One of the most important benefits of using a DBMS!
Slide No:L2-7
Database Languages
Data Manipulation Language (DML)
Slide No:L3-1
Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table account (
account_number char(10),
branch_name char(10),
balance integer)
• DDL compiler generates a set of tables stored in a data
dictionary
• Data dictionary contains metadata (i.e., data about data)
– Database schema
– Data storage and definition language
• Specifies the storage structure and access methods used
– Integrity constraints
• Domain constraints
• Referential integrity (e.g. branch_name must correspond
to a valid branch in the branch table)
– Authorization
Slide No:L3-2
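The DDL example above can be tried directly. The sketch below uses Python's sqlite3 (my choice of engine, not the slides'): it runs the create table statement and then reads the generated schema back from SQLite's data dictionary, the sqlite_master catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_number CHAR(10),
        branch_name    CHAR(10),
        balance        INTEGER
    )
""")

# SQLite's data dictionary is the sqlite_master catalog: the DDL compiler
# stores the generated schema (metadata) there.
name, sql = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchone()
print(name)              # account
print("balance" in sql)  # True
```

Other systems expose the same metadata through different catalogs (e.g. the standard `information_schema` views).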
Relational Model
Slide No:L3-3
A Sample Relational Database
Slide No:L3-4
SQL
• SQL: widely used non-procedural language
– Example: Find the name of the customer with
customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = ‘192-83-7465’
– Example: Find the balances of all accounts held by
the customer with customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = ‘192-83-7465’
and
depositor.account_number =
account.account_number
Slide No:L3-5
SQL
Slide No:L3-6
Database Users
Slide No:L4-1
Database Administrator
Slide No:L4-2
Data storage and Querying
• Storage management
• Query processing
• Transaction processing
Slide No:L5-1
Storage Management
Slide No:L5-2
Query Processing
Slide No:L5-3
Query Processing (Cont.)
Slide No:L5-4
Transaction Management
Slide No:L5-5
Database Architecture
Slide No:L6-1
Overall System Structure
Slide No:L6-2
Database Application Architectures
Slide No:L6-3
Database Design
Slide No:L2-1
Modeling
• A database can be modeled as:
– a collection of entities,
– relationship among entities.
• An entity is an object that exists and is
distinguishable from other objects.
– Example: specific person, company, event, plant
• Entities have attributes
– Example: people have names and addresses
• An entity set is a set of entities of the same type
that share the same properties.
– Example: set of all persons, companies, trees,
holidays
Slide No:L2-2
Entity Sets customer and loan
(Instance tables: customer(customer_id, customer_name, customer_street, customer_city) and loan(loan_number, amount).)
Slide No:L2-3
Attributes
• An entity is represented by a set of attributes, that is
descriptive properties possessed by all members of an
entity set.
Example:
customer = (customer_id, customer_name,
customer_street, customer_city )
loan = (loan_number, amount )
• Domain – the set of permitted values for each attribute
• Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
• Example: multivalued attribute: phone_numbers
– Derived attributes
• Can be computed from other attributes
• Example: age, given date_of_birth
Slide No:L2-4
Composite Attributes
Slide No:L2-5
Mapping Cardinality Constraints
Slide No:L2-6
Mapping Cardinalities
Slide No:L2-7
Mapping Cardinalities
Slide No:L2-8
ER Model Basics
(Figures: the Employees entity set with attributes ssn, name, lot; Employees related to Departments (did, dname, budget) via Works_In with attribute since; and the Reports_To relationship set on Employees with supervisor and subordinate roles.)
Slide No:L2-10
Relationship Sets
• A relationship is an association among several
entities
Example: the customer entity Hayes is related to the account entity A-102 through the depositor relationship set.
• A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets:
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
Slide No:L3-1
Relationship Set borrower
Slide No:L3-2
Relationship Sets (Cont.)
• An attribute can also be property of a relationship set.
• For instance, the depositor relationship set between entity sets
customer and account may have the attribute access-date
Slide No:L3-3
Degree of a Relationship Set
Slide No:L3-4
Degree of a Relationship Set
Example: Suppose employees of a bank
may have jobs (responsibilities) at
multiple branches, with different jobs at
different branches. Then there is a
ternary relationship set between entity
sets employee, job, and branch
• Relationships between more than two entity sets
are rare. Most relationships are binary. (More on
this later.)
Slide No:L3-5
Additional Features of the ER Model: Key Constraints
• Consider Works_In: an employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the key constraint on Manages.
(Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages with attribute since; arrows illustrate 1-to-1, 1-to-many, many-to-1, and many-to-many relationship sets.)
Slide No:L4-1
Participation Constraints
• Does every department have a manager?
– If so, this is a participation constraint: the participation
of Departments in Manages is said to be total (vs.
partial).
• Every Departments entity must appear in an instance
of the Manages relationship.
(Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by both Manages and Works_In, each with attribute since.)
Slide No:L4-2
Weak Entities
• A weak entity can be identified uniquely only by considering the
primary key of another (owner) entity.
– Owner entity set and weak entity set must participate in a
one-to-many relationship set (one owner, many weak entities).
– Weak entity set must have total participation in this
identifying relationship set.
(Figure: an owner entity set with attributes ssn, name, lot; an identifying relationship set with attribute cost; and a weak entity set with attributes pname and age.)
Slide No:L4-3
Weak Entity Sets
• An entity set that does not have a primary key is referred to
as a weak entity set.
• The existence of a weak entity set depends on the existence of
an identifying entity set
– it must relate to the identifying entity set via a total, one-
to-many relationship set from the identifying to the weak
entity set
– Identifying relationship depicted using a double diamond
• The discriminator (or partial key) of a weak entity set is the
set of attributes that distinguishes among all the entities of a
weak entity set.
• The primary key of a weak entity set is formed by the primary
key of the strong entity set on which the weak entity set is
existence dependent, plus the weak entity set’s discriminator.
Slide No:L4-4
Weak Entity Sets (Cont.)
• We depict a weak entity set by double rectangles.
• We underline the discriminator of a weak entity set with a
dashed line.
• payment_number – discriminator of the payment entity set
• Primary key for payment – (loan_number, payment_number)
Slide No:L4-5
Weak Entity Sets (Cont.)
Slide No:L4-6
More Weak Entity Set Examples
Slide No:L4-7
ISA (`is a’) Hierarchies
• As in C++ and other PLs, attributes are inherited.
(Figure: an ISA hierarchy rooted at an entity set with attributes ssn, name, lot.)
Slide No:L5-1
Aggregation
• Used when we have to model a relationship involving (entity sets and) a relationship set.
– Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
(Figure: Employees (ssn, name, lot) related via Monitors (until) to the aggregated Sponsors relationship (since, started_on) between Projects (pid, pbudget) and Departments (did, dname, budget).)
Slide No:L5-2
Aggregation
Consider the ternary relationship works_on, which we
saw earlier
Suppose we want to record managers for tasks
performed by an employee at a branch
Slide No:L5-3
Aggregation (Cont.)
• Relationship sets works_on and manages represent
overlapping information
– Every manages relationship corresponds to a works_on
relationship
– However, some works_on relationships may not
correspond to any manages relationships
• So we can’t discard the works_on relationship
• Eliminate this redundancy via aggregation
– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity
Slide No:L5-4
Aggregation (Cont.)
• Eliminate this redundancy via aggregation
– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity
• Without introducing redundancy, the following diagram
represents:
– An employee works on a particular job at a particular
branch
– An employee, branch, job combination may have an
associated manager
Slide No:L5-5
E-R Diagram With Aggregation
Slide No:L5-6
Conceptual Design Using the ER Model
• Design choices:
– Should a concept be modeled as an entity or an
attribute?
– Should a concept be modeled as an entity or a
relationship?
– Identifying relationships: Binary or ternary?
Aggregation?
• Constraints in the ER Model:
– A lot of data semantics can (and should) be captured.
– But some constraints cannot be captured in ER
diagrams.
Slide No:L6-1
Entity vs. Attribute
• Should address be an attribute of Employees or an entity
(connected to Employees by a relationship)?
• Depends upon the use we want to make of address
information, and the semantics of the data:
• If we have several addresses per employee, address
must be an entity (since attributes cannot be set-
valued).
• If the structure (city, street, etc.) is important, e.g.,
we want to retrieve employees in a given city,
address must be modeled as an entity (since
attribute values are atomic).
Slide No:L6-2
Entity vs. Attribute (Contd.)
• Works_In4 does not allow an employee to work in a department for two or more periods.
• Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship. Accomplished by introducing a new entity set, Duration.
(Figures: Employees (ssn, name, lot)–Works_In4–Departments (did, dname, budget) with attributes from and to on Works_In4; in the revised design, Works_In4 instead connects to a new entity set Duration with attributes from and to.)
Slide No:L6-3
Entity vs. Relationship
• First ER diagram OK if a manager gets a separate discretionary budget (dbudget) for each dept.
• What if a manager gets a discretionary budget that covers all managed depts?
– Redundancy: dbudget stored for each dept managed by the manager.
– Misleading: suggests dbudget is associated with each department-mgr combination.
(Figures: Employees (ssn, name, lot)–Manages2–Departments (did, dname, budget) with attributes since and dbudget on Manages2; the fix introduces Managers ISA Employees carrying dbudget, so it is stored once per manager.)
Slide No:L6-4
Binary vs. Ternary Relationships
Slide No:L6-5
(Figure: Policies entity set with attributes policyid, cost.)
Binary vs. Ternary Relationships (Contd.)
Slide No:L6-6
Summary of Conceptual Design
Slide No:L7-1
Summary of ER (Contd.)
Slide No:L7-2
Summary of ER (Contd.)
Slide No:L7-3
Views
Slide No:L5-1
Views and Security
Slide No:L5-2
View Definition
• A relation that is not part of the conceptual model but is
made visible to a user as a “virtual relation” is called
a view.
• A view is defined using the create view statement
which has the form
Slide No:L5-3
Example Queries
• A view consisting of branches and their customers
create view all_customer as
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number )
union
(select branch_name, customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number )
select customer_name
from all_customer
where branch_name = 'Perryridge'
Slide No:L5-4
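The all_customer view above can be exercised end to end. The sketch below uses Python's sqlite3 with invented sample rows (the names Hayes and Jackson and the branch data are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account   (account_number TEXT, branch_name TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE loan      (loan_number TEXT, branch_name TEXT, amount INTEGER);
    CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);

    INSERT INTO account   VALUES ('A-101', 'Perryridge', 500);
    INSERT INTO depositor VALUES ('Hayes', 'A-101');
    INSERT INTO loan      VALUES ('L-15', 'Perryridge', 1500);
    INSERT INTO borrower  VALUES ('Jackson', 'L-15');

    -- The view definition from the slide, verbatim:
    CREATE VIEW all_customer AS
        SELECT branch_name, customer_name
        FROM depositor, account
        WHERE depositor.account_number = account.account_number
    UNION
        SELECT branch_name, customer_name
        FROM borrower, loan
        WHERE borrower.loan_number = loan.loan_number;
""")

rows = conn.execute(
    "SELECT customer_name FROM all_customer "
    "WHERE branch_name = 'Perryridge' ORDER BY customer_name"
).fetchall()
print(rows)  # [('Hayes',), ('Jackson',)]
```

Queries against the view are rewritten by the system into queries over the base tables, which is the view-processing behavior described later.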
Uses of Views
• Hiding some information from some users
– Consider a user who needs to know a customer’s name,
loan number and branch name, but has no need to see the
loan amount.
– Define a view
(create view cust_loan_data as
select customer_name, borrower.loan_number,
branch_name
from borrower, loan
where borrower.loan_number = loan.loan_number )
– Grant the user permission to read cust_loan_data, but not
borrower or loan
• Predefined queries to make writing of other queries easier
– Common example: Aggregate queries used for statistical
analysis of data
Slide No:L5-5
Processing of Views
• When a view is created
– the query expression is stored in the database along with the
view name
– the expression is substituted into any query using the view
• View definitions containing views
– One view may be used in the expression defining another view
– A view relation v1 is said to depend directly on a view relation
v2 if v2 is used in the expression defining v1
– A view relation v1 is said to depend on view relation v2 if either
v1 depends directly on v2 or there is a path of dependencies
from v1 to v2
– A view relation v is said to be recursive if it depends on itself.
Slide No:L5-6
View Expansion
• A way to define the meaning of views defined in terms of
other views.
• Let view v1 be defined by an expression e1 that may itself
contain uses of view relations.
• View expansion of an expression repeats the following
replacement step:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
• As long as the view definitions are not recursive, this loop
will terminate
Slide No:L5-7
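The replacement loop above can be sketched as a small textual substitution. This is a simplification: real systems substitute into the parsed query tree rather than into SQL text, and the views v1 and v2 here are hypothetical examples, not from the slides.

```python
import re

# Hypothetical, non-recursive view definitions; v1 is defined over v2.
views = {
    "v1": "SELECT * FROM v2 WHERE balance > 100",
    "v2": "SELECT * FROM account",
}

def expand(expr, views):
    """Repeat: find a view relation in the expression and replace it by
    its defining expression, until no view relations remain.
    Terminates because the definitions are not recursive."""
    changed = True
    while changed:
        changed = False
        for name, definition in views.items():
            pattern = r"\b" + name + r"\b"
            if re.search(pattern, expr):
                expr = re.sub(pattern, "(" + definition + ")", expr)
                changed = True
    return expr

print(expand("SELECT * FROM v1", views))
# The result mentions only the base table account, no view names.
```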
With Clause
• The with clause provides a way of defining a
temporary view whose definition is available only to
the query in which the with clause occurs.
• Find all accounts with the maximum balance
Slide No:L5-8
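One way to express the max-balance query with a with clause is sketched below (run via Python's sqlite3; the account rows are invented, and the temporary view name max_balance is my choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
    INSERT INTO account VALUES ('A-101', 'Downtown',   500);
    INSERT INTO account VALUES ('A-102', 'Perryridge', 900);
    INSERT INTO account VALUES ('A-103', 'Brighton',   900);
""")

# max_balance is visible only inside this one query.
rows = conn.execute("""
    WITH max_balance(value) AS (
        SELECT MAX(balance) FROM account
    )
    SELECT account_number
    FROM account, max_balance
    WHERE account.balance = max_balance.value
    ORDER BY account_number
""").fetchall()
print(rows)  # [('A-102',), ('A-103',)]
```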
Complex Queries using With Clause
• Find all branches where the total account deposit is greater
than the average of the total account deposits at all branches.
Slide No:L5-10
Formal Relational Query Languages
Slide No:L6-1
Preliminaries
Slide No:L6-2
Example Instances
• “Sailors” and “Reserves” relations for our examples.
• We’ll use positional or named field notation, and assume that names of fields in query results are `inherited’ from names of fields in query input relations.

R1:  sid  bid  day
     22   101  10/10/96
     58   103  11/12/96

S1:  sid  sname   rating  age
     22   dustin  7       45.0
     31   lubber  8       55.5
     58   rusty   10      35.0

S2:  sid  sname   rating  age
     28   yuppy   9       35.0
     31   lubber  8       55.5
     44   guppy   5       35.0
     58   rusty   10      35.0
Slide No:L6-3
Relational Algebra
• Basic operations:
– Selection (σ): selects a subset of rows from a relation.
– Projection (π): deletes unwanted columns from a relation.
– Cross-product (×): allows us to combine two relations.
– Set-difference (−): tuples in reln. 1, but not in reln. 2.
– Union (∪): tuples in reln. 1 and in reln. 2.
• Additional operations:
– Intersection, join, division, renaming: not essential, but (very!) useful.
• Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)
Slide No:L6-4
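The basic operations can be sketched over Python sets of tuples. The field-index dictionary standing in for the schema is an assumption of this sketch, not the slides' notation; union, intersection, and set-difference then fall out as Python's set operators on union-compatible relations.

```python
# Relations as sets of tuples; a field-name -> position map plays the
# role of the schema.

def select(rel, pred):
    """sigma: subset of rows satisfying the selection condition."""
    return {t for t in rel if pred(t)}

def project(rel, fields, idx):
    """pi: keep only the listed fields; the set eliminates duplicates."""
    return {tuple(t[idx[f]] for f in fields) for t in rel}

def cross(r, s):
    """Cartesian product: pair each tuple of r with each tuple of s."""
    return {t + u for t in r for u in s}

S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}
idx = {"sid": 0, "sname": 1, "rating": 2, "age": 3}

print(select(S2, lambda t: t[idx["rating"]] > 8))  # the yuppy and rusty tuples
print(project(S2, ["age"], idx))                   # {(35.0,), (55.5,)} — duplicates gone
```

With S1 and S2 as such sets, S1 | S2, S1 & S2, and S1 - S2 compute union, intersection, and set-difference directly.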
Projection
• Deletes attributes that are not in the projection list.
• Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
• Projection operator has to eliminate duplicates! (Why??)
– Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)

π_sname,rating(S2):  sname   rating
                     yuppy   9
                     lubber  8
                     guppy   5
                     rusty   10

π_age(S2):  age
            35.0
            55.5
Slide No:L6-5
Selection
• Selects rows that satisfy the selection condition.
• No duplicates in result! (Why?)
• Schema of result identical to schema of (only) input relation.
• Result relation can be the input for another relational algebra operation! (Operator composition.)

σ_rating>8(S2):  sid  sname  rating  age
                 28   yuppy  9       35.0
                 58   rusty  10      35.0

π_sname,rating(σ_rating>8(S2)):  sname  rating
                                 yuppy  9
                                 rusty  10
Slide No:L6-6
Union, Intersection, Set-Difference
• All of these operations take two input relations, which must be union-compatible:
– Same number of fields.
– `Corresponding’ fields have the same type.
• What is the schema of the result?

S1 ∪ S2:  sid  sname   rating  age
          22   dustin  7       45.0
          31   lubber  8       55.5
          58   rusty   10      35.0
          44   guppy   5       35.0
          28   yuppy   9       35.0

S1 ∩ S2:  sid  sname   rating  age
          31   lubber  8       55.5
          58   rusty   10      35.0

S1 − S2:  sid  sname   rating  age
          22   dustin  7       45.0
Slide No:L6-7
Cross-Product
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1,
with field names `inherited’ if possible.
– Conflict: Both S1 and R1 have a field called sid.
Slide No:L6-8
Joined Relations**
• Join operations take two relations and return as a
result another relation.
• These additional operations are typically used as
subquery expressions in the from clause
• Join condition – defines which tuples in the two
relations match, and what attributes are present in
the result of the join.
• Join type – defines how tuples in each relation that
do not match any tuple in the other relation (based
on the join condition) are treated.
Slide No:L7-5
Joined Relations – Datasets for Examples
• Relation borrower
• Relation loan
Slide No:L8-1
Joined Relations – Examples
• loan inner join borrower on
loan.loan_number = borrower.loan_number
Slide No:L8-2
Joined Relations – Examples
• loan natural inner join borrower
Find all customers who have either an account or a loan (but not both) at the bank.
select customer_name
from (depositor natural full outer join borrower )
where account_number is null or loan_number is null
Slide No:L8-3
Joined Relations – Examples
• Natural join can get into trouble if two relations have an
attribute with
same name that should not affect the join condition
– e.g. an attribute such as remarks may be present in
many tables
• Solution:
– loan full outer join borrower using (loan_number)
Slide No:L8-4
Division
• Let A have 2 fields, x and y; B have only field y:
A/B = { ⟨x⟩ | ∃ ⟨x, y⟩ ∈ A  ∀ ⟨y⟩ ∈ B }
i.e., A/B contains all x tuples such that for every y tuple in B, there is an ⟨x, y⟩ tuple in A.
Slide No:L6-13
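Division can be sketched in a few lines over sets of tuples. The sailor/boat ids below are invented stand-ins for the Reserves-style example:

```python
def divide(A, B):
    """A/B where A has fields (x, y) and B holds y values:
    the x values that are paired in A with *every* y in B."""
    xs = {x for (x, _) in A}
    return {x for x in xs if all((x, y) in A for y in B)}

# s1 has reserved both boats, s2 only one.
reserves = {("s1", "b1"), ("s1", "b2"), ("s2", "b1")}
boats = {"b1", "b2"}
print(divide(reserves, boats))  # {'s1'}
```

This is exactly the "reserved all boats" pattern the following slides express in algebra and calculus.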
Find names of sailors who’ve reserved boat
#103
• Solution 1: π_sname((σ_bid=103 Reserves) ⋈ Sailors)
• Solution 2: ρ(Temp1, σ_bid=103 Reserves)
Slide No:L6-14
Find names of sailors who’ve reserved a red boat
• Information about boat color only available in Boats; so need an
extra join:
Slide No:L6-15
Find sailors who’ve reserved a red or a green boat
• Can identify all red or green boats, then find sailors who’ve
reserved one of these boats:
ρ(Tempboats, (σ_color='red' ∨ color='green' Boats))
π_sname(Tempboats ⋈ Reserves ⋈ Sailors)
Slide No:L6-16
Find sailors who’ve reserved a red and a green boat
• Previous approach won’t work! Must identify sailors who’ve
reserved red boats, sailors who’ve reserved green boats, then find
the intersection (note that sid is a key for Sailors):
Slide No:L6-17
Relational Calculus
Slide No:L7-1
Domain Relational Calculus
• Atomic formula:
– ⟨x1, x2, …, xn⟩ ∈ Rname, or X op Y, or X op constant
– op is one of <, >, =, ≤, ≥, ≠
• Formula:
– an atomic formula, or
– ¬p, p ∧ q, p ∨ q, where p and q are formulas, or
– ∃X (p(X)), where variable X is free in p(X), or
– ∀X (p(X)), where variable X is free in p(X)
• The use of quantifiers ∃X and ∀X is said to bind X.
– A variable that is not bound is free.
Slide No:L7-3
Free and Bound Variables
{⟨x1, x2, …, xn⟩ | p(⟨x1, x2, …, xn⟩)}
Slide No:L8-1
Find all sailors with a rating above 7
{⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7}
Slide No:L8-2
Find sailors rated > 7 who have reserved boat #103
{⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7 ∧
∃⟨Ir, Br, D⟩ (⟨Ir, Br, D⟩ ∈ Reserves ∧ Ir = I ∧ Br = 103)}
Slide No:L8-3
Find sailors rated > 7 who’ve reserved a red boat
{⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7 ∧
∃⟨B, BN, C⟩ (⟨B, BN, C⟩ ∈ Boats ∧ C = 'red' ∧
∃⟨Ir, Br, D⟩ (⟨Ir, Br, D⟩ ∈ Reserves ∧ Ir = I ∧ Br = B))}
Slide No:L8-4
Find sailors who’ve reserved all boats
{⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ ∀⟨B, BN, C⟩
(¬(⟨B, BN, C⟩ ∈ Boats) ∨
∃⟨Ir, Br, D⟩ (⟨Ir, Br, D⟩ ∈ Reserves ∧ I = Ir ∧ Br = B))}
• Find all sailors I such that for each 3-tuple ⟨B, BN, C⟩ either it is not
a tuple in Boats or there is a tuple in Reserves showing that sailor I has
reserved it.
Slide No:L8-5
Find sailors who’ve reserved all boats (again!)
{⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧
∀⟨B, BN, C⟩ ∈ Boats
(∃⟨Ir, Br, D⟩ ∈ Reserves (I = Ir ∧ Br = B))}
….. (C ≠ 'red' ∨ ∃⟨Ir, Br, D⟩ ∈ Reserves (I = Ir ∧ Br = B))
Slide No:L8-6
Unsafe Queries, Expressive Power
Slide No:L8-7
Functional Dependencies (FDs)
Slide No:L2-4
Reasoning About FDs
Slide No:L2-8
Closure of a Set of FDs
• Reflexivity: If Y ⊆ X, then X → Y.
• Augmentation: If X → Y, then XZ → YZ for any Z.
• Transitivity: If X → Y and Y → Z, then X → Z.
• Armstrong's Axioms are sound in that they generate
only FDs in F+ when applied to a set F of FDs.
• They are complete in that repeated application of
these rules will generate all FDs in the closure F+.
• It is convenient to use some additional rules while
reasoning about F+:
• Union: If X → Y and X → Z, then X → YZ.
• Decomposition: If X → YZ, then X → Y and X → Z.
• These additional rules are not essential; their
soundness can be proved using Armstrong's Axioms.
Slide No:L2-9
Attribute Closure
Slide No. L4-6
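The attribute closure X+ named above can be computed by a simple fixpoint loop. A minimal sketch, assuming FDs are given as (lhs, rhs) pairs of attribute sets (the representation is my choice):

```python
def attribute_closure(X, fds):
    """Compute X+ under fds, given as (lhs, rhs) pairs of attribute sets:
    repeatedly add rhs whenever lhs already lies inside the closure."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# A -> B and B -> C give {A}+ = {A, B, C}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(attribute_closure({"A"}, fds) == {"A", "B", "C"})  # True
```

X is a superkey of R exactly when X+ contains every attribute of R, so this loop is also a key test.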
Dependency Preserving Decompositions
(Contd.)
Slide No:L5-1
Constraints on a Relationship Set
Slide No:L5-2
Identifying Attributes of Entities
Slide No:L5-3
Identifying Entity Sets
Slide No:L5-4
Multivalued Dependencies
Slide No:L6-1
There are three points to note here:
• The relation schema CTB is in BCNF; thus we would not
consider decomposing it further if we looked only at the
FDs that hold over CTB.
• There is redundancy. The fact that Green can teach
Physics101 is recorded once per recommended text for the
course. Similarly, the fact that Optics is a text for
Physics101 is recorded once per potential teacher.
• The redundancy can be eliminated by decomposing CTB
into CT and CB.
• Let R be a relation schema and let X and Y be subsets of
the attributes of R. Intuitively,
• the multivalued dependency X !! Y is said to hold over R
if, in every legal
Slide No:L6-2
• The redundancy in this example is due to the
constraint that the texts for a course are
independent of the instructors, which cannot
be expressed in terms of FDs.
• This constraint is an example of a multivalued
dependency, or MVD. Ideally, we should model
this situation using two binary relationship
sets, Instructors with attributes CT and Text
with attributes CB.
• Because these are two essentially independent
relationships, modeling them with a single
ternary relationship set with attributes CTB is
inappropriate.
Slide No:L6-3
• Three of the additional rules involve only MVDs:
• MVD Complementation: If X →→Y, then X →→ R −
XY
• MVD Augmentation: If X →→ Y and W ⊇ Z, then
WX →→ YZ.
• MVD Transitivity: If X →→ Y and Y →→ Z, then
X →→ (Z − Y ).
• Fourth Normal Form
• R is said to be in fourth normal form (4NF) if for
every MVD X →→Y that holds over R, one of the
following statements is true:
• Y ⊆ X or XY = R, or
• X is a superkey.
Slide No:L6-4
Join Dependencies
Slide No:L7-1
Fifth Normal Form
Slide No:L7-2
Inclusion Dependencies
Slide No:L7-3
Transaction Concept
• A transaction is a unit of program execution that accesses
and possibly updates various data items.
• E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Two main issues to deal with:
– Failures of various kinds, such as hardware failures and
system crashes
– Concurrent execution of multiple transactions
Slide No.L1-1
Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Atomicity requirement
– if the transaction fails after step 3 and before step 6, money will be
“lost” leading to an inconsistent database state
• Failure could be due to software or hardware
– the system should ensure that updates of a partially executed
transaction are not reflected in the database
• Durability requirement — once the user has been notified that the
transaction has completed (i.e., the transfer of the $50 has taken
place), the updates to the database by the transaction must persist
even if there are software or hardware failures.
Slide No.L1-2
Example of Fund Transfer (Cont.)
• Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Consistency requirement in above example:
– the sum of A and B is unchanged by the execution of the
transaction
• In general, consistency requirements include
• Explicitly specified integrity constraints such as primary
keys and foreign keys
• Implicit integrity constraints
– e.g. sum of balances of all accounts, minus sum of
loan amounts must equal value of cash-in-hand
– A transaction must see a consistent database.
– During transaction execution the database may be temporarily
inconsistent.
– When the transaction completes successfully the database must be
consistent
• Erroneous transaction logic can lead to inconsistency
Slide No.L1-3
Example of Fund Transfer (Cont.)
• Isolation requirement — if between steps 3 and 6, another
transaction T2 is allowed to access the partially updated database, it
will see an inconsistent database (the sum A + B will be less than it
should be).
T1 T2
1. read(A)
2. A := A – 50
3. write(A)
read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
– that is, one after the other.
• However, executing multiple transactions concurrently has significant
benefits, as we will see later.
Slide No.L1-4
ACID Properties
A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database
system must ensure:
• Atomicity. Either all operations of the transaction are properly
reflected in the database or none are.
• Consistency. Execution of a transaction in isolation preserves the
consistency of the database.
• Isolation. Although multiple transactions may execute
concurrently, each transaction must be unaware of other
concurrently executing transactions. Intermediate transaction
results must be hidden from other concurrently executed
transactions.
– That is, for every pair of transactions Ti and Tj, it appears to Ti
that either Tj, finished execution before Ti started, or Tj started
execution after Ti finished.
• Durability. After a transaction completes successfully, the changes
it has made to the database persist, even if there are system failures.
Slide No.L1-5
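The fund-transfer example above can be made concrete with any transactional store. A minimal sketch using Python's sqlite3 (the failure is simulated by an exception between the debit and the credit; the starting balances are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """All-or-nothing transfer: any failure rolls back both updates."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if amount < 0:
            raise ValueError("bad amount")  # simulated failure between steps 3 and 6
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # atomicity: the partial debit is undone

transfer(conn, "A", "B", 50)   # succeeds and commits
transfer(conn, "A", "B", -10)  # fails mid-transaction, rolled back
print(conn.execute("SELECT balance FROM account ORDER BY name").fetchall())
# [(50,), (150,)] -- the sum A + B is unchanged (consistency)
```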
Transaction State
• Active – the initial state; the transaction stays in this state
while it is executing
• Partially committed – after the final statement has been
executed.
• Failed -- after the discovery that normal execution can no
longer proceed.
• Aborted – after the transaction has been rolled back and the
database restored to its state prior to the start of the
transaction. Two options after it has been aborted:
– restart the transaction
• can be done only if no internal logical error
– kill the transaction
• Committed – after successful completion.
Slide No.L1-6
Transaction State (Cont.)
Slide No.L1-7
Implementation of Atomicity and Durability
• The recovery-management component of a database system
implements the support for atomicity and durability.
• E.g. the shadow-database scheme:
– all updates are made on a shadow copy of the database
• db_pointer is made to point to the updated shadow
copy after
– the transaction reaches partial commit and
– all updated pages have been flushed to disk.
Slide No.L2-1
Implementation of Atomicity and Durability (Cont.)
• db_pointer always points to the current consistent copy of the
database.
– In case transaction fails, old consistent copy pointed to by
db_pointer can be used, and the shadow copy can be deleted.
• The shadow-database scheme:
– Assumes that only one transaction is active at a time.
– Assumes disks do not fail
– Useful for text editors, but
• extremely inefficient for large databases (why?)
– Variant called shadow paging reduces copying of
data, but is still not practical for large databases
– Does not handle concurrent transactions
• Will study better schemes in Chapter 17.
Slide No.L2-2
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the
system. Advantages are:
– increased processor and disk utilization, leading to
better transaction throughput
• E.g. one transaction can be using the CPU while
another is reading from or writing to the disk
– reduced average response time for transactions: short
transactions need not wait behind long ones.
• Concurrency control schemes – mechanisms to achieve
isolation
– that is, to control the interaction among the concurrent
transactions in order to prevent them from destroying the
consistency of the database
• Will study in Chapter 16, after studying notion
of correctness of concurrent executions.
Slide No.L2-3
Schedules
• Schedule – a sequence of instructions that specifies the
chronological order in which instructions of concurrent
transactions are executed
– a schedule for a set of transactions must consist of all
instructions of those transactions
– must preserve the order in which the instructions
appear in each individual transaction.
• A transaction that successfully completes its execution
will have a commit instruction as the last statement
– by default transaction assumed to execute commit
instruction as its last step
• A transaction that fails to successfully complete its
execution will have an abort instruction as the last
statement
Slide No.L2-4
Schedule 1
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance
from A to B.
• A serial schedule in which T1 is followed by T2 :
Slide No.L2-5
Schedule 2
• A serial schedule where T2 is followed by T1
Slide No.L2-6
Schedule 3
• Let T1 and T2 be the transactions defined previously. The following
schedule is not a serial schedule, but it is equivalent to Schedule 1.
Slide No.L2-8
Serializability
• Basic Assumption – Each transaction preserves database
consistency.
• Thus serial execution of a set of transactions preserves database
consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a
serial schedule. Different forms of schedule equivalence give rise to
the notions of:
1. conflict serializability
2. view serializability
• Simplified view of transactions
– We ignore operations other than read and write instructions
– We assume that transactions may perform arbitrary computations
on data in local buffers in between reads and writes.
– Our simplified schedules consist of only read and write
instructions.
Slide No.L3-1
Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed
by both li and lj, and at least one of these instructions
wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.
– If li and lj are consecutive in a schedule and they do
not conflict, their results would remain the same even
if they had been interchanged in the schedule.
Slide No.L3-2
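The conflict rules above lead directly to the precedence-graph test for conflict serializability. A sketch, assuming a schedule is a list of (transaction, action, item) triples (my representation, not the slides'):

```python
def conflicts(op1, op2):
    """Ops (txn, action, item) conflict iff different transactions touch
    the same item and at least one action is a write."""
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "W" in (a1, a2)

def conflict_serializable(schedule):
    """Precedence graph: edge Ti -> Tj for each conflicting pair where
    Ti's op comes first. Conflict serializable iff the graph is acyclic."""
    edges = {}
    for i, op1 in enumerate(schedule):
        for op2 in schedule[i + 1:]:
            if conflicts(op1, op2):
                edges.setdefault(op1[0], set()).add(op2[0])

    def cyclic(node, done, stack):  # depth-first cycle check
        stack.add(node)
        for nxt in edges.get(node, ()):
            if nxt in stack or (nxt not in done and cyclic(nxt, done, stack)):
                return True
        stack.discard(node)
        done.add(node)
        return False

    done = set()
    return not any(cyclic(t, done, set()) for t in list(edges) if t not in done)

# Schedule 3-style interleaving: all conflicts order T1 before T2 -> acyclic.
s3 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
print(conflict_serializable(s3))  # True

# T1 reads A before T2's write but writes A after it: T1 -> T2 -> T1 cycle.
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(conflict_serializable(bad))  # False
```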
Conflict Serializability
Slide No.L3-3
Conflict Serializability (Cont.)
• Schedule 3 can be transformed into Schedule 6, a serial
schedule where T2 follows T1, by a series of swaps of
non-conflicting instructions.
– Therefore Schedule 3 is conflict serializable.
Schedule 3 Schedule 6
Slide No.L3-4
Conflict Serializability (Cont.)
• Example of a schedule that is not conflict serializable:
Slide No.L3-5
View Serializability
• Let S and S´ be two schedules with the same set of
transactions. S and S´ are view equivalent if the following
three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q,
then in schedule S’ also transaction Ti must read the
initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that
value was produced by transaction Tj (if any), then in
schedule S’ also transaction Ti must read the value of Q
that was produced by the same write(Q) operation of
transaction Tj .
3. The transaction (if any) that performs the final write(Q)
operation in schedule S must also perform the final
write(Q) operation in schedule S’.
As can be seen, view equivalence is also based purely on reads
and writes alone.
Slide No.L3-6
View Serializability (Cont.)
• A schedule S is view serializable if it is view equivalent to
a serial schedule.
• Every conflict serializable schedule is also view
serializable.
• Below is a schedule which is view-serializable but not
conflict serializable.
Slide No.L3-7
Other Notions of Serializability
• The schedule below produces same outcome as the serial
schedule < T1, T5 >, yet is not conflict equivalent or view
equivalent to it.
Slide No.L3-8
Recoverable Schedules
Need to address the effect of transaction failures on concurrently
running transactions.
• Recoverable schedule — if a transaction Tj reads a data
item previously written by a transaction Ti , then the
commit operation of Ti appears before the commit
operation of Tj.
• The following schedule (Schedule 11) is not recoverable if
T9 commits immediately after the read
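The recoverability condition can be checked mechanically over a schedule. A sketch in Python, assuming a list-of-tuples encoding of the schedule with explicit 'commit' actions (an illustration only):

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item) tuples, action in
    {'read', 'write', 'commit'}; item is None for commits.
    If Tj reads an item last written by Ti, then Ti must commit
    before Tj commits."""
    last_writer = {}   # item -> txn that most recently wrote it
    reads_from = {}    # reader txn -> set of writer txns it read from
    commit_pos = {}    # txn -> position of its commit in the schedule
    for pos, (txn, action, item) in enumerate(schedule):
        if action == 'write':
            last_writer[item] = txn
        elif action == 'read':
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == 'commit':
            commit_pos[txn] = pos
    inf = float('inf')
    return all(commit_pos.get(w, inf) < commit_pos.get(r, inf)
               or commit_pos.get(r, inf) == inf
               for r, ws in reads_from.items() for w in ws)
```

For a Schedule-11-like case, `[('T8','write','A'), ('T9','read','A'), ('T9','commit',None)]` is reported as not recoverable, since T9 commits before (and without) T8.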
Slide No.L4-1
Cascading Rollbacks
• Cascading rollback – a single transaction failure leads to
a series of transaction rollbacks. Consider the following
schedule where none of the transactions has yet
committed (so the schedule is recoverable)
Slide No.L4-2
Cascadeless Schedules
Slide No.L4-3
Concurrency Control
• A database must provide a mechanism that will ensure
that all possible schedules are
– either conflict or view serializable, and
– recoverable and preferably cascadeless
• A policy in which only one transaction can execute at a
time generates serial schedules, but provides a poor degree
of concurrency
– Are serial schedules recoverable/cascadeless?
• Testing a schedule for serializability after it has executed is
a little too late!
• Goal – to develop concurrency control protocols that will
assure serializability.
Slide No.L4-4
Concurrency Control vs. Serializability Tests
• Concurrency-control protocols allow concurrent schedules,
but ensure that the schedules are conflict/view serializable,
and are recoverable and cascadeless .
• Concurrency control protocols generally do not examine the
precedence graph as it is being created
– Instead a protocol imposes a discipline that avoids
nonserializable schedules.
– We study such protocols in Chapter 16.
• Different concurrency control protocols provide different
tradeoffs between the amount of concurrency they allow and
the amount of overhead that they incur.
• Tests for serializability help us understand why a
concurrency control protocol is correct.
Slide No.L4-5
Weak Levels of Consistency
Slide No.L4-6
Levels of Consistency in SQL-92
• Serializable — default
• Repeatable read — only committed records to be read,
repeated reads of same record must return same value.
However, a transaction may not be serializable – it may find
some records inserted by a transaction but not find others.
• Read committed — only committed records can be read,
but successive reads of record may return different (but
committed) values.
• Read uncommitted — even uncommitted records may be
read.
Slide No.L4-8
Implementation of Isolation
• Schedules must be conflict or view serializable, and
recoverable, for the sake of database consistency, and
preferably cascadeless.
• A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency.
• Concurrency-control schemes tradeoff between the amount of
concurrency they allow and the amount of overhead that they
incur.
• Some schemes allow only conflict-serializable schedules to be
generated, while others allow view-serializable schedules that
are not conflict-serializable.
Slide No.L5-1
Figure 15.6
Slide No.L5-2
Testing for Serializability
• Consider some schedule of a set of transactions T1, T2, ...,
Tn
• Precedence graph — a directed graph where the vertices are
the transactions (names).
• We draw an arc from Ti to Tj if the two transactions conflict,
and Ti accessed the data item on which the conflict arose
earlier.
• We may label the arc by the item that was accessed.
• Example 1: [Figure: precedence graph with an arc labeled x]
Slide No.L5-3
Example Schedule (Schedule A) + Precedence Graph
[Figure: Schedule A, a schedule of transactions T1–T5 over data items
U, V, W, X, Y, Z, together with its precedence graph]
Slide No.L5-4
Test for Conflict Serializability
• A schedule is conflict serializable if and only
if its precedence graph is acyclic.
• Cycle-detection algorithms exist which take
order n^2 time, where n is the number of
vertices in the graph.
– (Better algorithms take order n + e where
e is the number of edges.)
• If precedence graph is acyclic, the
serializability order can be obtained by a
topological sorting of the graph.
– This is a linear order consistent with the
partial order of the graph.
– For example, a serializability order for
Schedule A would be
T5 → T1 → T3 → T2 → T4
• Are there others?
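The acyclicity test and topological sort can be sketched as follows (Python; the same read/write conflict rule as before, with the schedule encoded as tuples for illustration):

```python
def precedence_graph(schedule):
    """schedule: list of (txn, action, item).  Add an edge Ti -> Tj
    whenever an instruction of Ti conflicts with a later one of Tj."""
    edges = set()
    for i, (t1, a1, q1) in enumerate(schedule):
        for (t2, a2, q2) in schedule[i + 1:]:
            if t1 != t2 and q1 == q2 and 'write' in (a1, a2):
                edges.add((t1, t2))
    return edges

def serializability_order(schedule):
    """Topological sort of the precedence graph (Kahn-style);
    returns None if the graph is cyclic, i.e. the schedule is
    not conflict serializable."""
    edges = precedence_graph(schedule)
    remaining = {t for t, _, _ in schedule}
    order = []
    while remaining:
        roots = [t for t in sorted(remaining)
                 if not any(v == t and u in remaining for (u, v) in edges)]
        if not roots:
            return None          # cycle: not conflict serializable
        order.append(roots[0])
        remaining.remove(roots[0])
    return order
```

A schedule with a write/write and read/write conflict in opposite directions, such as `[('T3','read','Q'), ('T4','write','Q'), ('T3','write','Q')]`, yields a cycle and therefore `None`.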
Slide No.L5-5
Test for View Serializability
• The precedence graph test for conflict serializability
cannot be used directly to test for view serializability.
– Extension to test for view serializability has cost
exponential in the size of the precedence graph.
• The problem of checking if a schedule is view serializable
falls in the class of NP-complete problems.
– Thus existence of an efficient algorithm is extremely
unlikely.
• However practical algorithms that just check some
sufficient conditions for view serializability can still be
used.
Slide No.L5-6
Lock-Based Protocols
• A lock is a mechanism to control concurrent access to a
data item
• Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as
written. X-lock is requested using the lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is
requested using the lock-S instruction.
Slide No.L6-1
Lock-Based Protocols (Cont.)
• Lock-compatibility matrix
Slide No.L6-2
Lock-Based Protocols (Cont.)
• Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
• Locking as above is not sufficient to guarantee serializability
— if A and B get updated in-between the read of A and B, the
displayed sum would be wrong.
• A locking protocol is a set of rules followed by all
transactions while requesting and releasing locks. Locking
protocols restrict the set of possible schedules.
Slide No.L6-3
Pitfalls of Lock-Based Protocols
• Consider the partial schedule
Slide No.L6-4
Pitfalls of Lock-Based Protocols (Cont.)
• The potential for deadlock exists in most locking
protocols. Deadlocks are a necessary evil.
• Starvation is also possible if concurrency control
manager is badly designed. For example:
– A transaction may be waiting for an X-lock on an
item, while a sequence of other transactions request
and are granted an S-lock on the same item.
– The same transaction is repeatedly rolled back due to
deadlocks.
• Concurrency control manager can be designed to prevent
starvation.
Slide No.L6-5
The Two-Phase Locking Protocol
• This is a protocol which ensures conflict-serializable
schedules.
• Phase 1: Growing Phase
– transaction may obtain locks
– transaction may not release locks
• Phase 2: Shrinking Phase
– transaction may release locks
– transaction may not obtain locks
• The protocol assures serializability. It can be proved that the
transactions can be serialized in the order of their lock
points (i.e. the point where a transaction acquired its final
lock).
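The two-phase property of a single transaction's lock/unlock sequence can be checked directly. A minimal sketch (Python; the event encoding is an assumption for illustration):

```python
def is_two_phase(events):
    """events: sequence of 'lock' / 'unlock' actions issued by one
    transaction, in order.  Two-phase: once any lock has been
    released (shrinking phase), no further lock may be acquired."""
    shrinking = False
    for e in events:
        if e == 'unlock':
            shrinking = True
        elif e == 'lock' and shrinking:
            return False         # acquired a lock after releasing one
    return True

print(is_two_phase(['lock', 'lock', 'unlock', 'unlock']))  # True
print(is_two_phase(['lock', 'unlock', 'lock', 'unlock']))  # False
```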
Slide No.L7-1
The Two-Phase Locking Protocol (Cont.)
• Two-phase locking does not ensure freedom from deadlocks
• Cascading roll-back is possible under two-phase locking. To
avoid this, follow a modified protocol called strict two-phase
locking. Here a transaction must hold all its exclusive locks
till it commits/aborts.
• Rigorous two-phase locking is even stricter: here all locks
are held till commit/abort. In this protocol transactions can
be serialized in the order in which they commit.
Slide No.L7-2
The Two-Phase Locking Protocol (Cont.)
Slide No.L7-3
Lock Conversions
• Two-phase locking with lock conversions:
– First Phase:
– can acquire a lock-S on item
– can acquire a lock-X on item
– can convert a lock-S to a lock-X (upgrade)
– Second Phase:
– can release a lock-S
– can release a lock-X
– can convert a lock-X to a lock-S (downgrade)
• This protocol assures serializability. But still relies on the
programmer to insert the various locking instructions.
Slide No.L7-4
Automatic Acquisition of Locks
• A transaction Ti issues the standard read/write
instruction, without explicit locking calls.
• The operation read(D) is processed as:
if Ti has a lock on D
then
read(D)
else begin
if necessary wait until no other
transaction has a lock-X on D
grant Ti a lock-S on D;
read(D)
end
Slide No.L7-5
Automatic Acquisition of Locks (Cont.)
• write(D) is processed as:
if Ti has a lock-X on D
then
write(D)
else begin
if necessary wait until no other trans. has any lock
on D,
if Ti has a lock-S on D
then
upgrade lock on D to lock-X
else
grant Ti a lock-X on D
write(D)
end;
• All locks are released after commit or abort
Slide No.L7-6
Implementation of Locking
• A lock manager can be implemented as a separate
process to which transactions send lock and unlock
requests
• The lock manager replies to a lock request by sending a
lock grant messages (or a message asking the
transaction to roll back, in case of a deadlock)
• The requesting transaction waits until its request is
answered
• The lock manager maintains a data-structure called a
lock table to record granted locks and pending requests
• The lock table is usually implemented as an in-memory
hash table indexed on the name of the data item being
locked
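A minimal version of such a lock table, a hash table from item name to a FIFO queue of requests, might look like this sketch (Python; only S and X modes are modeled, and deadlock handling is left out):

```python
from collections import defaultdict

def compatible(m1, m2):
    return m1 == 'S' and m2 == 'S'      # only S/S is compatible

class LockTable:
    """In-memory hash table: item name -> FIFO queue of requests,
    each request recorded as [txn, mode, granted?]."""
    def __init__(self):
        self.table = defaultdict(list)

    def request(self, txn, item, mode):
        queue = self.table[item]
        # grant iff compatible with every earlier request on the item
        granted = all(compatible(e[1], mode) for e in queue)
        queue.append([txn, mode, granted])
        return granted                   # False: transaction must wait

    def release(self, txn, item):
        queue = self.table[item]
        queue[:] = [e for e in queue if e[0] != txn]
        # later requests are re-checked to see if they can be granted
        for i, e in enumerate(queue):
            if not e[2] and all(compatible(p[1], e[1]) for p in queue[:i]):
                e[2] = True
```

For example, two S-lock requests on item A are granted, a following X-lock request waits, and it is granted once both S-locks are released.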
Slide No.L7-7
Lock Table
• Black rectangles indicate granted
locks, white ones indicate waiting
requests
• Lock table also records the type of
lock granted or requested
• New request is added to the end of
the queue of requests for the data
item, and granted if it is compatible
with all earlier locks
• Unlock requests result in the
request being deleted, and later
requests are checked to see if they
can now be granted
• If transaction aborts, all waiting or
granted requests of the transaction
are deleted
– lock manager may keep a list of
locks held by each transaction,
to implement this efficiently
Slide No.L7-8
Graph-Based Protocols
Slide No.L7-9
Tree Protocol
Slide No.L7-10
Timestamp-Based Protocols
• Each transaction is issued a timestamp when it enters the system.
If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj
is assigned time-stamp TS(Tj) such that TS(Ti) <TS(Tj).
• The protocol manages concurrent execution such that the time-
stamps determine the serializability order.
• In order to assure such behavior, the protocol maintains for each
data Q two timestamp values:
– W-timestamp(Q) is the largest time-stamp of any transaction
that executed write(Q) successfully.
– R-timestamp(Q) is the largest time-stamp of any transaction
that executed read(Q) successfully.
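The protocol's standard read and write tests, which reject an operation that arrives "too late" in timestamp order, can be sketched as follows (Python; rollback and restart of a rejected transaction are left out):

```python
class TimestampManager:
    """Per-item R- and W-timestamps; an operation returning False
    means the issuing transaction must be rolled back and restarted
    with a new timestamp."""
    def __init__(self):
        self.r_ts = {}   # item -> largest TS of a successful read
        self.w_ts = {}   # item -> largest TS of a successful write

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            return False              # value needed was already overwritten
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.r_ts.get(item, 0) or ts < self.w_ts.get(item, 0):
            return False              # the write arrives too late
        self.w_ts[item] = ts
        return True
```

For instance, after a transaction with timestamp 5 reads Q, a write of Q by an older transaction (timestamp 3) is rejected.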
[Figure: an example schedule of transactions T1–T5 under the
timestamp-based protocol, in which two transactions are rolled back]
[Figure: conflicts are resolved between the transaction with the
smaller timestamp and the transaction with the larger timestamp]
[Figure: a schedule of T14 and T15 under a validation-based protocol:
one transaction reads A and B and displays A+B, the other performs
B := B - 50 and A := A + 50; each transaction validates before its
write phase]
[Figure: lock-compatibility matrix for the multiple-granularity lock
modes IS, IX, S, SIX, X]
Slide No.L1-1
Recovery and Atomicity (Cont.)
• To ensure atomicity despite failures, we first output
information describing the modifications to stable
storage without modifying the database itself.
• We study two approaches:
– log-based recovery, and
– shadow-paging
• We assume (initially) that transactions run serially, that
is, one after the other.
Slide No.L1-2
Recovery Algorithms
• Recovery algorithms are techniques to ensure database
consistency and transaction atomicity and durability
despite failures
– Focus of this chapter
• Recovery algorithms have two parts
1. Actions taken during normal transaction processing
to ensure enough information exists to recover from
failures
2. Actions taken after a failure to recover the database
contents to a state that ensures atomicity,
consistency and durability
Slide No.L1-3
Log-Based Recovery
• A log is kept on stable storage.
– The log is a sequence of log records, and maintains a record of
update activities on the database.
• When transaction Ti starts, it registers itself by writing a
<Ti start> log record
• Before Ti executes write(X), a log record <Ti, X, V1, V2> is written,
where V1 is the value of X before the write, and V2 is the value to be
written to X.
– Log record notes that Ti has performed a write on data item Xj.
Xj had value V1 before the write, and will have value V2 after the
write.
• When Ti finishes its last statement, the log record <Ti commit> is
written.
• We assume for now that log records are written directly to stable
storage (that is, they are not buffered)
• Two approaches using logs
– Deferred database modification
– Immediate database modification
Slide No.L1-4
Deferred Database Modification
• The deferred database modification scheme records all
modifications to the log, but defers all the writes to after
partial commit.
• Assume that transactions execute serially
• Transaction starts by writing <Ti start> record to log.
• A write(X) operation results in a log record <Ti, X, V>
being written, where V is the new value for X
– Note: old value is not needed for this scheme
• The write is not performed on X at this time, but is
deferred.
• When Ti partially commits, <Ti commit> is written to the
log
• Finally, the log records are read and used to actually
execute the previously deferred writes.
Slide No.L1-5
Deferred Database Modification (Cont.)
• During recovery after a crash, a transaction needs to be redone if
and only if both <Ti start> and <Ti commit> are there in the log.
• Redoing a transaction Ti ( redoTi) sets the value of all data items
updated by the transaction to the new values.
• Crashes can occur while
– the transaction is executing the original updates, or
– while recovery action is being taken
• Example transactions T0 and T1 (T0 executes before T1):
T0: read(A)            T1: read(C)
    A := A - 50            C := C - 100
    write(A)               write(C)
    read(B)
    B := B + 50
    write(B)
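Recovery under deferred modification is then a single forward pass over the log: redo a transaction iff both its start and commit records are present. A sketch (Python; the tuple encoding of log records is an assumption for illustration):

```python
def recover_deferred(log, db):
    """log: list of ('start', T), ('write', T, item, new_value) and
    ('commit', T) records.  Redo a transaction iff both its start
    and commit records are in the log; otherwise ignore it entirely,
    since its deferred writes never reached the database."""
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, _, item, value = rec
            db[item] = value          # redo: set item to the new value
    return db
```

With the T0/T1 example above, if the crash occurs after `<T0 commit>` but before `<T1 commit>`, only T0's writes are redone; T1's log records are simply ignored.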
Slide No.L1-6
Deferred Database Modification (Cont.)
Slide No.L2-1
Immediate Database Modification
Slide No.L2-2
Immediate Database Modification (Cont.)
• Both operations must be idempotent
– That is, even if the operation is executed multiple
times the effect is the same as if it is executed once
• Needed since operations may get re-executed
during recovery
• When recovering after failure:
– Transaction Ti needs to be undone if the log contains
the record
<Ti start>, but does not contain the record <Ti
commit>.
– Transaction Ti needs to be redone if the log contains
both the record <Ti start> and the record <Ti
commit>.
• Undo operations are performed first, then redo
operations.
Slide No.L2-3
Immediate Database Modification Example
Log                    Write           Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                       A = 950
                       B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                       C = 600
                                       BB, BC
<T1 commit>
                                       BA
• Note: BX denotes block containing X.
Slide No.L2-4
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of
time.
Slide No.L2-5
Checkpoints
• Problems in recovery procedure as discussed earlier :
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have
already output their updates to the database
• Streamline recovery procedure by periodically performing
checkpointing
1. Output all log records currently residing in main
memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record <checkpoint> onto stable storage.
Slide No.L2-6
Checkpoints (Cont.)
• During recovery we need to consider only the most recent
transaction Ti that started before the checkpoint, and transactions
that started after Ti.
1. Scan backwards from end of log to find the most recent
<checkpoint> record
2. Continue scanning backwards till a record <Ti start> is found.
3. Need only consider the part of log following above start record.
Earlier part of log can be ignored during recovery, and can be
erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti
commit>, execute undo(Ti). (Done only in case of immediate
modification.)
5. Scanning forward in the log, for all transactions starting from Ti
or later with a <Ti commit>, execute redo(Ti).
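The five steps above can be condensed into the following sketch (Python; immediate modification, with log records encoded as tuples for illustration):

```python
def recover_with_checkpoint(log, db):
    """log records: ('start', T), ('write', T, item, old, new),
    ('commit', T), ('checkpoint',).  Only the suffix of the log from
    the start of the most recent pre-checkpoint transaction matters."""
    # steps 1-3: latest checkpoint, then the start record before it
    cp = max(i for i, r in enumerate(log) if r[0] == 'checkpoint')
    starts = [i for i, r in enumerate(log[:cp]) if r[0] == 'start']
    tail = log[starts[-1]:] if starts else log
    committed = {r[1] for r in tail if r[0] == 'commit'}
    # step 4: undo incomplete transactions, scanning backwards
    for r in reversed(tail):
        if r[0] == 'write' and r[1] not in committed:
            _, _, item, old, new = r
            db[item] = old
    # step 5: redo committed transactions, scanning forwards
    for r in tail:
        if r[0] == 'write' and r[1] in committed:
            _, _, item, old, new = r
            db[item] = new
    return db
```

Note that a transaction that committed before the considered start record is never touched: its updates were forced to disk by the checkpoint, so neither undo nor redo is needed for it.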
Slide No.L2-7
Example of Checkpoints
Tc Tf
T1
T2
T3
T4
Slide No.L3-1
Recovery With Concurrent Transactions
• The checkpointing technique and actions taken
on recovery have to be changed
– since several transactions may be active when
a checkpoint is performed.
Slide No.L3-2
Recovery With Concurrent Transactions (Cont.)
Slide No.L3-3
Recovery With Concurrent Transactions (Cont.)
• At this point undo-list consists of incomplete transactions which
must be undone, and redo-list consists of finished transactions
that must be redone.
• Recovery now continues as follows:
1. Scan log backwards from most recent record, stopping when
<Ti start> records have been encountered for every Ti in
undo-list.
During the scan, perform undo for each log record that belongs to a
transaction in undo-list.
2. Locate the most recent <checkpoint L> record.
3. Scan log forwards from the <checkpoint L> record till the
end of the log.
During the scan, perform redo for each log record that belongs to a
transaction on redo-list
Slide No.L3-4
Example of Recovery
• Go over the steps of the recovery algorithm on the following
log:
<T0 start>
<T0, A, 0, 10>
<T0 commit>
<T1 start> /* Scan at step 1 comes up to here */
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
Slide No.L3-5
Log Record Buffering
Slide No.L4-1
Log Record Buffering (Cont.)
• The rules below must be followed if log records are
buffered:
– Log records are output to stable storage in the order in
which they are created.
– Transaction Ti enters the commit state only when the
log record
<Ti commit> has been output to stable storage.
– Before a block of data in main memory is output to the
database, all log records pertaining to data in that
block must have been output to stable storage.
• This rule is called the write-ahead logging or WAL rule
– Strictly speaking WAL only requires undo information to be
output
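The WAL rule can be illustrated with a toy buffer manager that forces the relevant log records to stable storage, in creation order, before a data block is output (a sketch; the LSN bookkeeping shown here is an assumption for illustration):

```python
class WALBuffer:
    """Sketch of the write-ahead logging rule: before a data block is
    output to disk, every log record pertaining to that block must
    already be on stable storage, flushed in creation order."""
    def __init__(self):
        self.log = []          # in-memory log records, in order
        self.stable_log = []   # records already on stable storage
        self.block_lsn = {}    # block -> index of its latest log record

    def log_update(self, block, record):
        self.log.append(record)
        self.block_lsn[block] = len(self.log) - 1

    def flush_log(self, up_to):
        # output records in the order in which they were created
        self.stable_log = self.log[:up_to + 1]

    def output_block(self, block):
        lsn = self.block_lsn.get(block)
        if lsn is not None and lsn >= len(self.stable_log):
            self.flush_log(lsn)        # WAL: force the log first
        return len(self.stable_log)    # records now on stable storage
```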
Slide No.L4-2
Database Buffering
• Database maintains an in-memory buffer of data blocks
– When a new block is needed, if buffer is full an existing block
needs to be removed from buffer
– If the block chosen for removal has been updated, it must be
output to disk
• If a block with uncommitted updates is output to disk, log records
with undo information for the updates are output to the log on
stable storage first
– (Write ahead logging)
• No updates should be in progress on a block when it is output to
disk. Can be ensured as follows.
– Before writing a data item, transaction acquires exclusive lock
on block containing the data item
– Lock can be released once the write is completed.
• Such locks held for short duration are called latches.
– Before a block is output to disk, the system acquires an
exclusive latch on the block
• Ensures no update can be in progress on the block
Slide No.L4-3
Buffer Management (Cont.)
• Database buffer can be implemented either
– in an area of real main-memory reserved for the
database, or
– in virtual memory
• Implementing buffer in reserved main-memory has
drawbacks:
– Memory is partitioned before-hand between database
buffer and applications, limiting flexibility.
– Needs may change, and although operating system
knows best how memory should be divided up at any
time, it cannot change the partitioning of memory.
• Database buffers are generally implemented in virtual
memory in spite of some drawbacks:
– When operating system needs to evict a page that has
been modified, the page is written to swap space on disk.
Slide No.L4-4
Buffer Management (Cont.)
– When database decides to write buffer page to disk,
buffer page may be in swap space, and may have to
be read from swap space on disk and output to the
database on disk, resulting in extra I/O!
• Known as dual paging problem.
– Ideally when OS needs to evict a page from the buffer,
it should pass control to database, which in turn
should
1. Output the page to database instead of to swap
space (making sure to output log records first), if it
is modified
2. Release the page from the buffer, for the OS to use
Dual paging can thus be avoided, but common
operating systems do not support such
functionality.
Slide No.L4-5
Failure with Loss of Nonvolatile Storage
Slide No.L5-1
Recovering from Failure of Non-Volatile Storage
Slide No.L5-2
Advanced Recovery: Key Features
• Support for high-concurrency locking techniques, such
as those used for B+-tree concurrency control, which
release locks early
– Supports “logical undo”
• Recovery based on “repeating history”, whereby recovery
executes exactly the same actions as normal processing
– including redo of log records of incomplete
transactions, followed by subsequent undo
– Key benefits
• supports logical undo
• easier to understand/show correctness
Slide No.L5-3
Advanced Recovery: Logical Undo Logging
Slide No.L5-4
Advanced Recovery: Physical Redo
• Redo information is logged physically (that is, new
value for each write) even for operations with logical
undo
– Logical redo is very complicated since database state
on disk may not be “operation consistent” when
recovery starts
– Physical redo logging does not conflict with early lock
release
Slide No.L5-5
Advanced Recovery: Operation Logging
• Operation logging is done as follows:
1. When operation starts, log <Ti, Oj, operation-begin>.
Here Oj is a unique identifier of the operation
instance.
2. While operation is executing, normal log records with
physical redo and physical undo information are
logged.
3. When operation completes, <Ti, Oj, operation-end,
U> is logged, where U contains information needed
to perform a logical undo.
Example: insert of (key, record-id) pair (K5, RID7) into index I9
<T1, O1, operation-begin>
…. physical redo of steps in insert
<T1, X, 10, K5>
<T1, Y, 45, RID7>
<T1, O1, operation-end, (delete I9, K5, RID7)>
Slide No.L5-6
Advanced Recovery: Operation Logging (Cont.)
• If crash/rollback occurs before operation completes:
– the operation-end log record is not found, and
– the physical undo information is used to undo
operation.
• If crash/rollback occurs after the operation completes:
– the operation-end log record is found, and in this
case
– logical undo is performed using U; the physical
undo information for the operation is ignored.
• Redo of operation (after crash) still uses physical redo
information.
Slide No.L5-7
Advanced Recovery: Txn Rollback
Rollback of transaction Ti is done as follows:
• Scan the log backwards
1. If a log record <Ti, X, V1, V2> is found, perform the undo
and log a special redo-only log record <Ti, X, V1>.
2. If a <Ti, Oj, operation-end, U> record is found
• Rollback the operation logically using the undo
information U.
– Updates performed during roll back are logged just
like during normal operation execution.
– At the end of the operation rollback, instead of
logging an operation-end record, generate a record
<Ti, Oj, operation-abort>.
• Skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found
Slide No.L5-8
Advanced Recovery: Txn Rollback (Cont.)
Slide No.L5-9
Advanced Recovery: Txn Rollback Example
Slide No.L5-10
Advanced Recovery: Crash Recovery
The following actions are taken when recovering from system crash
1. (Redo phase): Scan log forward from last <checkpoint L> record till
end of log
1. Repeat history by physically redoing all updates of all
transactions,
2. Create an undo-list during the scan as follows
• undo-list is set to L initially
• Whenever <Ti start> is found Ti is added to undo-list
• Whenever <Ti commit> or <Ti abort> is found, Ti is deleted
from undo-list
This brings database to state as of crash, with committed as well as
uncommitted transactions having been redone.
Now undo-list contains transactions that are incomplete, that is,
have neither committed nor been fully rolled back.
Slide No.L6-1
Advanced Recovery: Crash Recovery (Cont.)
Slide No.L6-2
Advanced Recovery: Checkpointing
• Checkpointing is done as follows:
1. Output all log records in memory to stable storage
2. Output to disk all modified buffer blocks
3. Output to log on stable storage a <checkpoint L>
record.
Transactions are not allowed to perform any actions
while checkpointing is in progress.
• Fuzzy checkpointing allows transactions to progress
while the most time consuming parts of checkpointing
are in progress
– Performed as described on next slide
Slide No.L6-3
Advanced Recovery: Fuzzy Checkpointing
Slide No.L6-5
Data on External Storage
• Disks: Can retrieve random page at fixed cost
– But reading several consecutive pages is much cheaper than
reading them in random order
• Tapes: Can only read pages in sequence
– Cheaper than disks; used for archival storage
• File organization: Method of arranging a file of records on external
storage.
– Record id (rid) is sufficient to physically locate record
– Indexes are data structures that allow us to find the record ids of
records with given values in index search key fields
• Architecture: Buffer manager stages pages from external storage to
main memory buffer pool. File and index layers make calls to the
buffer manager.
Slide No:L1-1
Alternative File Organizations
Many alternatives exist, each ideal for some situations, and not
so good in others:
– Heap (random order) files: Suitable when typical access
is a file scan retrieving all records.
– Sorted Files: Best if records must be retrieved in some
order, or only a 'range' of records is needed.
– Indexes: Data structures to organize records via trees or
hashing.
• Like sorted files, they speed up searches for a subset
of records, based on values in certain (“search key”)
fields
• Updates are much faster than in sorted files.
Slide No:L1-2
Index Classification
• Primary vs. secondary: If search key contains primary key, then
called primary index.
– Unique index: Search key contains a candidate key.
Slide No:L1-3
Clustered vs. Unclustered Index
• Suppose that Alternative (2) is used for data entries, and that the data
records are stored in a Heap file.
– To build clustered index, first sort the Heap file (with some free
space on each page for future inserts).
– Overflow pages may be needed for inserts. (Thus, order of data
recs is `close to’, but not identical to, the sort order.)
Index entries
direct search for
data entries
CLUSTERED UNCLUSTERED
Slide No:L1-4
Indexes
• An index on a file speeds up selections on the search
key fields for the index.
– Any subset of the fields of a relation can be the
search key for an index on the relation.
– Search key is not the same as key (minimal set of
fields that uniquely identify a record in a relation).
• An index contains a collection of data entries, and
supports efficient retrieval of all data entries k* with
a given key value k.
– Given data entry k*, we can find record with key
k in at most one disk I/O. (Details soon …)
Slide No:L2-1
B+ Tree Indexes
Non-leaf
Pages
Leaf
Pages
(Sorted by search key)
Leaf pages contain data entries, and are chained (prev & next)
Non-leaf pages have index entries; only used to direct searches:
index entry
P0 | K1 | P1 | K2 | P2 | … | Km | Pm
Slide No:L2-2
Example B+ Tree
[Figure: example B+ tree with root entries 5, 13, 27, 30 and leaf
entries 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*]
Slide No:L2-4
Alternatives for Data Entry k* in Index
• In a data entry k* we can store:
– Data record with key value k, or
– <k, rid of data record with search key value k>, or
– <k, list of rids of data records with search key k>
• Choice of alternative for data entries is orthogonal to the
indexing technique used to locate data entries with a
given key value k.
– Examples of indexing techniques: B+ trees, hash-
based structures
– Typically, index contains auxiliary information that
directs searches to the desired data entries
Slide No:L2-5
Alternatives for Data Entries (Contd.)
• Alternative 1:
– If this is used, index structure is a file organization
for data records (instead of a Heap file or sorted
file).
– At most one index on a given collection of data
records can use Alternative 1. (Otherwise, data
records are duplicated, leading to redundant storage
and potential inconsistency.)
– If data records are very large, # of pages containing
data entries is high. Implies size of auxiliary
information in the index is also large, typically.
Slide No:L2-6
Alternatives for Data Entries (Contd.)
• Alternatives 2 and 3:
– Data entries typically much smaller than data
records. So, better than Alternative 1 with large
data records, especially if search keys are small.
(Portion of index structure used to direct search,
which depends on size of data entries, is much
smaller than with Alternative 1.)
– Alternative 3 more compact than Alternative 2,
but leads to variable sized data entries even if
search keys are of fixed length.
Slide No:L2-7
Cost Model for Our Analysis
We ignore CPU costs, for simplicity:
– B: The number of data pages
– R: Number of records per page
– D: (Average) time to read or write disk
page
– Measuring number of page I/O’s ignores
gains of pre-fetching a sequence of pages;
thus, even I/O cost is only approximated.
– Average-case analysis; based on several
simplistic assumptions.
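With these parameters, the usual back-of-the-envelope estimates follow directly: a full heap-file scan costs B*D, an equality search on a heap file examines about half the pages on average, and an equality search on a sorted file is a binary search. A sketch (Python; the 0.5*B*D figure assumes an equality selection on a key with exactly one match, as in the analysis that follows):

```python
from math import log2

def heap_scan(B, D):
    return B * D                 # read every page once

def heap_equality(B, D):
    return 0.5 * B * D           # on average, scan half the file

def sorted_equality(B, D):
    return D * log2(B)           # binary search over the pages

B, D = 1000, 0.015               # 1000 data pages, 15 ms per page I/O
print(heap_scan(B, D))                    # 15.0 seconds
print(heap_equality(B, D))                # 7.5 seconds
print(round(sorted_equality(B, D), 3))    # 0.149 seconds
```

The gap between 7.5 s and about 0.15 s for a single equality search is the motivation for the file-organization comparison on the next slides.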
Slide No:L3-1
Comparing File Organizations
• Heap files (random order; insert at eof)
• Sorted files, sorted on <age, sal>
• Clustered B+ tree file, Alternative (1), search key
<age, sal>
• Heap file with unclustered B + tree index on search
key <age, sal>
• Heap file with unclustered hash index on search key
<age, sal>
Slide No:L3-2
Operations to Compare
• Scan: Fetch all records from disk
• Equality search
• Range selection
• Insert a record
• Delete a record
Slide No:L3-3
Assumptions in Our Analysis
• Heap Files:
– Equality selection on key; exactly one match.
• Sorted Files:
– Files compacted after deletions.
• Indexes:
– Alt (2), (3): data entry size = 10% size of record
– Hash: No overflow buckets.
• 80% page occupancy => File size = 1.25 data size
– Tree: 67% occupancy (this is typical).
• Implies file size = 1.5 data size
Slide No:L3-4
Assumptions (contd.)
• Scans:
– Leaf levels of a tree-index are chained.
– Index data-entries plus actual file
scanned for unclustered indexes.
• Range searches:
– We use tree indexes to restrict the set
of data records fetched, but ignore hash
indexes.
Slide No:L3-5
Cost of Operations
Slide No:L4-1
Understanding the Workload
• For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join
conditions? How selective are these conditions likely
to be?
• For each update in the workload:
– Which attributes are involved in selection/join
conditions? How selective are these conditions likely
to be?
– The type of update (INSERT/DELETE/UPDATE), and the
attributes that are affected.
Slide No:L4-2
Choice of Indexes
Slide No:L5-1
Choice of Indexes (Contd.)
• One approach: Consider the most important queries in turn. Consider
the best plan using the current indexes, and see if a better plan is
possible with an additional index. If so, create it.
– Obviously, this implies that we must understand
how a DBMS evaluates queries and creates query
evaluation plans!
– For now, we discuss simple 1-table queries.
• Before creating an index, must also consider the impact on updates in
the workload!
– Trade-off: Indexes can make queries go faster,
updates slower. Require disk space, too.
Slide No:L5-2
Index Selection Guidelines
• Attributes in WHERE clause are candidates for index keys.
– Exact match condition suggests hash index.
– Range query suggests tree index.
• Clustering is especially useful for range
queries; can also help on equality queries if
there are many duplicates.
• Multi-attribute search keys should be considered when a WHERE
clause contains several conditions.
– Order of attributes is important for range queries.
– Such indexes can sometimes enable index-only
strategies for important queries.
• For index-only strategies, clustering is not
important!
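These guidelines can be seen in action with a real optimizer. A small sketch using SQLite (table and index names are hypothetical; SQLite only provides tree indexes, so the hash-vs-tree distinction does not surface here, but the WHERE-clause attribute driving index choice does):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, dno INTEGER, age INTEGER, sal REAL)")

# The range condition on age makes age a candidate index key:
conn.execute("CREATE INDEX idx_emp_age ON Emp(age)")

# The optimizer's plan shows the tree index being used for the range query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT dno FROM Emp WHERE age > 40"
).fetchall()
for row in plan:
    print(row[-1])   # detail column mentions idx_emp_age
```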
Slide No:L5-3
Examples of Clustered Indexes
• B+ tree index on E.age can be used to get qualifying tuples:
SELECT E.dno
FROM Emp E
WHERE E.age>40
– How selective is the condition?
– Is the index clustered?
• Consider the GROUP BY query:
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno
– If many tuples have E.age > 10, using the E.age index and sorting the retrieved tuples may be costly.
– Clustered E.dno index may be better!
• Equality queries and duplicates:
SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'
– Clustering on E.hobby helps!
Slide No:L5-4
Indexes with Composite Search Keys
• Composite Search Keys: Search on a combination of fields.
– Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index: age=20 and sal=75
– Range query: Some field value is not a constant. E.g.: age=20; or age=20 and sal > 10
• Data entries in index sorted by search key to support range queries.
– Lexicographic order, or spatial order.
[Figure: composite key indexes in lexicographic order over Employees(name, age, sal) = {bob 12 10, cal 11 80, joe 12 20, sue 13 75}: <age,sal> entries 11,80 12,10 12,20 13,75; <age> entries 11 12 12 13; <sal,age> entries starting 10,12 ...]
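The lexicographic ordering of composite data entries can be sketched with Python tuples, using the employee records from the figure:

```python
# Employees(name, age, sal) from the slide's figure:
emps = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Data entries of an <age, sal> index, in lexicographic order:
entries = sorted((age, sal) for _, age, sal in emps)
print(entries)        # [(11, 80), (12, 10), (12, 20), (13, 75)]

# A query on the leading attribute (age = 12) matches a contiguous
# run of entries, so a tree index supports it well:
run = [e for e in entries if e[0] == 12]
print(run)            # [(12, 10), (12, 20)]

# A condition on sal alone does NOT select a contiguous run in
# <age, sal> order -- attribute order matters for range queries.
```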
Slide No:L6-2
Index-Only Plans
• A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
• Examples (tree index!):
– <E.dno>:
SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
– <E.dno,E.sal>:
SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
– <E.age,E.sal> or ...:
SELECT AVG(E.sal)
FROM Emp E
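A sketch of an index-only plan with SQLite (names are hypothetical): with an index on (dno, sal), the second query above can be answered from the index alone, which SQLite reports as a covering-index scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, dno INTEGER, age INTEGER, sal REAL)")
conn.execute("CREATE INDEX idx_dno_sal ON Emp(dno, sal)")

# Every attribute the query touches (dno, sal) is in the index,
# so no tuples need to be fetched from the Emp table itself.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT dno, MIN(sal) FROM Emp GROUP BY dno"
).fetchall()
for row in plan:
    print(row[-1])
```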
Slide No:L6-3
Summary
• Many alternative file organizations exist, each appropriate in some
situation.
• If selection queries are frequent, sorting the file or building an
index is important.
– Hash-based indexes only good for equality search.
– Sorted files and tree-based indexes best for range
search; also good for equality search. (Files rarely
kept sorted in practice; B+ tree index is better.)
• Index is a collection of data entries plus a way to quickly find
entries with given key values.
Slide No:L6-4
Summary (Contd.)
Slide No:L6-5
Introduction
• As for any index, 3 alternatives for data entries k*:
– Data record with key value k
– <k, rid of data record with search key value k>
– <k, list of rids of data records with search key k>
• Choice is orthogonal to the indexing technique used to locate
data entries k*.
• Tree-structured indexing techniques support both range
searches and equality searches.
• ISAM: static structure; B+ tree: dynamic, adjusts gracefully
under inserts and deletes.
Slide No:L7-1
Range Searches
• ``Find all students with gpa > 3.0''
– If data is in sorted file, do binary search to find first such student, then scan to find others.
– Cost of binary search can be quite high.
• Simple idea: Create an `index' file.
[Figure: index file with entries k1, k2, ..., kN over the pages of the data file.]
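The sorted-file approach can be sketched with Python's bisect module (the gpa values are made up for illustration):

```python
import bisect

# Records of a file sorted on gpa:
gpas = [2.1, 2.8, 3.0, 3.2, 3.6, 3.9]

# Binary search finds the first student with gpa > 3.0 ...
start = bisect.bisect_right(gpas, 3.0)

# ... then a sequential scan collects the rest.
result = gpas[start:]
print(result)   # [3.2, 3.6, 3.9]
```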
Slide No:L7-2
ISAM
• Index entry format: <P0, K1, P1, K2, P2, ..., Km, Pm>
[Figure: ISAM structure – non-leaf pages at the top, primary leaf pages below, with overflow pages chained off the primary pages.]
Slide No:L7-3
Comments on ISAM
• File layout: data pages first, then index pages, then overflow pages.
• File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages.
• Index entries: <search key value, page id>; they `direct' search for data entries, which are in leaf pages.
• Search: Start at root; use key comparisons to go to leaf. Cost = log_F N; F = # entries/index pg, N = # leaf pgs.
• Insert: Find the leaf the data entry belongs to, and put it there.
• Delete: Find and remove from leaf; if an overflow page becomes empty, de-allocate it.
Slide No:L7-4
Example ISAM Tree
• Each node can hold 2 entries; no need for `next-leaf-page' pointers. (Why?)
[Figure: Root = 40; non-leaf pages (20, 33) and (51, 63); primary leaf pages [10*,15*] [20*,27*] [33*,37*] [40*,46*] [51*,55*] [63*,97*].]
Slide No:L7-5
After Inserting 23*, 48*, 41*, 42* ...
[Figure: index pages and primary leaf pages unchanged; the new entries go to overflow pages: 23* overflows the [20*,27*] leaf; 48* and 41* overflow the [40*,46*] leaf, and 42* goes on a second, chained overflow page.]
Slide No:L7-6
... Then Deleting 42*, 51*, 97*
[Figure: same index (Root = 40; non-leaf pages (20, 33) and (51, 63)); primary leaf pages now [10*,15*] [20*,27*] [33*,37*] [40*,46*] [55*] [63*], with the overflow pages for 23* and 48*, 41* still present. Note that 51 still appears in the index even though 51* is no longer in any leaf!]
Slide No:L7-7
B+ Tree: Most Widely Used Index
• Insert/delete at log_F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages)
• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.
• Supports equality and range-searches efficiently.
[Figure: index entries at the top (direct search) over data entries at the leaf level (the "sequence set").]
Slide No:L8-1
Example B+ Tree
• Search begins at root, and key comparisons direct it to a leaf (as in ISAM).
• Search for 5*, 15*, all data entries >= 24* ...
[Figure: Root = (13, 17, 24, 30); leaf pages [2*,3*,5*,7*] [14*,16*] [19*,20*,22*] [24*,27*,29*] [33*,34*,38*,39*].]
Slide No:L8-2
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
• Typical capacities:
– Height 4: 133^4 = 312,900,721 records
– Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 KBytes
– Level 2 = 133 pages ≈ 1 MByte
– Level 3 = 17,689 pages ≈ 138 MBytes
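The capacity figures follow directly from the fanout; a quick check:

```python
fanout = 133               # average fanout at 67% fill of order-100 nodes

# Records reachable at each height:
records_h3 = fanout ** 3   # 2,352,637
records_h4 = fanout ** 4   # 312,900,721

# Pages per level, with 8 KB pages (sizes approximate):
level2_pages = fanout          # 133 pages  -> about 1 MB
level3_pages = fanout ** 2     # 17,689 pages
print(level3_pages * 8 // 1024, "MB")   # about 138 MB
```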
Slide No:L8-3
Inserting a Data Entry into a B+ Tree
• Find correct leaf L.
• Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2): redistribute entries evenly, copy up the middle key, and insert an index entry pointing to L2 into the parent of L.
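The leaf-split step can be sketched as follows (a minimal illustration, not a full B+ tree implementation; the function name and structure are hypothetical). For a leaf, the middle key is copied up: it stays in the new right leaf and also becomes the parent's new index entry.

```python
def split_leaf(entries):
    """Split an overfull leaf into two; return (left, right, copy_up_key).

    The middle key is COPIED up: it remains the first entry of the
    right leaf and is also inserted into the parent as an index entry.
    (An index-node split would instead PUSH the middle key up,
    removing it from the children.)"""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

# Inserting 8* into the full leaf [2*, 3*, 5*, 7*] of the example tree:
leaf = sorted([2, 3, 5, 7] + [8])
left, right, key = split_leaf(leaf)
print(left, right, key)   # [2, 3] [5, 7, 8] 5
```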
Slide No:L8-4
Inserting 8* into Example B+ Tree
• Observe how minimum occupancy is guaranteed in both leaf and index page splits.
• Note difference between copy-up and push-up; be sure you understand the reasons for this.
[Figure: the full leaf splits into [2*,3*] and [5*,7*,8*], and 5 is copied up. The full index node then splits, and 17 is pushed up: the entry to be inserted in the parent node appears only once in the index (contrast this with a leaf split). Resulting index pages: (5, 13) and (24, 30).]
Slide No:L8-5
Example B+ Tree After Inserting 8*
[Figure: Root = 17; index pages (5, 13) and (24, 30); leaf pages [2*,3*] [5*,7*,8*] [14*,16*] [19*,20*,22*] [24*,27*,29*] [33*,34*,38*,39*]. Notice that the root was split, increasing the height.]
Slide No:L8-7
Example Tree After (Inserting 8*, Then)
Deleting 19* and 20* ...
[Figure: Root = 17; index pages (5, 13) and (27, 30); leaf pages [2*,3*] [5*,7*,8*] [14*,16*] [22*,24*] [27*,29*] [33*,34*,38*,39*]. Deleting 19* is easy; deleting 20* is done with re-distribution, and the new middle key 27 is copied up.]
... And Then Deleting 24*:
• Must merge.
• Observe `toss' of index entry (on right), and `pull down' of index entry (below).
[Figure: on the right, leaf [22*] underflows and merges with its sibling into [22*,27*,29*]; the separating index entry 27 is tossed, leaving index node (30) over [22*,27*,29*] and [33*,34*,38*,39*]. Below, the underfull index node merges with (5, 13): the entry 17 is pulled down, and the tree's height decreases; the new root is (5, 13, 17, 30).]
Slide No:L8-9