Dbms Week 1 To 12 Slides

Uploaded by

RIYA CHANDRABEL

Database Management Systems
Module 01: Course Overview

Partha Pratim Das

Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur

ppd@[Link]

Outline: Objectives & Outline; Why Databases?; Know Your Course; Course Outline; Course Text Book; Module Summary

Database Management Systems Partha Pratim Das 01.1

Module Objectives PPD

• To understand the importance of database management systems in modern-day applications
• To Know Your Course

Database Management Systems Partha Pratim Das 01.2

Module Outline PPD

• Why Databases?
• KYC: Know Your Course
    ◦ Course Prerequisite
    ◦ Course Outline
    ◦ Course Text Book

Database Management Systems Partha Pratim Das 01.3

Why Databases? PPD


Database Management Systems Partha Pratim Das 01.4

Database Management System (DBMS)

• DBMS contains information about a particular enterprise
    ◦ Collection of interrelated data
    ◦ Set of programs to access the data
    ◦ An environment that is both convenient and efficient to use
• Database Applications:
    ◦ Banking: transactions
    ◦ Airlines: reservations, schedules
    ◦ Universities: registration, grades
    ◦ Sales: customers, products, purchases
    ◦ Online retailers: order tracking, customized recommendations
    ◦ Manufacturing: production, inventory, orders, supply chain
    ◦ Human resources: employee records, salaries, tax deductions
    ◦ ···
• Databases can be very large
• Databases touch all aspects of our lives
Database Management Systems Partha Pratim Das 01.5
University Database Example

• Application program examples
    ◦ Add new students, instructors, and courses
    ◦ Register students for courses, and generate class rosters
    ◦ Assign grades to students, compute grade point averages (GPA) and generate transcripts
• In the early days, database applications were built directly on top of file systems

Database Management Systems Partha Pratim Das 01.6

Drawbacks of using file systems to store data

• Data redundancy and inconsistency
    ◦ Multiple file formats, duplication of information in different files
• Difficulty in accessing data
    ◦ Need to write a new program to carry out each new task
• Data isolation
    ◦ Multiple files and formats
• Integrity problems
    ◦ Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
    ◦ Hard to add new constraints or change existing ones
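The integrity point above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and assumes a hypothetical account table (the table name, column names, and the account number 'A-101' are illustrative only, not from the slides). The constraint balance > 0 is stated once in the schema, so the database rejects a violating update without any check in the application code.

```python
import sqlite3

# Hypothetical schema: the rule "balance > 0" is declared once, in the table definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()


def try_update(new_balance):
    """Attempt to set the balance; report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = ? WHERE account_no = 'A-101'",
                (new_balance,),
            )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


def current_balance():
    return conn.execute(
        "SELECT balance FROM account WHERE account_no = 'A-101'"
    ).fetchone()[0]
```

Calling try_update(200) is accepted, while try_update(-50) is rejected and leaves the stored balance unchanged. Changing the rule later means changing the schema, not hunting through program code.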

Database Management Systems Partha Pratim Das 01.7

Drawbacks of using file systems to store data (2)

• Atomicity of updates
    ◦ Failures may leave database in an inconsistent state with partial updates carried out
    ◦ Example: Transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
    ◦ Concurrent access needed for performance
    ◦ Uncontrolled concurrent accesses can lead to inconsistencies
        . Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
• Security problems
    ◦ Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
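The concurrent-withdrawal example above can be traced with a small deterministic simulation. This is only a sketch: the function names are hypothetical, and the "concurrency" is modelled by letting every user read the balance before anyone writes, which is the uncontrolled interleaving the slide warns about.

```python
def withdraw_interleaved(balance, amounts):
    """Uncontrolled access: all users read first, then each writes its own result."""
    reads = [balance for _ in amounts]      # every user sees the same old balance
    for seen, amount in zip(reads, amounts):
        balance = seen - amount             # a later write overwrites the earlier one
    return balance


def withdraw_serial(balance, amounts):
    """Controlled access: each withdrawal sees the result of the previous one."""
    for amount in amounts:
        balance = balance - amount
    return balance


lost_update = withdraw_interleaved(100, [50, 50])   # 50: one withdrawal is lost
correct = withdraw_serial(100, [50, 50])            # 0: both withdrawals are applied
```

With a starting balance of 100 and two withdrawals of 50, the interleaved run ends at 50 although 100 was paid out, while the serial run ends at 0. A DBMS uses concurrency control to make concurrent transactions behave like the serial case.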

Database Management Systems Partha Pratim Das 01.8

Know Your Course PPD


Database Management Systems Partha Pratim Das 01.9

Course Prerequisites: Essential PPD

• Set Theory
    ◦ Definition of a Set
        . Intensional Definition
        . Extensional Definition
        . Set-builder Notation
    ◦ Membership, Subset, Superset, Power Set, Universal Set
    ◦ Operations on sets:
        . Union, Intersection, Complement, Difference, Cartesian Product
    ◦ De Morgan’s Law
    ◦ Courses
        . MOOCs: Discrete Mathematics:
          [Link]
        . Online Degree Foundational Course: Mathematics for Data Science I
          [Link]
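As a quick refresher on the set operations listed above, here is a small sketch using Python's built-in set type. The sets U, A and B are arbitrary illustrative choices, and complement is a hypothetical helper defined relative to the universal set U.

```python
from itertools import product

U = set(range(1, 9))                  # universal set {1, ..., 8}
A = {n for n in U if n % 2 == 0}      # set-builder style: even members of U
B = {1, 2, 3, 4}                      # extensional definition


def complement(s):
    """Complement relative to the universal set U."""
    return U - s


union = A | B
intersection = A & B
difference = A - B
pairs = set(product(B, {"x", "y"}))   # Cartesian product B x {x, y}

# De Morgan's laws, checked on these particular sets
de_morgan_1 = complement(A | B) == (complement(A) & complement(B))
de_morgan_2 = complement(A & B) == (complement(A) | complement(B))
```

The Cartesian product is worth particular attention here, since a relation in the relational model is a subset of a Cartesian product of domains.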

Database Management Systems Partha Pratim Das 01.10

Course Prerequisites: Essential PPD

• Relations and Functions
    ◦ Definition of Relations
    ◦ Ordered Pairs and Binary Relations
        . Domain and Range
        . Image, Preimage, Inverse
        . Properties: Reflexive, Symmetric, Antisymmetric, Transitive, Total
    ◦ Definition of Functions
    ◦ Properties of Functions: Injective, Surjective, Bijective
    ◦ Composition of Functions
    ◦ Inverse of a Function
    ◦ Courses
        . MOOCs: Discrete Mathematics:
          [Link]
        . Online Degree Foundational Course: Mathematics for Data Science I
          [Link]

Database Management Systems Partha Pratim Das 01.11

Course Prerequisites: Essential PPD

• Propositional Logic
    ◦ Truth Values & Truth Tables
    ◦ Operators: conjunction (and), disjunction (or), negation (not), implication, equivalence
    ◦ Closure under Operations
    ◦ Courses
        . MOOCs: Discrete Mathematics:
          [Link]

Database Management Systems Partha Pratim Das 01.12

Course Prerequisites: Essential PPD

• Predicate Logic
    ◦ Predicates
    ◦ Quantification
        . Existential
        . Universal
    ◦ Courses
        . MOOCs: Discrete Mathematics:
          [Link]

Database Management Systems Partha Pratim Das 01.13

Course Prerequisites: Essential PPD

• Data Structures
    ◦ Array
    ◦ List
    ◦ Binary Search Tree
        . Balanced Tree
    ◦ B-Tree
    ◦ Hash Table / Map
    ◦ Courses
        . MOOCs: Design and Analysis of Algorithms:
          [Link]
        . MOOCs: Fundamental Algorithms – Design and Analysis:
          [Link]

Database Management Systems Partha Pratim Das 01.14

Course Prerequisites: Essential PPD

• Programming in Python
    ◦ Courses
        . Online Degree Foundational Course - Programming in Python
          [Link]

Database Management Systems Partha Pratim Das 01.15

Course Prerequisites: Desirable PPD

• Algorithms and Programming in C
    ◦ Sorting
        . Merge Sort
        . Quick Sort
    ◦ Search
        . Linear Search
        . Binary Search
        . Interpolation Search
    ◦ Courses
        . MOOCs: Design and Analysis of Algorithms:
          [Link]
        . MOOCs: Introduction to Programming in C:
          [Link]

Database Management Systems Partha Pratim Das 01.16

Course Prerequisites: Desirable PPD

• Object-Oriented Analysis and Design
    ◦ Courses
        . MOOCs: Object-Oriented Analysis and Design:
          [Link]

Database Management Systems Partha Pratim Das 01.17

Course Outline PPD


Database Management Systems Partha Pratim Das 01.18

Course Textbook PPD

Database System Concepts, Sixth Edition
Abraham Silberschatz, Henry Korth, S. Sudarshan

Publisher: McGraw Hill Education
ISBN: 0073523321

Website: [Link]
7th Edition will also do

Database Management Systems Partha Pratim Das 01.19

Module Summary PPD

• Elucidated the importance of database management systems in modern-day applications
• Introduced various aspects of the Course

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 01.20

Database Management Systems
Module 02: Why DBMS?/1

Partha Pratim Das

Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur

ppd@[Link]

Outline: Objectives & Outline; Evolution of Data Management; History; Module Summary

Database Management Systems Partha Pratim Das 02.1

Module Objectives PPD

• To understand the need for a DBMS from a historical perspective

• Storage
• Retrieval
• Transaction
• Audit
• Archival
For:
• Individual
• Small / Big Enterprise
• Global
There have been two major approaches in this practice:
• Physical
• Electronic
Database Management Systems Partha Pratim Das 02.5
Data Management: Physical PPD

Physical Data or Records management, more formally known as Book Keeping, has been using physical ledgers and journals for centuries.

The most significant development happened when Henry Brown, an American inventor, patented a “receptacle for storing and preserving papers” on November 2, 1886.

Herman Hollerith adapted the punch cards used for weaving looms to act as the memory for a mechanical tabulating machine, in 1890.

Database Management Systems Partha Pratim Das 02.6

Data Management: Electronic PPD

Electronic Data or Records management moves with the advances in technology - especially of memory, storage, computing, and networking.
• 1950s: Computer Programming started
• 1960s: Data Management with punch cards / tapes and magnetic tapes
• 1970s:
    ◦ COBOL and CODASYL approach was introduced in 1971
    ◦ On October 14 in 1979, Apple II platform shipped VisiCalc, marking the birth of the spreadsheet
    ◦ Magnetic disks became prevalent
• 1980s: RDBMS changed the face of data management
• 1990s: With the Internet, data management started becoming global
• 2000s: e-Commerce boomed, NoSQL was introduced for unstructured data management
• 2010s: Data Science started riding high
Database Management Systems Partha Pratim Das 02.7
Electronic Data Management Parameters PPD

Electronic Data or Records management depends on various parameters including:
• Durability
• Scalability
• Security
• Retrieval
• Ease of Use
• Consistency
• Efficiency
• Cost
• ...

Database Management Systems Partha Pratim Das 02.8

Book Keeping PPD

Recall how shop owners used to maintain their accounts.
A book register was maintained on which the shop owner wrote the amount received from customers, the amount due for any customer, inventory details and so on.

Problems with such an approach of book-keeping:
• Durability: Physical damage to these registers is a possibility due to rodents, humidity, wear and tear
• Scalability: Very difficult to maintain for many years, some shops have numerous registers spanning over years
• Security: Susceptible to tampering by outsiders
• Retrieval: Time consuming process to search for a previous entry
• Consistency: Prone to human errors
Not only small shops but large organizations also used to maintain their transaction details in book registers.
Database Management Systems Partha Pratim Das 02.9
Spreadsheet Files - A better solution PPD

Spreadsheet software like Google Sheets: Due to the disadvantages of maintaining ledger registers, organizations dealing with huge amounts of data shifted to using spreadsheet software for maintaining their records in files.

• Durability: These are computer applications and hence data is less prone to physical damage.
• Scalability: Easier to search, insert and modify records as compared to book ledgers
• Security: Can be password-protected
• Ease of Use: Computer applications are used to search and manipulate records in the spreadsheets, reducing the manpower needed to perform routine computations
• Consistency: Not guaranteed, but spreadsheets are less prone to mistakes than registers.

Mostly useful for single user or small enterprise applications

Database Management Systems Partha Pratim Das 02.10

Why leave filesystems? PPD

Lack of efficiency in meeting growing needs
• With the rapid scale-up of data, there has been a considerable increase in the time required to perform most operations.
• A typical spreadsheet file may have an upper limit on the number of rows.
• Ensuring consistency of data is a big challenge.
• No means to check violations of constraints in the face of concurrent processing.
• Unable to give different permissions to different people in a centralized manner.
• A system crash could be catastrophic.
The above limitations of filesystems paved the way for a comprehensive platform dedicated to the management of data - the Database Management System.

Database Management Systems Partha Pratim Das 02.11

History of DBMS PPD


Database Management Systems Partha Pratim Das 02.12

History of Database Systems

• 1950s and early 1960s:
    ◦ Data processing using magnetic tapes for storage
        . Tapes provided only sequential access
    ◦ Punched cards for input
• Late 1960s and 1970s:
    ◦ Hard disks allowed direct access to data
    ◦ Network and hierarchical data models in widespread use
    ◦ Ted Codd defines the relational data model
        . Would win the ACM Turing Award for this work
        . IBM Research begins System R prototype
        . UC Berkeley begins Ingres prototype
    ◦ High-performance (for the era) transaction processing

Database Management Systems Partha Pratim Das 02.13


Database Management Systems Partha Pratim Das 02.17

Module Summary PPD

• Walk through of evolution of Data and Records Management
• History of DBMS

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 02.18

Database Management Systems
Module 03: Why DBMS?/2

Partha Pratim Das

Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur

ppd@[Link]

Outline: Objectives & Outline; File Systems vs Databases (Python vis-à-vis SQL, Parameterized Comparison); Module Summary

Database Management Systems Partha Pratim Das 03.1

Module Recap PPD

• Evolution of Data and Records Management
• History of DBMS

Database Management Systems Partha Pratim Das 03.2

Module Objectives PPD

• Comparison of File based data management and DBMS

Case study: A bank transaction (2) PPD

We will use this banking transaction system to compare various features of a file-based (spreadsheet/.csv files) implementation vis-à-vis a DBMS-based implementation
• Account details are stored in
    ◦ [Link] for file-based implementation
    ◦ Accounts table for DBMS implementation
• The transaction details are stored in
    ◦ [Link] file for file-based implementation
    ◦ Ledger table for DBMS implementation
In the following slides we discuss a fund transfer transaction.

Source: https://github.com/bhaskariitm/transition-from-files-to-db/tree/main

Database Management Systems Partha Pratim Das 03.7

Python vis-à-vis SQL PPD

Database Management Systems Partha Pratim Das 03.8

Bank Transaction: Python vis-à-vis SQL PPD

Python:

    def begin_Transaction(creditAcc, debitAcc, amount):
        temp = []
        success = 0

        # Open file handles to retrieve and store transaction data
        f_obj_Account1 = open('[Link]', 'r')
        f_reader1 = [Link](f_obj_Account1)
        f_obj_Account2 = open('[Link]', 'r')
        f_reader2 = [Link](f_obj_Account2)
        f_obj_Ledger = open('[Link]', 'a+')
        f_writer = [Link](f_obj_Ledger, fieldnames=col_name_Ledger)

SQL:

    // Handled implicitly by the DBMS
Database Management Systems Partha Pratim Das 03.9
Bank Transaction: Python vis-à-vis SQL (2) PPD

Module 03 Python SQL

Database Management Systems Partha Pratim Das 03.10

Bank Transaction: Python vis-à-vis SQL (3) PPD

Module 03 Python SQL

Database Management Systems Partha Pratim Das 03.11

Bank Transaction: Python vis-à-vis SQL (4) PPD

Python:

        try:
            for sRec in f_reader1:
                # CONDITION CHECK FOR ENOUGH BALANCE
                if sRec["AcctNo"] == debitAcc and int(sRec["Balance"]) > int(amt):
                    for rRec in f_reader2:
                        if rRec["AcctNo"] == creditAcc:
                            sRec["Balance"] = str(int(sRec["Balance"]) - int(amt))  # DEBIT
                            temp.append(sRec)
                            # Critical point
                            f_writer.writerow({"Acct1": sRec["AcctNo"],
                                               "Acct2": rRec["AcctNo"],
                                               "Amount": amt, "D/C": "D"})
                            rRec["Balance"] = str(int(rRec["Balance"]) + int(amt))  # CREDIT
                            temp.append(rRec)
                            ...

SQL:

    do $$
    begin
        amt = 5000;
        sendVal = '1800090';
        recVal = '1800100';
        select balance from accounts
        into sbalance
        where account_no = sendVal;
        if sbalance < amt then
            raise notice 'Insufficient balance';
        else
            update accounts
            set balance = balance - amt
            where account_no = sendVal;
            insert into ledger(sendAc, recAc, amnt, ttype)
            values(sendVal, recVal, amt, 'D');
            update accounts
            set balance = balance + amt
            where account_no = recVal;
            ...

Database Management Systems Partha Pratim Das 03.12
Bank Transaction: Python vis-à-vis SQL (5) PPD

Python:

        try:
            for sRec in f_reader1:
                # CONDITION CHECK FOR ENOUGH BALANCE
                if sRec["AcctNo"] == debitAcc and int(sRec["Balance"]) > int(amt):
                    for rRec in f_reader2:
                        if rRec["AcctNo"] == creditAcc:
                            sRec["Balance"] = str(int(sRec["Balance"]) - int(amt))  # DEBIT
                            temp.append(sRec)
                            # Critical point
                            f_writer.writerow({"Acct1": sRec["AcctNo"],
                                               "Acct2": rRec["AcctNo"],
                                               "Amount": amt, "D/C": "D"})
                            rRec["Balance"] = str(int(rRec["Balance"]) + int(amt))  # CREDIT
                            temp.append(rRec)
                            f_writer.writerow({"Acct1": rRec["AcctNo"],
                                               "Acct2": sRec["AcctNo"],
                                               "Amount": amt, "D/C": "C"})
                            success = success + 1
                            break
            f_obj_Account1.seek(0)
            next(f_obj_Account1)
            for record in f_reader1:
                if record["AcctNo"] != temp[0]["AcctNo"] and \
                        record["AcctNo"] != temp[1]["AcctNo"]:
                    temp.append(record)
        except:
            print("Wrong input entered !!!")

SQL:

    do $$
    begin
        amt = 5000;
        sendVal = '1800090';
        recVal = '1800100';
        select balance from accounts
        into sbalance
        where account_no = sendVal;
        if sbalance < amt then
            raise notice 'Insufficient balance';
        else
            update accounts
            set balance = balance - amt
            where account_no = sendVal;
            insert into ledger(sendAc, recAc, amnt, ttype)
            values(sendVal, recVal, amt, 'D');
            update accounts
            set balance = balance + amt
            where account_no = recVal;
            insert into ledger(sendAc, recAc, amnt, ttype)
            values(recVal, sendVal, amt, 'C');
            commit;
            raise notice 'Successful';
        end if;
    end; $$
Database Management Systems Partha Pratim Das 03.13
Bank Transaction: Python vis-à-vis SQL (6) PPD

Python:

        # Writing back to the file
        f_obj_Account1.close()
        f_obj_Account2.close()
        f_obj_Ledger.close()

        if success == 1:
            f_obj_Account = open('[Link]', 'w+', newline='')
            f_writer = [Link](f_obj_Account, fieldnames=col_name_Account)
            f_writer.writeheader()
            for data in temp:
                f_writer.writerow(data)

            f_obj_Account.close()
            print("Transaction is successful !!")

        else:
            print('Transaction failed : Confirm Account details')

SQL:

    // Handled implicitly by the DBMS
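For comparison, the same fund transfer can be sketched against a DBMS from Python. This is a minimal illustration and not the course's implementation: it assumes SQLite through Python's built-in sqlite3 module rather than the server used in the slides, and the helper names (make_bank, transfer, balances), the starting balances, and the column types are hypothetical. The table and column names follow the SQL column of the slides.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with illustrative accounts and ledger tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (account_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE ledger (sendAc TEXT, recAc TEXT, amnt INTEGER, ttype TEXT)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("1800090", 8000), ("1800100", 1000)],  # made-up starting balances
    )
    conn.commit()
    return conn


def transfer(conn, send_val, rec_val, amt):
    """Move amt from send_val to rec_val; return True on success, False otherwise."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            row = conn.execute(
                "SELECT balance FROM accounts WHERE account_no = ?", (send_val,)
            ).fetchone()
            if row is None or row[0] < amt:
                raise ValueError("insufficient balance or unknown sender")
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_no = ?",
                (amt, send_val),
            )
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?, 'D')", (send_val, rec_val, amt))
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_no = ?",
                (amt, rec_val),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown receiver")  # the debit above is undone
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?, 'C')", (rec_val, send_val, amt))
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_no, balance FROM accounts"))
```

A transfer to a non-existent receiver fails after the debit has already been issued, yet the balances and the ledger are left untouched because the whole block is rolled back. The file-based version has no such guarantee at its "critical point": a failure there can leave a ledger entry without the matching account update.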

Database Management Systems Partha Pratim Das 03.14

Comparison PPD

Parameter | File Handling via Python | DBMS
Scalability with respect to amount of data | Very difficult to handle insert, update and querying of records | In-built features to provide high scalability for a large number of records
Scalability with respect to changes in structure | Extremely difficult to change the structure of records as in the case of adding or removing attributes | Adding or removing attributes can be done seamlessly using simple SQL queries
Time of execution | In seconds | In milliseconds
Persistence | Data processed using temporary data structures have to be manually updated to the file | Data persistence is ensured via automatic, system induced mechanisms
Robustness | Ensuring robustness of data has to be done manually | Backup, recovery and restore need minimum manual intervention
Security | Difficult to implement in Python (Security at OS level) | User-specific access at database level
Programmer’s productivity | Most file access operations involve extensive coding to ensure persistence, robustness and security of data | Standard and simple built-in queries reduce the effort involved in coding thereby increasing a programmer’s throughput
Arithmetic operations | Easy to do arithmetic computations | Limited set of arithmetic operations are available
Costs | Low costs for hardware, software and human resources | High costs for hardware, software and human resources

Database Management Systems Partha Pratim Das 03.15

Parameterized Comparison PPD

Database Management Systems Partha Pratim Das 03.16

Scalability PPD

File handling via Python
• Number of records: As the # of records increases, the efficiency of flat files reduces:
    ◦ the time spent in searching for the right records
    ◦ the limitations of the OS in handling huge files
• Structural Change: To add an attribute, initializing the new attribute of each record with a default value has to be done by program. It is very difficult to detect and maintain relationships between entities if and when an attribute has to be removed.

DBMS
• Number of records: Databases are built to efficiently scale up when the # of records increases drastically.
    ◦ In-built mechanisms, like indexing, for quick access to the right data.
• Structural Change: When adding an attribute, a default value can be defined that holds for all existing records - the new attribute gets initialized with the default value. During deletion, constraints are used either to disallow the removal or to ensure its safe removal
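Both DBMS points above (a default value for a new attribute, and indexing for quick access) can be tried out in a few lines. This is a hedged sketch using Python's built-in sqlite3 module; the branch column, its default 'MAIN', the index name and the sample rows are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_no TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("1800090", 8000), ("1800100", 1000)],
)

# Structural change: one statement adds the attribute, and every
# existing record reads back the declared default value.
conn.execute("ALTER TABLE accounts ADD COLUMN branch TEXT DEFAULT 'MAIN'")
rows = conn.execute(
    "SELECT account_no, branch FROM accounts ORDER BY account_no"
).fetchall()

# Number of records: an index lets the DBMS locate a row without scanning the table.
conn.execute("CREATE INDEX idx_accounts_no ON accounts(account_no)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM accounts WHERE account_no = '1800090'"
).fetchall()
uses_index = any("idx_accounts_no" in step[-1] for step in plan)
```

In the file-based version, the same structural change means rewriting every record of the .csv file from program code, and a lookup by account number means reading the file row by row.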
Database Management Systems Partha Pratim Das 03.17
Time and Efficiency PPD

File handling via Python
• The effort needed to implement a file handler in Python is quite low
• In order to process a 1GB file, a program in Python would typically take a few seconds.

DBMS
• The effort to install and configure a DB in a DB server is expensive & time consuming
• In order to process a 1GB file, an SQL query would typically take a few milliseconds.

• If the number of records is very small, the overhead in installing and configuring a database will be much more than the time advantage obtained from executing the queries.
• However, if the number of records is really large, then the time required in the initialization process of a database will be negligible as compared to the time saved in using SQL queries.

Database Management Systems Partha Pratim Das 03.18

Persistence, Robustness, Security PPD

File handling via Python
• Persistence: Data processed using in-memory data structures stay in the memory during processing. After updates, these are manually updated to the file on disk
• Robustness: Ensuring consistency, reliability and sanity is manual via multiple checks. On a system crash, a transaction may cause inconsistency or loss of data.
• Security: Extremely difficult to implement granular security in file systems. Authentication is at the OS level.

DBMS
• Persistence: Data persistence is ensured via automatic, system mechanisms. The programmer does not have to worry about the data getting lost due to manual errors
• Robustness: Backup, recovery & restore need minimum manual intervention. The backup and recovery plan can be devised for automatic recovery on a crash
• Security: DBMS provides user-specific access at the database level, with restrictions such as view-only access
Database Management Systems Partha Pratim Das 03.19
Programmer’s Productivity PPD

File handling via Python
• Building the file handler: Since the constraints within and across entities have to be enforced manually, the effort involved in building a file handling application is huge
• Maintenance: To maintain the consistency of data, one must regularly check for sanity of data and the relationships between entities during inserts, updates and deletes
• Handling huge data: As the data grows beyond the capacity of the file handler, more effort is needed

DBMS
• Configuring the database: The installation and configuration of a database is a specialized job of a DBA. A programmer, on the other hand, is saved the trouble
• Maintenance: DBMS has in-built mechanisms to ensure consistency and sanity of data being inserted, updated or deleted. The programmer does not need to do such checks
• Handling huge data: DBMS can handle even terabytes of data - Programmer does not have to worry

Database Management Systems Partha Pratim Das 03.20

Arithmetic Operations PPD

File handling via Python
• Extensive support for arithmetic and logical operations: Extensive arithmetic and logical operations can be performed on data using Python. These include complex numerical calculations and recursive computations.

DBMS
• Limited support for arithmetic and logical operations: SQL provides limited arithmetic and logical operations. Any other complex computation has to be done outside the SQL.

Database Management Systems Partha Pratim Das 03.21

Costs and Complexity PPD

File handling via Python
• File systems are cheaper to install and use. No specialized hardware, software or personnel are required to maintain filesystems.

DBMS
• Large databases are served by dedicated database servers, which need large storage and processing power
• DBMSs are expensive software that have to be installed and regularly updated
• Databases are inherently complex and need specialized people, like a DBA, to work on them
• The above factors lead to huge costs in implementing and maintaining database management systems

Database Management Systems Partha Pratim Das 03.22

Module Summary PPD

• Elucidated the difference between File handling by Python vis-à-vis DBMS through a Bank Transaction example
• Parameterized Comparison

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 03.23

Database Management Systems
Module 04: Introduction to DBMS/1

Partha Pratim Das

Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur

ppd@[Link]

Outline: Objectives & Outline; Levels of Abstraction; Schema and Instance; Data Models; DDL and DML; SQL; Database Design; Module Summary

Database Management Systems Partha Pratim Das 04.1

Module Recap PPD

• Comparison of data management using Python & files and DBMS
• Efficacy and efficiency of DBMS highlighted

Database Management Systems Partha Pratim Das 04.2

Module Objectives PPD

• To familiarize with the basic notions and terminology of database management systems
• To understand the role of data models and languages
• To understand the approaches to database design

Database Management Systems Partha Pratim Das 04.3

Database Management Systems Partha Pratim Das 04.6

View of Data

An architecture for a database system

accounts in a bank and the relationship between them
. Customer Schema
. Account Schema

Data Models

Database Management Systems Partha Pratim Das 04.12

Data Models

• A collection of tools for describing
◦ Data
◦ Data relationships
◦ Data semantics
◦ Data constraints
• Relational model (the focus of this course)
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)

Data Definition Language (DDL)

• Specification notation for defining the database schema
◦ Example:
create table instructor (
    ID char(5),
    name varchar(20),
    dept_name varchar(20),
    salary numeric(8,2))
• DDL compiler generates a set of table templates stored in a data dictionary
• Data dictionary contains metadata (that is, data about data)
◦ Database schema
◦ Integrity constraints
. Primary key (ID uniquely identifies instructors)
◦ Authorization
. Who can access what
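
The DDL example and the data dictionary idea above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, neither of which is prescribed by the slides; sqlite_master and pragma table_info are SQLite's own way of exposing schema metadata, and other systems use different catalog tables.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for "the DBMS".
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table instructor (
        ID        char(5),
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2),
        primary key (ID)
    )
""")

# SQLite records schema objects in the built-in sqlite_master table,
# which plays the role of a (very small) data dictionary here.
catalog = conn.execute(
    "select type, name from sqlite_master where name = 'instructor'"
).fetchall()

# pragma table_info yields (cid, name, type, notnull, default, pk) per column.
columns = [(row[1], row[2], row[5])
           for row in conn.execute("pragma table_info(instructor)")]

print(catalog)
print(columns)
```

The point of the sketch is only that the schema and the primary key declaration end up as queryable metadata, separate from the rows of the table itself.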
Database Management Systems Partha Pratim Das 04.18
Data Manipulation Language (DML)

• Language for accessing and manipulating the data organized by the appropriate data model
◦ DML: also known as Query Language
• Two classes of languages
◦ Pure – used for proving properties about computational power and for optimization
. Relational Algebra (the focus of this course)
. Tuple relational calculus
. Domain relational calculus
◦ Commercial – used in commercial systems
. SQL is the most widely used commercial language

Database Management Systems Partha Pratim Das 04.19

SQL PPD

SQL

Database Management Systems Partha Pratim Das 04.20

SQL

• The most widely used commercial language
• SQL is NOT a Turing-machine-equivalent language
◦ Cannot be used to solve all problems that a C program, for example, can solve
• To be able to compute complex functions, SQL is usually embedded in some higher-level language
• Application programs generally access databases through one of
◦ Language extensions to allow embedded SQL
◦ Application Programming Interface or API (for example, ODBC/JDBC), which allows SQL queries to be sent to a database
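
As a rough illustration of the API route described above, the sketch below sends SQL from a host program through Python's sqlite3 module, which plays a role loosely comparable to ODBC/JDBC. The function name names_in and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# Assumption: SQLite in memory; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor ("
    "ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))"
)
conn.executemany(
    "insert into instructor values (?, ?, ?, ?)",
    [
        ("10101", "Asha", "Comp. Sci.", 65000),
        ("12121", "Ravi", "Finance", 90000),
        ("15151", "Meera", "Comp. Sci.", 75000),
    ],
)

def names_in(dept):
    """Send a parameterized SQL query through the API and return plain Python values."""
    rows = conn.execute(
        "select name from instructor where dept_name = ? order by name", (dept,)
    )
    return [name for (name,) in rows]

print(names_in("Comp. Sci."))
```

The query itself stays declarative SQL; any further computation on the result happens in the host language.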

Database Management Systems Partha Pratim Das 04.21

Database Design PPD

Database Design

Database Management Systems Partha Pratim Das 04.22

Database Design

The process of designing the general structure of the database:
• Logical Design – Deciding on the database schema. Database design requires that we find a good collection of relation schemas
◦ Business decision
. What attributes should we record in the database?
◦ Computer Science decision
. What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
• Physical Design – Deciding on the physical layout of the database

Database Management Systems Partha Pratim Das 04.23

Database Design (2)

• Is there any problem with this relation?

Database Management Systems Partha Pratim Das 04.24

Module Summary PPD

• Familiarized with the basic notions and terminology of database management systems
• Introduced the role of data models and languages
• Introduced the approaches to database design

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 04.25

Module 05

• OO Relational Model
• XML
• Database Engine
◦ Storage Management
◦ Query Processing
◦ Transaction Management
• Database Internals and Architecture
• Database Users and Administrators

Database Management Systems Partha Pratim Das 05.4

Database Design PPD

Module 05

Partha Pratim
Das

Objectives &
Outline

Database Design
Object-Relational
Data Models
XML: Extensible
Markup Language

Database Engine
Database System
Internals

Database Users
& Administrators


Database Management Systems Partha Pratim Das 05.8

Design Approaches

• Need to come up with a methodology to ensure that each relation in the database is good
• Two ways of doing so:
◦ Entity Relationship Model (Chapter 7)
. Models an enterprise as a collection of entities and relationships
. Represented diagrammatically by an entity-relationship diagram
◦ Normalization Theory (Chapter 8)
. Formalize what designs are bad, and test for them

Database Management Systems Partha Pratim Das 05.9

Object-Relational Data Models PPD


Object-Relational Data Models

Database Management Systems Partha Pratim Das 05.10

Object-Relational Data Models

• Relational model: flat, atomic values
• Object Relational Data Models
◦ Extend the relational data model by including object orientation and constructs to deal with added data types
◦ Allow attributes of tuples to have complex types, including non-atomic values such as nested relations
◦ Preserve relational foundations, in particular the declarative access to data, while extending modeling power
◦ Provide upward compatibility with existing relational languages

Database Management Systems Partha Pratim Das 05.11

XML: Extensible Markup Language PPD


XML: Extensible Markup Language

Database Management Systems Partha Pratim Das 05.12

XML: Extensible Markup Language

• Defined by the WWW Consortium (W3C)
• Originally intended as a document markup language, not a database language
• The ability to specify new tags, and to create nested tag structures, made XML a great way to exchange data, not just documents
• XML has become the basis for all new generation data interchange formats
• A wide variety of tools is available for parsing, browsing and querying XML documents/data
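
To make the points about user-defined, nested tags and parsing tools concrete, here is a small sketch using xml.etree.ElementTree from the Python standard library. The tag names (university, department, instructor) and the values are invented for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag names and data are assumptions for this sketch.
doc = """
<university>
  <department name="Comp. Sci.">
    <instructor><name>Asha</name><salary>65000</salary></instructor>
    <instructor><name>Meera</name><salary>75000</salary></instructor>
  </department>
  <department name="Finance">
    <instructor><name>Ravi</name><salary>90000</salary></instructor>
  </department>
</university>
"""

root = ET.fromstring(doc)

# Walk the nesting: each department element contains instructor elements.
by_dept = {
    dept.get("name"): [inst.findtext("name") for inst in dept.findall("instructor")]
    for dept in root.findall("department")
}

print(by_dept)
```

Note that the nesting (instructors inside a department) is exactly the kind of non-flat structure that a plain relational table does not represent directly.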

Database Management Systems Partha Pratim Das 05.13

Database Engine PPD


Database Engine

Database Management Systems Partha Pratim Das 05.14

Database Engine PPD

• Storage manager
• Query processing
• Transaction manager

Database Management Systems Partha Pratim Das 05.15

Storage Management

• Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system
• The storage manager is responsible for the following tasks:
◦ Interaction with the OS file manager
◦ Efficient storing, retrieving and updating of data
• Issues:
◦ Storage access
◦ File organization
◦ Indexing and hashing

Database Management Systems Partha Pratim Das 05.16

Query Processing

a) Parsing and translation
b) Optimization
c) Evaluation

Database Management Systems Partha Pratim Das 05.17

Query Processing (2)

• Alternative ways of evaluating a given query
◦ Equivalent expressions
◦ Different algorithms for each operation
• Cost difference between a good and a bad way of evaluating a query can be enormous
• Need to estimate the cost of operations
◦ Depends critically on statistical information about relations which the database must maintain
◦ Need to estimate statistics for intermediate results to compute cost of complex expressions
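
One way to see that the same query can be evaluated in different ways is to ask a real engine for its plan. The sketch below assumes SQLite through Python's sqlite3 module and uses SQLite's explain query plan statement; the index name idx_dept and the helper plan are hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor ("
    "ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))"
)

query = "select name from instructor where dept_name = 'Physics'"

def plan(sql):
    """Return the optimizer's chosen plan as text.

    The last column of each 'explain query plan' row is a readable step.
    """
    return " | ".join(row[-1] for row in conn.execute("explain query plan " + sql))

before = plan(query)  # no index available: the table has to be scanned

# Hypothetical index; with it the optimizer has a second algorithm to choose from.
conn.execute("create index idx_dept on instructor (dept_name)")
after = plan(query)

print(before)
print(after)
```

The query text is identical in both calls; only the available access paths changed, which is the sense in which the optimizer picks among alternative evaluation strategies.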

Database Management Systems Partha Pratim Das 05.18

Transaction Management

• What if the system fails?
• What if more than one user is concurrently updating the same data?
• A transaction is a collection of operations that performs a single logical function in a database application
• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
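
The idea that a transaction either completes as a whole or leaves the database unchanged can be sketched with a bank transfer. This is only an illustration under stated assumptions: SQLite via Python's sqlite3 module, a hypothetical account table, and a hypothetical transfer function; it does not cover concurrency control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (id text primary key, balance integer not null)")
conn.executemany("insert into account values (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one logical unit of work."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "update account set balance = balance - ? where id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "select balance from account where id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # simulated transaction failure
            conn.execute(
                "update account set balance = balance + ? where id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("select id, balance from account"))

ok = transfer(conn, "A", "B", 30)       # succeeds: both updates are kept
failed = transfer(conn, "A", "B", 500)  # fails midway: the debit is rolled back
print(ok, failed, balances(conn))
```

After the failed transfer the debit that had already been applied is undone, so the total across both accounts is preserved.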

Database Management Systems Partha Pratim Das 05.19

Database System Internals PPD


Database System Internals

Database Management Systems Partha Pratim Das 05.20

Database System Internals


Database Management Systems Partha Pratim Das 05.24

Database Users and Administrators


Database Management Systems Partha Pratim Das 05.25

Module Summary PPD

• Introduced models of database management systems
• Familiarized with major components of a database engine
• Familiarized with database internals and architecture

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 05.26

Database Management Systems
Module 06: Introduction to Relational Model/1

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Outline: Week Recap · Objectives & Outline · Example of a Relation · Attributes · Schema and Instance · Keys · Relational Query Languages · Module Summary

Database Management Systems Partha Pratim Das 06.1

Week Recap PPD

Example of a Relation


Database Management Systems Partha Pratim Das 06.5

Attributes PPD

Attributes

Database Management Systems Partha Pratim Das 06.6

Attribute Types PPD

• Consider the relation
Students = Roll#, First Name, Last Name, DoB, Passport#, Aadhaar#, Department
• The set of allowed values for each attribute is called the domain of the attribute
◦ Roll#: Alphanumeric string
◦ First Name, Last Name: Alpha String
◦ DoB: Date
◦ Passport#: String (Letter followed by 7 digits) – nullable (optional)
◦ Aadhaar#: 12-digit number
◦ Department: Alpha String
• Attribute values are (normally) required to be atomic; that is, indivisible
• The special value null is a member of every domain. It indicates that the value is unknown
• The null value may cause complications in the definition of many operations
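
The domains listed above can be read as membership tests on attribute values. The following sketch encodes them as Python predicates; the dictionary DOMAINS, the function violations, and the sample rows are hypothetical. As an assumption for this example, only Passport# accepts None, following the slide's "nullable" note, and the passport letter is taken to be an upper-case A to Z.

```python
import re
from datetime import date

# One membership test per attribute domain (assumptions noted in the lead-in).
DOMAINS = {
    "Roll#": lambda v: isinstance(v, str) and v.isalnum(),
    "First Name": lambda v: isinstance(v, str) and v.isalpha(),
    "Last Name": lambda v: isinstance(v, str) and v.isalpha(),
    "DoB": lambda v: isinstance(v, date),
    # Nullable: None stands for the null value.
    "Passport#": lambda v: v is None
    or (isinstance(v, str) and re.fullmatch(r"[A-Z][0-9]{7}", v) is not None),
    "Aadhaar#": lambda v: isinstance(v, str) and re.fullmatch(r"[0-9]{12}", v) is not None,
    "Department": lambda v: isinstance(v, str) and v.isalpha(),
}

def violations(row):
    """Return the attributes whose value lies outside its domain."""
    return [attr for attr, in_domain in DOMAINS.items() if not in_domain(row.get(attr))]

good = {
    "Roll#": "21CS10001", "First Name": "Asha", "Last Name": "Rao",
    "DoB": date(2003, 5, 17), "Passport#": None,
    "Aadhaar#": "123456789012", "Department": "Physics",
}
bad = dict(good, **{"Passport#": "12345678", "Aadhaar#": "123"})

print(violations(good))
print(violations(bad))
```

A DBMS performs this kind of check from the declared column types and constraints; the sketch only spells out what "value belongs to the domain" means.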

Database Management Systems Partha Pratim Das 06.7

Attribute Types PPD

• For
Students = Roll#, First Name, Last Name, DoB, Passport#, Aadhaar#, Department
• And domain of the attributes as:
◦ Roll#: Alphanumeric string
◦ First Name, Last Name: Alpha String
◦ DoB: Date
◦ Passport#: String (Letter followed by 7 digits) – nullable (optional)
◦ Aadhaar#: 12-digit number
◦ Department: Alpha String

Database Management Systems Partha Pratim Das 06.8

Schema and Instance PPD

Schema and Instance

Database Management Systems Partha Pratim Das 06.9

Relation Schema and Instance

• A1, A2, ..., An are attributes
• R = (A1, A2, ..., An) is a relation schema
Example: instructor = (ID, name, dept_name, salary)
• Formally, given sets D1, D2, ..., Dn, a relation r is a subset of
D1 × D2 × ··· × Dn
Thus, a relation is a set of n-tuples (a1, a2, ..., an) where each ai ∈ Di
• The current values (relation instance) of a relation are specified by a table
• An element t of r is a tuple, represented by a row in a table
• Example:
instructor ≡ (String(5) × String × String × Number+), where ID ∈ String(5), name ∈ String, dept_name ∈ String, and salary ∈ Number+
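
The definition of a relation as a subset of D1 × D2 × ··· × Dn can be checked on a toy scale. In the sketch below the three small domains and the relation instructor are made-up examples; Python sets are used because, like relations, they are unordered and hold no duplicates.

```python
from itertools import product

# Tiny hypothetical domains, kept small so the full product can be enumerated.
D1 = {"10101", "12121"}        # IDs
D2 = {"Asha", "Ravi"}          # names
D3 = {"Physics", "Finance"}    # department names

all_tuples = set(product(D1, D2, D3))  # D1 x D2 x D3

# A relation instance: any set of 3-tuples drawn from the product.
instructor = {
    ("10101", "Asha", "Physics"),
    ("12121", "Ravi", "Finance"),
}

print(len(all_tuples))            # 2 * 2 * 2 combinations
print(instructor <= all_tuples)   # subset test
```

Adding the same tuple to the set a second time would leave it unchanged, which matches the rule that a relation holds no duplicate tuples.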

Database Management Systems Partha Pratim Das 06.10

Relations are Unordered with Unique Tuples

• Order of tuples / rows is irrelevant (tuples may be stored in an arbitrary order)
• No two tuples / rows may be identical
• Example: instructor relation with unordered tuples

Database Management Systems Partha Pratim Das 06.11

Keys PPD

Keys

Database Management Systems Partha Pratim Das 06.12

Keys PPD

• Let K ⊆ R, where R is the set of attributes in the relation
• K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
◦ Example: {ID} and {ID, name} are both superkeys of instructor
• Superkey K is a candidate key if K is minimal
◦ Example: {ID} is a candidate key for instructor
• One of the candidate keys is selected to be the primary key
◦ Which one?
• A surrogate key (or synthetic key) in a database is a unique identifier for either an entity in the modeled world or an object in the database
◦ The surrogate key is not derived from application data, unlike a natural (or business) key which is derived from application data
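
The superkey and candidate key definitions above can be tested against a single relation instance, with one caveat: a key must hold for every possible relation r(R), so an instance can only show that a set of attributes is not a key, never prove that it is. The functions is_superkey and is_candidate_key and the sample rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """True if attrs is a superkey of this instance and no proper subset is."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Made-up instance of instructor(ID, name, dept_name).
rows = [
    {"ID": "10101", "name": "Asha", "dept_name": "Physics"},
    {"ID": "12121", "name": "Ravi", "dept_name": "Finance"},
    {"ID": "15151", "name": "Asha", "dept_name": "Finance"},
]

print(is_superkey(rows, ["ID"]), is_superkey(rows, ["ID", "name"]))
print(is_candidate_key(rows, ["ID"]), is_candidate_key(rows, ["ID", "name"]))
print(is_superkey(rows, ["name"]))
```

On this instance {name, dept_name} also happens to pass the test, which illustrates the caveat: whether it is really a key is a statement about the enterprise, not about three sample rows.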

Database Management Systems Partha Pratim Das 06.13

Keys PPD

• Students = Roll#, First Name, Last Name, DoB, Passport#, Aadhaar#, Department
• Super Key: Roll#, {Roll#, DoB}
• Candidate Keys: Roll#, {First Name, Last Name}, Aadhaar#
◦ Passport# cannot be a key. Why?
◦ Null values are allowed for Passport# (a student may not have a passport)
• Primary Key: Roll#
◦ Can Aadhaar# be a key?
◦ It may suffice for unique identification. But Roll# may have additional useful

Database Management Systems Partha Pratim Das 06.20

Relational Query Languages PPD

• “Pure” languages:
◦ Relational algebra
◦ Tuple relational calculus
◦ Domain relational calculus
• The above 3 pure languages are equivalent in computing power
• We will concentrate on relational algebra
◦ Not Turing-machine equivalent
. Not all algorithms can be expressed in RA
◦ Consists of 6 basic operations
Module Summary

Database Management Systems Partha Pratim Das 06.21

Module Summary PPD

• Introduced the notion of attributes and their types
• Took an overview of the mathematical structure of the relational model – schema and instance
• Introduced the notion of keys – primary as well as foreign

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems Partha Pratim Das 06.22

Database Management Systems
Module 07: Introduction to Relational Model/2

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Outline: Objectives & Outline · Relational Operators · Aggregation Operators · Module Summary

Database Management Systems Partha Pratim Das 07.1

Module Recap PPD

• Basic notions of modeling introduced
◦ Attributes and their Types
◦ Schema and Instance
◦ Keys and their Categorization
• Languages for the Relational Model introduced

Database Management Systems Partha Pratim Das 07.2

Module Objectives PPD

• To understand relational algebra
• To become familiar with the operators of relational algebra

Database Management Systems Partha Pratim Das 07.3

Module Outline PPD

• Operations
◦ Select
◦ Project
◦ Union
◦ Difference
◦ Intersection
◦ Cartesian Product
◦ Natural Join
• Aggregate Operations

Database Management Systems Partha Pratim Das 07.4

Relational Operators PPD


Relational Operators

Database Management Systems Partha Pratim Das 07.5

Basic Properties of Relations

• A relation is a set. Hence,
• Ordering of rows / tuples is inconsequential

A   B                   A   B
a1  b1                  a1  b1
a1  b2   is same as:    a2  b1
a2  b1                  a2  b2
a2  b2                  a1  b2

• All rows / tuples must be distinct

Database Management Systems Partha Pratim Das 07.11

Joining two relations – Cartesian-product

• Relation r, s
• r × s

Database Management Systems Partha Pratim Das 07.12

Cartesian-product – naming issue

• Relation r, s
• r × s

Database Management Systems Partha Pratim Das 07.13

Renaming a Table

• Allows us to refer to a relation (say E) by more than one name.
ρX(E)
returns the expression E under the name X
• Relations r
• r × ρs(r)

Database Management Systems Partha Pratim Das 07.14

Composition of Operations

• Can build expressions using multiple operations
• Example: σA=C(r × s)
• r × s
• σA=C(r × s)
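
The composed expression σA=C(r × s) can be evaluated step by step on small relations. In this sketch the relations r(A, B) and s(C, D) and their contents are made up, tuples are represented as dictionaries, and the helper names cartesian and select are hypothetical. It assumes r and s have disjoint attribute names, so the naming issue discussed earlier does not arise.

```python
# Made-up relations; attribute names of r and s are disjoint by assumption.
r = [{"A": "a1", "B": 1}, {"A": "a2", "B": 2}]
s = [{"C": "a1", "D": 10}, {"C": "a2", "D": 20}, {"C": "a2", "D": 30}]

def cartesian(r, s):
    """r x s: pair every tuple of r with every tuple of s."""
    return [{**t, **u} for t in r for u in s]

def select(predicate, relation):
    """sigma: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

r_x_s = cartesian(r, s)                           # 2 * 3 tuples
result = select(lambda t: t["A"] == t["C"], r_x_s)

print(len(r_x_s))
for t in result:
    print(t)
```

The output of one operator is itself a relation, which is what allows operators to be nested into larger expressions.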

Database Management Systems Partha Pratim Das 07.15


Database Management Systems Partha Pratim Das 07.21

Module Summary PPD

• Introduced relational algebra
• Familiarized with the operators of relational algebra

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

• Data Definition Language (DDL)
• Data Manipulation Language (DML): Query Structure

Outline: Objectives & Outline · Outline · History of SQL · Data Definition Language (DDL) (Create Table, Integrity Constraints, Update Table) · Data Manipulation Language (DML): Query Structure (Select Clause, Where Clause, From Clause) · Module Summary

Database Management Systems Partha Pratim Das 08.4

History of SQL PPD


Database Management Systems Partha Pratim Das 08.5

History of Query Language PPD

• IBM developed Structured English Query Language (SEQUEL) as part of the System R project. It was later renamed Structured Query Language (SQL, still pronounced as SEQUEL)
• ANSI and ISO standard SQL:

SQL-86     First formalized by ANSI
SQL-89     + Integrity Constraints
SQL-92     Major revision (ISO/IEC 9075 standard), de facto industry standard
SQL:1999   + Regular Expression Matching, Recursive Queries, Triggers, Support for Procedural and Control Flow Statements, Nonscalar types (Arrays), Some OO features (structured types), Embedding SQL in Java (SQL/OLB), and Embedding Java in SQL (SQL/JRT)
SQL:2003   + XML features (SQL/XML), Window Functions, Standardized Sequences, and Columns with Auto-generated Values (identity columns)
SQL:2006   + Ways of importing and storing XML data in an SQL database, manipulating it within the database, and publishing both XML and conventional SQL data in XML form
SQL:2008   Legalizes ORDER BY outside Cursor Definitions; + INSTEAD OF Triggers, TRUNCATE Statement, and FETCH Clause
SQL:2011   + Temporal Data (PERIOD FOR); Enhancements for Window Functions and FETCH Clause
SQL:2016   + Row Pattern Matching, Polymorphic Table Functions, and JSON
SQL:2019   + Multidimensional Arrays (MDarray type and operators)

Database Management Systems Partha Pratim Das 08.6

History of Query Language (2): Compliance PPD

• SQL is the de facto industry standard today for relational or structured data systems
• Commercial systems as well as open systems may be fully or partially compliant with one or more standards from SQL-92 onward
• Not all examples here may work on your particular system. Check your system’s SQL documentation

Database Management Systems Partha Pratim Das 08.7

History of Query Language (3): Alternatives PPD

• There aren’t any alternatives to SQL for speaking to relational databases (that is, SQL as a protocol), but there are many alternatives to writing SQL in the applications
• These alternatives have been implemented in the form of frontends for working with relational databases. Some examples of a frontend include (for a selection of languages):
◦ SchemeQL and CLSQL, which are probably the most flexible, owing to their Lisp heritage, but they also look a lot more like SQL than other frontends
◦ LINQ (in .Net)
◦ ScalaQL and ScalaQuery (in Scala)
◦ SqlStatement, ActiveRecord and many others in Ruby
◦ HaskellDB
◦ ...the list goes on for many other languages.

Source: What are good alternatives to SQL (the language)?

Database Management Systems Partha Pratim Das 08.8

History of Query Language (4): Derivatives PPD

• There are several query languages that are derived from or inspired by SQL. Of these, the most popular and effective is SPARQL.
◦ SPARQL (pronounced sparkle, a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language
. A semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
. It has been standardized by the W3C as a key technology of the semantic web
. Versions:
− SPARQL 1.0 (January 2008)
− SPARQL 1.1 (March 2013)
. Used as the query language for several NoSQL systems, particularly the Graph Databases that use RDF as store

Database Management Systems Partha Pratim Das 08.9

Data Definition Language (DDL) PPD


Database Management Systems Partha Pratim Das 08.10

Data Definition Language (DDL)

The SQL data-definition language (DDL) allows the specification of information about relations, including:
• The Schema for each Relation
• The Domain of values associated with each Attribute
• Integrity Constraints
• And, as we will see later, also other information such as
◦ The set of Indices to be maintained for each relation
◦ Security and Authorization information for each relation
◦ The Physical Storage Structure of each relation on disk

Database Management Systems Partha Pratim Das 08.11

Domain Types in SQL

• char(n). Fixed length character string, with user-specified length n
• varchar(n). Variable length character strings, with user-specified maximum length n
• int. Integer (a finite subset of the integers that is machine-dependent)
• smallint. Small integer (a machine-dependent subset of the integer domain type)
• numeric(p, d). Fixed point number, with user-specified precision of p digits, with d digits to the right of the decimal point (e.g., numeric(3, 1) allows 44.5 to be stored exactly, but not 444.5 or 0.32)
• real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision
• float(n). Floating point number, with user-specified precision of at least n digits
• More are covered in Chapter 4
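
The numeric(p, d) rule above (p digits in total, d of them after the decimal point) can be worked through with Python's decimal module. The function fits_numeric is a hypothetical helper written for this note; it mirrors the slide's notion of a value being stored exactly, whereas real systems may round or raise an error instead.

```python
from decimal import Decimal

def fits_numeric(value: str, p: int, d: int) -> bool:
    """True if value can be stored exactly in numeric(p, d).

    At most d digits may follow the decimal point and at most
    p - d digits may precede it.
    """
    dec = Decimal(value).normalize()          # drop trailing zeros such as 44.50
    _sign, digits, exponent = dec.as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fraction_digits <= d and integer_digits <= p - d

# The slide's own examples for numeric(3, 1):
print(fits_numeric("44.5", 3, 1))    # two integer digits, one fraction digit
print(fits_numeric("444.5", 3, 1))   # three integer digits: too many
print(fits_numeric("0.32", 3, 1))    # two fraction digits: too many
```

With p = 3 and d = 1 there is room for two digits before the point and one after, which is why 44.5 fits and the other two values do not.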

Database Management Systems Partha Pratim Das 08.12

Schema Diagram for University Database PPD


Database Management Systems Partha Pratim Das 08.13

Create Table Construct PPD

• An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, ..., An Dn,
    (integrity-constraint1),
    ...,
    (integrity-constraintk));
◦ r is the name of the relation
◦ each Ai is an attribute name in the schema of relation r
◦ Di is the data type of values in the domain of attribute Ai

Database Management Systems Partha Pratim Das 08.14

Create Table Construct (2) PPD

    create table instructor (
        ID char(5),
        name varchar(20),
        dept_name varchar(20),
        salary numeric(8, 2));

Create Table Construct (3): Integrity Constraints PPD

• not null
• primary key (A1, ..., An)
• foreign key (Am, ..., An) references r

Without integrity constraints:
    create table instructor (
        ID char(5),
        name varchar(20),
        dept_name varchar(20),
        salary numeric(8, 2));

With integrity constraints:
    create table instructor (
        ID char(5),
        name varchar(20) not null,
        dept_name varchar(20),
        salary numeric(8, 2),
        primary key (ID),
        foreign key (dept_name) references department);

• A primary key declaration on an attribute automatically ensures not null
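The effect of the three constraints can be checked with a small script. The following is a minimal sketch, assuming Python's standard sqlite3 module as the database engine and a one-column department relation invented for the illustration; the helper name rejected is also hypothetical. Other systems report violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
# Simplified department relation (an assumption for this sketch).
conn.execute("create table department (dept_name varchar(20) primary key)")
conn.execute("""
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8, 2),
        primary key (ID),
        foreign key (dept_name) references department)
""")
conn.execute("insert into department values ('Biology')")
conn.execute("insert into instructor values ('10211', 'Smith', 'Biology', 66000)")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# not null: the name attribute may not be null
null_name = rejected("insert into instructor values ('10212', null, 'Biology', 50000)")
# primary key: ID 10211 is already in use
duplicate_id = rejected("insert into instructor values ('10211', 'Jones', 'Biology', 50000)")
# foreign key: there is no Physics tuple in department
unknown_dept = rejected("insert into instructor values ('10213', 'Lee', 'Physics', 50000)")
print(null_name, duplicate_id, unknown_dept)
```

Each of the three inserts is refused, so the instructor relation still holds only the original Smith tuple.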

University Schema PPD

[Figure: university schema]

Create Table Construct (4): More Relations PPD

    create table student (
        ID varchar(5),
        name varchar(20) not null,
        dept_name varchar(20),
        tot_cred numeric(3, 0),
        primary key (ID),
        foreign key (dept_name) references department);

    create table takes (
        ID varchar(5),
        course_id varchar(8),
        sec_id varchar(8),
        semester varchar(6),
        year numeric(4, 0),
        grade varchar(2),
        primary key (ID, course_id, sec_id, semester, year),
        foreign key (ID) references student,
        foreign key (course_id, sec_id, semester, year) references section);

    create table course (
        course_id varchar(8),
        title varchar(50),
        dept_name varchar(20),
        credits numeric(2, 0),
        primary key (course_id),
        foreign key (dept_name) references department);

• Note: sec_id can be dropped from the primary key of takes above, to ensure a student cannot be registered for two sections of the same course in the same semester

Update Tables

• Insert (DML command)
  ◦ insert into instructor values ('10211', 'Smith', 'Biology', 66000);
• Delete (DML command)
  ◦ Remove all tuples from the student relation
      delete from student
• Drop Table (DDL command)
  ◦ drop table r
• Alter (DDL command)
  ◦ alter table r add A D
    . Where A is the name of the attribute to be added to relation r and D is the domain of A
    . All existing tuples in the relation are assigned null as the value for the new attribute
  ◦ alter table r drop A
    . Where A is the name of an attribute of relation r
    . Dropping of attributes is not supported by many databases
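The difference between delete, drop table and alter can be observed directly. This is a minimal sketch using Python's standard sqlite3 module; the added attribute phone is a hypothetical name chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table instructor (
        ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8, 2))
""")
conn.execute("insert into instructor values ('10211', 'Smith', 'Biology', 66000)")

# alter table r add A D: the existing tuple gets null for the new attribute
conn.execute("alter table instructor add phone varchar(15)")
after_alter = conn.execute("select * from instructor").fetchall()

# delete from r: removes all tuples but keeps the (now empty) relation
conn.execute("delete from instructor")
after_delete = conn.execute("select count(*) from instructor").fetchone()[0]

# drop table r: removes the relation itself from the schema
conn.execute("drop table instructor")
tables = conn.execute("select name from sqlite_master where type = 'table'").fetchall()
print(after_alter, after_delete, tables)
```

After delete the relation still exists and can be queried; after drop table it no longer appears in the catalog.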

Data Manipulation Language (DML): Query Structure PPD


Basic Query Structure

• A typical SQL query has the form:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  ◦ Each Ai represents an attribute of one of the relations ri
  ◦ ri represents a relation
  ◦ P is a predicate
• The result of an SQL query is a relation

Select Clause

• The select clause lists the attributes desired in the result of a query
  ◦ Corresponds to the projection operation of the relational algebra
• Example: find the names of all instructors:
    select name
    from instructor
• NOTE: SQL names are case insensitive (that is, you may use upper-case or lower-case letters)
  ◦ Name ≡ NAME ≡ name
  ◦ Some people use upper case wherever we use bold font

Select Clause (2)

• SQL allows duplicates in relations as well as in query results
• To force the elimination of duplicates, insert the keyword distinct after select
• Find the department names of all instructors, and remove duplicates
    select distinct dept_name
    from instructor
• The keyword all specifies that duplicates should not be removed
    select all dept_name
    from instructor
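The two queries can be compared on a few tuples. This is a minimal sketch using Python's standard sqlite3 module; the three instructor tuples are sample data assumed for the illustration, and order by is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID char(5), name varchar(20), dept_name varchar(20))")
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("22222", "Einstein", "Physics"),
        ("33456", "Gold", "Physics"),
        ("12121", "Wu", "Finance"),
    ],
)

# select all keeps one row per instructor tuple
all_rows = [row[0] for row in conn.execute(
    "select all dept_name from instructor order by dept_name")]
# select distinct keeps one row per department name
distinct_rows = [row[0] for row in conn.execute(
    "select distinct dept_name from instructor order by dept_name")]
print(all_rows)
print(distinct_rows)
```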

Select Clause (3)

• An asterisk in the select clause denotes all attributes
    select *
    from instructor
• An attribute can be a literal with no from clause
    select '437'
  ◦ Result is a table with one column and a single row with value '437'
  ◦ Can give the column a name using:
      select '437' as FOO
• An attribute can be a literal with a from clause
    select 'A'
    from instructor
  ◦ Result is a table with one column and N rows (number of tuples in the instructor table), each row with value 'A'

Select Clause (4)

• The select clause can contain arithmetic expressions involving the operations +, -, *, and /, operating on constants or attributes of tuples
• The query:
    select ID, name, salary/12
    from instructor
  would return a relation that is the same as the instructor relation, except that the value of the attribute salary is divided by 12
• Can rename "salary/12" using the as clause:
    select ID, name, salary/12 as monthly_salary
    from instructor

Where Clause

• The where clause specifies conditions that the result must satisfy
  ◦ Corresponds to the selection predicate of the relational algebra
• To find all instructors in the Comp. Sci. dept
    select name
    from instructor
    where dept_name = 'Comp. Sci.'
• Comparison results can be combined using the logical connectives and, or, and not
  ◦ To find all instructors in the Comp. Sci. dept with salary > 80000
      select name
      from instructor
      where dept_name = 'Comp. Sci.' and salary > 80000
• Comparisons can be applied to results of arithmetic expressions

From Clause

• The from clause lists the relations involved in the query
  ◦ Corresponds to the Cartesian product operation of the relational algebra
• Find the Cartesian product instructor × teaches
    select *
    from instructor, teaches
  ◦ Generates every possible instructor-teaches pair, with all attributes from both relations
  ◦ For common attributes (for example, ID), the attributes in the resulting table are renamed using the relation name (for example, instructor.ID)
• The Cartesian product is not very useful directly, but is useful combined with a where-clause condition (selection operation in relational algebra)
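The following sketch shows why the bare Cartesian product is rarely wanted and how a where-clause condition turns it into a useful result. It uses Python's standard sqlite3 module; the two instructor tuples and three teaches tuples are sample data assumed for the illustration, and both relations are cut down to two attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID char(5), name varchar(20))")
conn.execute("create table teaches (ID char(5), course_id varchar(8))")
conn.executemany("insert into instructor values (?, ?)",
                 [("10101", "Srinivasan"), ("12121", "Wu")])
conn.executemany("insert into teaches values (?, ?)",
                 [("10101", "CS-101"), ("10101", "CS-315"), ("12121", "FIN-201")])

# Every instructor tuple is paired with every teaches tuple: 2 x 3 = 6 rows.
product = conn.execute("select * from instructor, teaches").fetchall()

# The where clause keeps only the pairs that refer to the same instructor.
matched = conn.execute("""
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
    order by course_id
""").fetchall()
print(len(product), matched)
```

Of the six pairs in the product, only the three whose ID values agree describe an instructor together with a course that instructor teaches.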

Cartesian Product

[Figure: Cartesian product]

Module Summary PPD

• Introduced relational query language
• Familiarized with data definition and basic query structure

Cartesian Product Example

• Relation emp_super
[Figure: emp_super relation]
• Find the supervisor of "Bob"
• Find the supervisor of the supervisor of "Bob"
• Find ALL the supervisors (direct and indirect) of "Bob"
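The first two questions can be answered with a Cartesian product of emp_super with itself. This is a minimal sketch using Python's standard sqlite3 module. The attribute names person and supervisor and the four tuples are assumptions, since the relation's figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample tuples (the original figure is not available).
conn.execute("create table emp_super (person varchar(20), supervisor varchar(20))")
conn.executemany("insert into emp_super values (?, ?)", [
    ("Bob", "Alice"),
    ("Mary", "Susan"),
    ("Alice", "David"),
    ("David", "Mary"),
])

# Supervisor of Bob: a plain selection.
direct = conn.execute(
    "select supervisor from emp_super where person = 'Bob'").fetchall()

# Supervisor of the supervisor of Bob: pair emp_super with itself and
# match Bob's supervisor (in B) with the person column of the second copy (S).
second_level = conn.execute("""
    select S.supervisor
    from emp_super as B, emp_super as S
    where B.person = 'Bob' and B.supervisor = S.person
""").fetchall()
print(direct, second_level)
```

Each additional level needs one more copy of emp_super in the from clause, so a query with a fixed number of copies cannot answer the third question for chains of arbitrary length.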

String Operations

Database Management Systems Partha Pratim Das 09.18

Duplicates (2)

• Example: Suppose multiset relations r1(A, B) and r2(C) are as follows:
    r1 = {(1, a), (2, a)}    r2 = {(2), (3), (3)}
• Then Π_B(r1) would be {(a), (a)}, while Π_B(r1) × r2 would be
    {(a, 2), (a, 2), (a, 3), (a, 3), (a, 3), (a, 3)}
• SQL duplicate semantics:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  is equivalent to the multiset version of the expression:
    Π_{A1, A2, ..., An}(σ_P(r1 × r2 × ... × rm))
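The multiset example can be reproduced in a few lines. This is a minimal sketch using Python's standard sqlite3 module, with r1 and r2 loaded exactly as given in the example; order by is added only so the rows come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r1 (A int, B char(1))")
conn.execute("create table r2 (C int)")
conn.executemany("insert into r1 values (?, ?)", [(1, "a"), (2, "a")])
conn.executemany("insert into r2 values (?)", [(2,), (3,), (3,)])

# Multiset projection on B: both copies of (a) are kept.
projection = conn.execute("select B from r1").fetchall()

# Multiset product of the projection with r2: 2 x 3 = 6 rows.
product = conn.execute("select B, C from r1, r2 order by C").fetchall()
print(projection)
print(product)
```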

Module Summary PPD

• Completed the understanding of basic query structure

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with "PPD".

Database Management Systems
Module 10: Introduction to SQL/3

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Module Recap PPD

• Completed the understanding of basic query structure

Module Objectives PPD

• To become familiar with set operations, null values, and aggregation

Module Outline PPD

• Set Operations: union, intersect, except
• Null Values
• Aggregate Functions: avg, min, max, sum, and count
  ◦ Group By
  ◦ Having
  ◦ Null Values

Set Operations PPD


Set Operations

• Find courses that ran in Fall 2009 or in Spring 2010
    (select course_id from section where semester = 'Fall' and year = 2009)
    union
    (select course_id from section where semester = 'Spring' and year = 2010)
• Find courses that ran in Fall 2009 and in Spring 2010
    (select course_id from section where semester = 'Fall' and year = 2009)
    intersect
    (select course_id from section where semester = 'Spring' and year = 2010)
• Find courses that ran in Fall 2009 but not in Spring 2010
    (select course_id from section where semester = 'Fall' and year = 2009)
    except
    (select course_id from section where semester = 'Spring' and year = 2010)
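The three queries can be run side by side on a small section relation. This is a minimal sketch using Python's standard sqlite3 module; the five section tuples are sample data assumed for the illustration, and the helper name ids is hypothetical. SQLite does not accept parentheses around the operands of a set operation, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id varchar(8), semester varchar(6), year numeric(4, 0))")
conn.executemany("insert into section values (?, ?, ?)", [
    ("CS-101", "Fall", 2009),
    ("CS-347", "Fall", 2009),
    ("PHY-101", "Fall", 2009),
    ("CS-101", "Spring", 2010),
    ("CS-315", "Spring", 2010),
])

fall = "select course_id from section where semester = 'Fall' and year = 2009"
spring = "select course_id from section where semester = 'Spring' and year = 2010"


def ids(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


in_either = ids(fall + " union " + spring)
in_both = ids(fall + " intersect " + spring)
fall_only = ids(fall + " except " + spring)
print(in_either, in_both, fall_only)
```

CS-101 ran in both semesters, yet it appears only once in the union because union eliminates duplicates.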

Set Operations (2)

• Find the salaries of all instructors that are less than the largest salary
    select distinct T.salary
    from instructor as T, instructor as S
    where T.salary < S.salary
• Find all the salaries of all instructors
    select distinct salary
    from instructor
• Find the largest salary of all instructors
    (select "second query")
    except
    (select "first query")
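Putting the two queries together gives the largest salary without any aggregate function. This is a minimal sketch using Python's standard sqlite3 module; the four instructor tuples are sample data assumed for the illustration, and the max query is included only as a cross-check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)", [
    ("Srinivasan", 65000),
    ("Singh", 80000),
    ("Einstein", 95000),
    ("Brandt", 95000),
])

# First query: every salary that is smaller than some other salary.
first_query = """
    select distinct T.salary
    from instructor as T, instructor as S
    where T.salary < S.salary
"""
# Second query: all salaries.
second_query = "select distinct salary from instructor"

# Second query except first query leaves only the largest salary.
largest = conn.execute(second_query + " except " + first_query).fetchall()
check = conn.execute("select max(salary) from instructor").fetchone()[0]
print(largest, check)
```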

Set Operations (3)

• Set operations union, intersect, and except
  ◦ Each of the above operations automatically eliminates duplicates
• To retain all duplicates use the corresponding multiset versions union all, intersect all, and except all
• Suppose a tuple occurs m times in r and n times in s, then, it occurs:
  ◦ m + n times in r union all s
  ◦ min(m, n) times in r intersect all s
  ◦ max(0, m − n) times in r except all s
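The three multiplicity rules can be checked without a database. This is a minimal sketch that models each relation as a collections.Counter (tuple mapped to its number of occurrences) rather than running SQL, since not every engine implements intersect all and except all; the tuples 'a' and 'b' are sample data assumed for the illustration.

```python
from collections import Counter

# r holds 'a' three times and 'b' once; s holds 'a' once and 'b' twice.
r = Counter({"a": 3, "b": 1})
s = Counter({"a": 1, "b": 2})

union_all = r + s       # m + n occurrences of each tuple
intersect_all = r & s   # min(m, n) occurrences of each tuple
except_all = r - s      # max(0, m - n) occurrences; zero counts are dropped
print(union_all, intersect_all, except_all)
```

The tuple 'b' disappears from r except all s because max(0, 1 − 2) = 0.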

Null Values PPD


Null Values

• It is possible for tuples to have a null value, denoted by null, for some of their attributes
• null signifies an unknown value or that a value does not exist
• The result of any arithmetic expression involving null is null
  ◦ Example: 5 + null returns null
• The predicate is null can be used to check for null values
  ◦ Example: Find all instructors whose salary is null
      select name
      from instructor
      where salary is null
• It is not possible to test for null values with comparison operators, such as =, <, or <>. We need to use the is null and is not null operators instead

Null Values (2): Three Valued Logic

• Three values: true, false, unknown
• Any comparison with null returns unknown
  ◦ Example: 5 < null or null <> null or null = null
• Three-valued logic using the value unknown:
  ◦ OR: (unknown or true) = true,
        (unknown or false) = unknown,
        (unknown or unknown) = unknown
  ◦ AND: (true and unknown) = unknown,
         (false and unknown) = false,
         (unknown and unknown) = unknown
  ◦ NOT: (not unknown) = unknown
  ◦ "P is unknown" evaluates to true if predicate P evaluates to unknown
• Result of where clause predicate is treated as false if it evaluates to unknown
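The truth tables above can be observed in a running system. This is a minimal sketch using Python's standard sqlite3 module. SQLite has no separate boolean type: it reports true as 1, false as 0 and unknown as null, which Python shows as None. The helper name evaluate and the two instructor tuples are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute("select " + expression).fetchone()[0]


comparison = evaluate("5 < null")            # unknown
unknown_or_true = evaluate("null or 1")      # true
unknown_or_false = evaluate("null or 0")     # unknown
false_and_unknown = evaluate("0 and null")   # false
true_and_unknown = evaluate("1 and null")    # unknown
not_unknown = evaluate("not null")           # unknown

# A where-clause predicate that evaluates to unknown is treated as false.
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)",
                 [("Smith", 90000), ("Jones", None)])
compared = conn.execute(
    "select name from instructor where salary > 80000 or salary <= 80000").fetchall()
missing = conn.execute(
    "select name from instructor where salary is null").fetchall()
print(compared, missing)
```

Jones satisfies neither salary > 80000 nor salary <= 80000, because both comparisons are unknown; only the is null predicate finds that tuple.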

Aggregate Functions PPD


Aggregate Functions

• These functions operate on the multiset of values of a column of a relation, and return a value
    avg: average value
    min: minimum value
    max: maximum value
    sum: sum of values
    count: number of values

Aggregate Functions (2)

• Find the average salary of instructors in the Computer Science department
    select avg (salary)
    from instructor
    where dept_name = 'Comp. Sci.';
• Find the total number of instructors who teach a course in the Spring 2010 semester
    select count (distinct ID)
    from teaches
    where semester = 'Spring' and year = 2010;
• Find the number of tuples in the course relation
    select count (*)
    from course;

Aggregate Functions (3): Group By

• Find the average salary of instructors in each department
    select dept_name, avg(salary) as avg_salary
    from instructor
    group by dept_name;

Aggregate Functions (4): Group By

• Attributes in the select clause outside of aggregate functions must appear in the group by list
    /* erroneous query */
    select dept_name, ID, avg(salary)
    from instructor
    group by dept_name;

Aggregate Functions (5): Having Clause

• Find the names and average salaries of all departments whose average salary is greater than 42000
    select dept_name, avg(salary)
    from instructor
    group by dept_name
    having avg(salary) > 42000;
• Note: predicates in the having clause are applied after the formation of groups whereas predicates in the where clause are applied before forming groups
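The order in which where and having are applied changes the result. This is a minimal sketch using Python's standard sqlite3 module; the four instructor tuples and the 70000 threshold in the second query are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000),
    ("Katz", "Comp. Sci.", 75000),
    ("Wu", "Finance", 90000),
    ("Mozart", "Music", 40000),
])

# having filters whole groups after the averages have been computed.
high_avg = conn.execute("""
    select dept_name, avg(salary)
    from instructor
    group by dept_name
    having avg(salary) > 42000
    order by dept_name
""").fetchall()

# where filters individual tuples before the groups are formed,
# so Srinivasan no longer contributes to the Comp. Sci. average.
filtered_first = conn.execute("""
    select dept_name, avg(salary)
    from instructor
    where salary > 70000
    group by dept_name
    order by dept_name
""").fetchall()
print(high_avg)
print(filtered_first)
```

In the first query Comp. Sci. averages 70000 over both of its instructors; in the second only Katz survives the where clause, so the Comp. Sci. average becomes 75000.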

Null Values and Aggregates

• Total all salaries
    select sum (salary)
    from instructor;
  ◦ Above statement ignores null amounts
  ◦ Result is null if there is no non-null amount
• All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes
• What if the collection has only null values?
  ◦ count returns 0
  ◦ all other aggregates return null
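These rules can be confirmed on a relation that contains a null salary. This is a minimal sketch using Python's standard sqlite3 module; the three instructor tuples are sample data assumed for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)", [
    ("Smith", 60000),
    ("Jones", None),
    ("Lee", 80000),
])

# sum, avg and count(salary) skip the null; count(*) counts every tuple.
mixed = conn.execute(
    "select sum(salary), avg(salary), count(salary), count(*) from instructor"
).fetchone()

# Restricting to the null tuple leaves a collection with only null values.
only_null = conn.execute("""
    select sum(salary), max(salary), count(salary), count(*)
    from instructor
    where salary is null
""").fetchone()
print(mixed)
print(only_null)
```

The average is taken over the two non-null salaries, not over all three tuples.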

Module Summary PPD

• Completed the understanding of set operations, null values, and aggregation

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with "PPD".

Database Management Systems
Module 11: SQL Examples

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Week Recap PPD

• Basic notions of Relational Database Models
  ◦ Attributes and their types
  ◦ Mathematical structure of relational model
  ◦ Schema and Instance
  ◦ Keys, primary as well as foreign
• Relational algebra with operators
• Relational query language
  ◦ DDL (Data Definition)
  ◦ DML (Basic Query Structure)
• Detailed understanding of basic query structure
• Set operations, null values, and aggregation

Module Objectives PPD

• To recap various basic SQL features through worked examples

Module Outline PPD

• Examples of basic SQL

Select distinct PPD

• From the classroom relation in the figure, find the names of buildings that have at least one classroom with capacity less than 100 (removing the duplicates).
  ◦ Query:
      select distinct building
      from classroom
      where capacity < 100;
  ◦ Output:
      building
      Painter
      Taylor
      Watson
[Figure: classroom relation]

Select all PPD

• From the classroom relation in the figure, find the names of buildings that have at least one classroom with capacity less than 100 (without removing the duplicates).
  ◦ Query:
      select all building
      from classroom
      where capacity < 100;
  ◦ Output:
      building
      Painter
      Taylor
      Watson
      Watson
[Figure: classroom relation]
• Note that duplicate retention is the default, and hence it is common practice to omit all immediately after select.

Cartesian Product PPD

• Find the list of all students of departments which have a budget < $0.1 million
    select name, budget
    from student, department
    where student.dept_name = department.dept_name and
          budget < 100000;
• Output:
    name      budget
    Brandt    50000.00
    Peltier   70000.00
    Levy      70000.00
    Sanchez   80000.00
    Snow      70000.00
    Aoi       85000.00
    Bourikas  85000.00
    Tanaka    90000.00
• The above query first generates every possible student-department pair, which is the Cartesian product of student and department. Then, it keeps only the rows with student.dept_name = department.dept_name and budget < 100000.
• The common attribute dept_name in the resulting table is renamed using the relation name (student.dept_name and department.dept_name)

Rename AS Operation PPD

• The same query in the previous slide can be framed by renaming the tables as shown below.
    select S.name as studentname, budget as deptbudget
    from student as S, department as D
    where S.dept_name = D.dept_name and budget < 100000;
• Output:
    studentname  deptbudget
    Brandt       50000.00
    Peltier      70000.00
    Levy         70000.00
    Sanchez      80000.00
    Snow         70000.00
    Aoi          85000.00
    Bourikas     85000.00
    Tanaka       90000.00
• The above query renames the relation student as S and the relation department as D
• It also displays the attribute name as studentname and budget as deptbudget.
• Note that the budget attribute does not have any prefix because it occurs only in the department relation.

Where: AND and OR PPD

• From the instructor and department relations in the figure, find the names of all instructors whose department is Finance or whose department is in any of the following buildings: Watson, Taylor.
  ◦ Query:
      select name
      from instructor I, department D
      where D.dept_name = I.dept_name
            and (I.dept_name = 'Finance'
                 or building in ('Watson', 'Taylor'));
  ◦ Output:
      name
      Srinivasan
      Wu
      Einstein
      Gold
      Katz
      Singh
      Crick
      Brandt
      Kim
[Figure: instructor and department relations]

String Operations PPD

• From the course relation in the figure, find the titles of all courses whose course_id has three letters indicating the department.
  ◦ Query:
      select title
      from course
      where course_id like '___-%';
  ◦ Output:
      title
      Intro. to Biology
      Genetics
      Computational Biology
      Investment Banking
      World History
      Physical Principles
[Figure: course relation]
• The course_id of each course begins with either 2 or 3 letters identifying the department, followed by a hyphen and then a 3-digit number. The above query returns the titles of those courses whose course_id begins with 3 letters.
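In a like pattern, each underscore matches exactly one character and the percent sign matches any substring, so three underscores followed by a hyphen select the three-letter prefixes. This is a minimal sketch using Python's standard sqlite3 module; the six course_id values are sample data assumed for the illustration, and the relation is cut down to a single attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (course_id varchar(8))")
conn.executemany("insert into course values (?)", [
    ("BIO-101",), ("CS-101",), ("EE-181",),
    ("FIN-201",), ("MU-199",), ("PHY-101",),
])

# '___-%': exactly three characters, a hyphen, then anything.
three_letter = [row[0] for row in conn.execute(
    "select course_id from course where course_id like '___-%' order by course_id")]

# '__-%': exactly two characters before the hyphen.
two_letter = [row[0] for row in conn.execute(
    "select course_id from course where course_id like '__-%' order by course_id")]
print(three_letter, two_letter)
```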

Order By PPD

• From the student relation in the figure, obtain the list of all students in alphabetic order of departments and within each department, in decreasing order of total credits.
  ◦ Query:
      select name, dept_name, tot_cred
      from student
      order by dept_name ASC, tot_cred DESC;
  ◦ Output:
      name      dept_name   tot_cred
      Tanaka    Biology     120
      Zhang     Comp. Sci.  102
      Brown     Comp. Sci.  58
      Williams  Comp. Sci.  54
      Shankar   Comp. Sci.  32
      Bourikas  Elec. Eng.  98
      Aoi       Elec. Eng.  60
      Chavez    Finance     110
      Brandt    History     80
      Sanchez   Music       38
      Peltier   Physics     56
      Levy      Physics     46
      Snow      Physics     0
  ◦ The list is first sorted in alphabetic order of dept_name.
  ◦ Within each dept, it is sorted in decreasing order of total credits.
[Figure: student relation]

In Operator PPD

• From the teaches relation in the figure, find the IDs of all courses taught in the Fall or Spring of 2018.
  ◦ Query:
      select course_id
      from teaches
      where semester in ('Fall', 'Spring')
            and year = 2018;
  ◦ Output:
      course_id
      CS-315
      FIN-201
      MU-199
      HIS-351
      CS-101
      CS-319
      CS-319
[Figure: teaches relation]
• Note: We can use distinct to remove duplicates.

Set Operations: union PPD

• For the same question in the previous slide, we can find the solution using the union operator as follows.
  ◦ Query:
      select course_id
      from teaches
      where semester = 'Fall'
            and year = 2018
      union
      select course_id
      from teaches
      where semester = 'Spring'
            and year = 2018
  ◦ Output:
      course_id
      CS-101
      CS-315
      CS-319
      FIN-201
      HIS-351
      MU-199
  ◦ Note that union removes all duplicates. If we use union all instead of union, we get the same multiset of tuples as in the previous slide.
[Figure: teaches relation]

Set Operations (2): intersect PPD

• From the instructor relation in the figure, find the names of all instructors who taught in either the Computer Science department or the Finance department and whose salary is < 80000.
  ◦ Query:
      select name
      from instructor
      where dept_name in ('Comp. Sci.', 'Finance')
      intersect
      select name
      from instructor
      where salary < 80000;
  ◦ Output:
      name
      Srinivasan
      Katz
[Figure: instructor relation]
• Note that the same can be achieved using the query:
    select name from instructor where dept_name in ('Comp. Sci.', 'Finance') and salary < 80000;

Set Operations (3): except PPD

• From the instructor relation in the figure, find the names of all instructors who taught in either the Computer Science department or the Finance department and whose salary is either ≥ 90000 or ≤ 70000.

Aggregate functions (4): count PPD

• From the section relation given in the figure, find the number of courses run in each building.
  ◦ Query:
      select building,
             count(course_id) as course_count
      from section
      group by building;
  ◦ Output:
      building  course_count
      Taylor    5
      Packard   4
      Painter   3
      Watson    3
[Figure: section relation]

Aggregate functions (5): sum PPD

• From the course relation given in the figure, find the total credits offered by each department.
  ◦ Query:
      select dept_name,
             sum(credits) as sum_credits
      from course
      group by dept_name;
  ◦ Output:
      dept_name   sum_credits
      Finance     3
      History     3
      Physics     4
      Music       3
      Comp. Sci.  17
      Biology     11
      Elec. Eng.  3
[Figure: course relation]

Module Summary PPD

• SQL Examples have been practiced for
  ◦ Select
  ◦ Cartesian Product / as
  ◦ Where: and / or
  ◦ String Matching
  ◦ Order by
  ◦ in
  ◦ Set Operations: union, intersect, except
  ◦ Aggregate Functions: avg, min, max, count, sum

Database Management Systems Partha Pratim Das 11.21

Database Management Systems
Module 12: Intermediate SQL/1

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Database Management Systems Partha Pratim Das 12.1

Module Recap PPD

• SQL Examples Practiced

• A subquery is a select-from-where expression that is nested within another query
• The nesting can be done in the following SQL query
        select A1, A2, ..., An
        from r1, r2, ..., rm
        where P
  as follows:
  ◦ Ai can be replaced by a subquery that generates a single value
  ◦ ri can be replaced by any valid subquery
  ◦ P can be replaced with an expression of the form:
        B <operation> (subquery)
    where B is an attribute and <operation> is to be defined later

Database Management Systems Partha Pratim Das 12.6

Definition of “some” Clause

• F <comp> some r ⇔ ∃ t ∈ r such that (F <comp> t)
  where <comp> can be: <, ≤, >, ≥, =, ≠
• some represents existential quantification

Database Management Systems Partha Pratim Das 12.12

Set Comparison – “all” Clause

• Find the names of all instructors whose salary is greater than the salary of all
  instructors in the Biology department
        select name
        from instructor
        where salary > all (select salary
                            from instructor
                            where dept_name = 'Biology');

Database Management Systems Partha Pratim Das 12.13

Definition of “all” Clause

• F <comp> all r ⇔ ∀ t ∈ r (F <comp> t)
  where <comp> can be: <, ≤, >, ≥, =, ≠
• all represents universal quantification

Database Management Systems Partha Pratim Das 12.14
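The two definitions map directly onto existential and universal checks over the subquery result. The sketch below models them in plain Python; the helper names some and all_ and the COMPARATORS table are introduced here for illustration, and null values (SQL's three-valued logic) are deliberately ignored.

```python
import operator

# Comparison operators allowed in "F <comp> some r" and "F <comp> all r".
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}


def some(f, comp, r):
    """True if at least one t in r satisfies f <comp> t (existential)."""
    op = COMPARATORS[comp]
    return any(op(f, t) for t in r)


def all_(f, comp, r):
    """True if every t in r satisfies f <comp> t (universal)."""
    op = COMPARATORS[comp]
    return all(op(f, t) for t in r)


print(some(5, "<", [0, 5, 6]))   # 5 < 6 holds for one element
print(all_(5, "<", [0, 5, 6]))   # 5 < 0 fails, so not all
print(all_(5, ">", []))          # universally true over an empty result
print(some(5, ">", []))          # nothing to satisfy the comparison
```

Under this reading, = some behaves like in and <> all behaves like not in, while an empty subquery result makes every all comparison true and every some comparison false.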

Complex Queries using With Clause

• Find all departments where the total salary is greater than the average of the total
  salary at all departments
        with dept_total (dept_name, value) as
            (select dept_name, sum(salary)
             from instructor
             group by dept_name),
        dept_total_avg (value) as
            (select avg(value)
             from dept_total)
        select dept_name
        from dept_total, dept_total_avg
        where dept_total.value > dept_total_avg.value;

Database Management Systems Partha Pratim Das 12.22
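A minimal sketch of this with-clause query, run through Python's sqlite3 module (SQLite accepts the with syntax with column lists). The four instructor rows are assumed sample data chosen so that only one department exceeds the average of the departmental totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary numeric)")
# Assumed sample data: totals are Comp. Sci. 157000, Finance 90000, Music 40000.
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("Srinivasan", "Comp. Sci.", 65000),
        ("Brandt", "Comp. Sci.", 92000),
        ("Wu", "Finance", 90000),
        ("Mozart", "Music", 40000),
    ],
)

# dept_total holds one total per department; dept_total_avg holds their average.
rows = conn.execute("""
    with dept_total (dept_name, value) as
        (select dept_name, sum(salary)
         from instructor
         group by dept_name),
    dept_total_avg (value) as
        (select avg(value)
         from dept_total)
    select dept_name
    from dept_total, dept_total_avg
    where dept_total.value > dept_total_avg.value
""").fetchall()

above_average = [dept for (dept,) in rows]
print(above_average)
```

The average of the three totals is about 95667, so only the Comp. Sci. total lies above it.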

Subqueries in the Select Clause

Database Management Systems Partha Pratim Das 12.23

Scalar Subquery

• Scalar subquery is one which is used where a single value is expected
• List all departments along with the number of instructors in each department
        select dept_name,
               (select count(*)
                from instructor
                where department.dept_name = instructor.dept_name)
               as num_instructors
        from department;
• Runtime error if subquery returns more than one result tuple

Database Management Systems Partha Pratim Das 12.24

Modifications of the Database PPD


Database Management Systems Partha Pratim Das 12.25

• Increase salaries of instructors whose salary is over $100,000 by 3%, and all others by 5%
  ◦ Write two update statements:
        update instructor
        set salary = salary * 1.03
        where salary > 100000;

        update instructor
        set salary = salary * 1.05
        where salary <= 100000;
• The order is important: if the 5% update ran first, a salary just under $100,000 could
  rise above $100,000 and then receive the 3% raise as well
• Can be done better using the case statement (next slide)

Database Management Systems Partha Pratim Das 12.31

Case Statement for Conditional Updates

• Same query as before but with case statement
        update instructor
        set salary = case
                        when salary <= 100000
                            then salary * 1.05
                        else salary * 1.03
                     end

Database Management Systems Partha Pratim Das 12.32
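Why the order of the two updates matters, and why the case form avoids the problem, can be seen in a short sketch using Python's sqlite3 module. The two instructors A and B and their salaries are assumed sample values; fresh_db and salaries are helper names introduced only for this example.

```python
import sqlite3


def fresh_db():
    """Build a new in-memory instructor table with two assumed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (name text, salary real)")
    conn.executemany(
        "insert into instructor values (?, ?)",
        [("A", 98000.0), ("B", 120000.0)],
    )
    return conn


def salaries(conn):
    """Return {name: salary}, rounded to cents to hide floating-point noise."""
    return {
        name: round(salary, 2)
        for name, salary in conn.execute("select name, salary from instructor")
    }


# Intended order: 3% for high salaries first, then 5% for the rest.
two_step = fresh_db()
two_step.execute("update instructor set salary = salary * 1.03 where salary > 100000")
two_step.execute("update instructor set salary = salary * 1.05 where salary <= 100000")

# Swapped order: A is raised past 100000 and then raised a second time.
swapped = fresh_db()
swapped.execute("update instructor set salary = salary * 1.05 where salary <= 100000")
swapped.execute("update instructor set salary = salary * 1.03 where salary > 100000")

# Single statement: each row is examined once, so no row is raised twice.
single = fresh_db()
single.execute("""
    update instructor
    set salary = case
                    when salary <= 100000 then salary * 1.05
                    else salary * 1.03
                 end
""")

print(salaries(two_step), salaries(swapped), salaries(single))
```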

Updates with Scalar Subqueries

• Recompute and update tot_creds value for all students
        update student S
        set tot_creds = (select sum(credits)
                         from takes, course
                         where takes.course_id = course.course_id and
                               S.ID = takes.ID and
                               takes.grade <> 'F' and
                               takes.grade is not null);
• Sets tot_creds to null for students who have not taken any course
• Instead of sum(credits), use:
        case
            when sum(credits) is not null then sum(credits)
            else 0
        end

Database Management Systems Partha Pratim Das 12.33
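The null-versus-zero behaviour can be reproduced with a small sketch in Python's sqlite3 module. The tables are cut down to the columns the update needs and the rows are assumed sample data; the alias S is dropped and student.ID is referenced directly, which is a portability choice made for this example rather than part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema and data: student '2' has taken no course.
conn.executescript("""
    create table student (ID text primary key, name text, tot_creds integer);
    create table course  (course_id text primary key, credits integer);
    create table takes   (ID text, course_id text, grade text);
    insert into student values ('1', 'Shankar', null), ('2', 'Snow', null);
    insert into course values ('CS-101', 4), ('CS-190', 3);
    insert into takes values ('1', 'CS-101', 'A'), ('1', 'CS-190', 'F');
""")

# Plain sum: yields null when no qualifying takes rows exist.
conn.execute("""
    update student
    set tot_creds = (select sum(credits)
                     from takes, course
                     where takes.course_id = course.course_id
                       and student.ID = takes.ID
                       and takes.grade <> 'F'
                       and takes.grade is not null)
""")
after_plain = dict(conn.execute("select ID, tot_creds from student"))

# Case-wrapped sum: replaces the null by 0.
conn.execute("""
    update student
    set tot_creds = (select case
                               when sum(credits) is not null then sum(credits)
                               else 0
                            end
                     from takes, course
                     where takes.course_id = course.course_id
                       and student.ID = takes.ID
                       and takes.grade <> 'F'
                       and takes.grade is not null)
""")
after_case = dict(conn.execute("select ID, tot_creds from student"))

print(after_plain, after_case)
```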

Module Summary PPD

• Introduced nested subquery in SQL
• Introduced data modification

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

• Views

Database Management Systems Partha Pratim Das 13.4

Join Expressions PPD


Database Management Systems Partha Pratim Das 13.5

Joined Relations

• Join operations take two relations and return as a result another relation
• A join operation is a Cartesian product which requires that tuples in the two relations
  match (under some condition).
• It also specifies the attributes that are present in the result of the join
• The join operations are typically used as subquery expressions in the from clause

• Relation prereq
• Observe that
  prereq information is missing for CS-315 and
  course information is missing for CS-347
Database Management Systems Partha Pratim Das 13.9
Inner Join PPD

• course inner join prereq
• If specified as natural, the 2nd course_id field is skipped


Views PPD


Database Management Systems Partha Pratim Das 13.18

Views

• In some cases, it is not desirable for all users to see the entire logical model (that is, all
  the actual relations stored in the database).
• Consider a person who needs to know an instructor's name and department, but not the
  salary. This person should see a relation described, in SQL, by
        select ID, name, dept_name
        from instructor
• A view provides a mechanism to hide certain data from the view of certain users
• Any relation that is not of the conceptual model but is made visible to a user as a
  “virtual relation” is called a view.

Database Management Systems Partha Pratim Das 13.19

View Definition

        create view physics_fall_2009_watson as
            (select course_id, room_number
             from (select course.course_id, building, room_number
                   from course, section
                   where course.course_id = section.course_id
                     and course.dept_name = 'Physics'
                     and section.semester = 'Fall'
                     and section.year = '2009')
             where building = 'Watson');

Database Management Systems Partha Pratim Das 13.23

Views Defined Using Other Views

• One view may be used in the expression defining another view
• A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the
  expression defining v1
• A view relation v1 is said to depend on view relation v2 if either v1 depends directly on
  v2 or there is a path of dependencies from v1 to v2
• A view relation v is said to be recursive if it depends on itself

Database Management Systems Partha Pratim Das 13.24

View Expansion*

• A way to define the meaning of views defined in terms of other views
• Let view v1 be defined by an expression e1 that may itself contain uses of view relations
• View expansion of an expression repeats the following replacement step:
        repeat
            Find any view relation vi in e1
            Replace the view relation vi by the expression defining vi
        until no more view relations are present in e1
• As long as the view definitions are not recursive, this loop will terminate

Database Management Systems Partha Pratim Das 13.25
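The replacement loop above can be sketched as plain text substitution. This is only an illustration of the idea, not how a database engine expands views: the function name expand, the dictionary of view definitions, and the view physics_fall_2009 (assumed here to be the view that physics_fall_2009_watson is built on) are all introduced for this example.

```python
import re


def expand(expression, view_definitions):
    """Repeatedly replace view names in `expression` by their definitions.

    Terminates only if the view definitions are not recursive.
    """
    while True:
        for view, definition in view_definitions.items():
            # Match the view name as a whole word, not as part of a longer name.
            pattern = r"\b" + re.escape(view) + r"\b"
            if re.search(pattern, expression):
                expression = re.sub(
                    pattern, lambda match: "(" + definition + ")", expression
                )
                break  # restart the scan on the rewritten expression
        else:
            return expression  # no view relation is left


# Assumed view definitions, written as the text of their defining queries.
views = {
    "physics_fall_2009": (
        "select course.course_id, building, room_number "
        "from course, section "
        "where course.course_id = section.course_id "
        "and course.dept_name = 'Physics' "
        "and section.semester = 'Fall' and section.year = '2009'"
    ),
    "physics_fall_2009_watson": (
        "select course_id, room_number "
        "from physics_fall_2009 where building = 'Watson'"
    ),
}

expanded = expand("select * from physics_fall_2009_watson", views)
print(expanded)
```

After two replacement steps the expression mentions only the stored relations course and section, which matches the nested form of the view definition shown earlier.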

Update of a View


Database Management Systems Partha Pratim Das 13.29

Module Summary PPD

• Learnt SQL expressions for Join and Views

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 13.30

Database Management Systems
Module 14: Intermediate SQL/3

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Database Management Systems Partha Pratim Das 14.1

Module Recap PPD

• SQL expressions for Join and Views

Transactions

Database Management Systems Partha Pratim Das 14.5

Transactions

• Unit of work
• Atomic transaction
  ◦ either fully executed or rolled back as if it never occurred
• Isolation from concurrent transactions
• Transactions begin implicitly
  ◦ Ended by commit work or rollback work
• But default on most databases: each SQL statement commits automatically
  ◦ Can turn off auto commit for a session (for example, using API)
  ◦ In SQL:1999, can use: begin atomic ... end
    . Not supported on most databases

Database Management Systems Partha Pratim Das 14.6
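Atomicity can be observed through an API, as the slide suggests. The sketch below uses Python's sqlite3 module, which in its default mode opens a transaction implicitly before the first update and ends it on commit() or rollback(). The department rows, the budget check, and the helper names move_budget and budgets are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table department (
        dept_name varchar(20) primary key,
        budget numeric(12,2) check (budget > 0)
    )
""")
conn.execute("insert into department values ('Physics', 70000), ('Music', 80000)")
conn.commit()


def move_budget(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work."""
    try:
        conn.execute(
            "update department set budget = budget + ? where dept_name = ?",
            (amount, dst),
        )
        conn.execute(
            "update department set budget = budget - ? where dept_name = ?",
            (amount, src),
        )
        conn.commit()      # both updates become permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()    # undo the first update as if it never occurred
        return False


def budgets(conn):
    return dict(conn.execute("select dept_name, budget from department"))


failed = move_budget(conn, "Physics", "Music", 90000)   # would make Physics negative
after_failure = budgets(conn)
succeeded = move_budget(conn, "Physics", "Music", 20000)
after_success = budgets(conn)
print(failed, after_failure, succeeded, after_success)
```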

Integrity Constraints PPD

• Ensure that semester is one of fall, winter, spring or summer:
        create table section (
            course_id varchar(8),
            sec_id varchar(8),
            semester varchar(6),
            year numeric(4,0),
            building varchar(15),
            room_number varchar(7),
            time_slot_id varchar(4),
            primary key (course_id, sec_id, semester, year),
            check (semester in ('Fall', 'Winter', 'Spring', 'Summer'))
        );

Database Management Systems Partha Pratim Das 14.11
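A quick way to see the check clause at work is to try an insert that violates it. The sketch below uses Python's sqlite3 module with a shortened version of the section table (only the key columns are kept, which is a simplification for this example); the value 'Autumn' is an assumed invalid input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table section (
        course_id varchar(8),
        sec_id varchar(8),
        semester varchar(6),
        year numeric(4,0),
        primary key (course_id, sec_id, semester, year),
        check (semester in ('Fall', 'Winter', 'Spring', 'Summer'))
    )
""")

# A valid semester is accepted.
conn.execute("insert into section values ('CS-101', '1', 'Fall', 2009)")

# An invalid semester violates the check constraint and is rejected.
try:
    conn.execute("insert into section values ('CS-101', '1', 'Autumn', 2009)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("select count(*) from section").fetchone()[0]
print(rejected, row_count)
```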

Referential Integrity

• Ensures that a value that appears in one relation for a given set of attributes also
  appears for a certain set of attributes in another relation
• Example: If “Biology” is a department name appearing in one of the tuples in the
  instructor relation, then there exists a tuple in the department relation for “Biology”
• Let A be a set of attributes. Let R and S be two relations that contain attributes A,
  where A is the primary key of S. A is said to be a foreign key of R if for any values of
  A appearing in R these values also appear in S

Database Management Systems Partha Pratim Das 14.12

Cascading Actions in Referential Integrity PPD

• With cascading, you can define the actions that the Database Engine takes when a user
  tries to delete or update a key to which existing foreign keys point
• create table course (
      course_id char(5) primary key,
      title varchar(20),
      dept_name varchar(20) references department
  )
• create table course (
      ...
      dept_name varchar(20),
      foreign key (dept_name) references department
          on delete cascade
          on update cascade,
      ...
  )
• Alternative actions to cascade: no action, set null, set default
Database Management Systems Partha Pratim Das 14.13
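The effect of on delete cascade and on update cascade can be sketched with Python's sqlite3 module. SQLite enforces foreign keys only after pragma foreign_keys = on, which is specific to that engine; the department and course rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite-specific: enable enforcement
conn.executescript("""
    create table department (dept_name varchar(20) primary key);
    create table course (
        course_id char(5) primary key,
        title varchar(20),
        dept_name varchar(20),
        foreign key (dept_name) references department
            on delete cascade
            on update cascade
    );
    insert into department values ('Biology'), ('Physics');
    insert into course values ('BIO-1', 'Intro. to Biology', 'Biology'),
                              ('PHY-1', 'Physical Principles', 'Physics');
""")

# Renaming a department propagates to the referencing course rows.
conn.execute("update department set dept_name = 'Life Sci.' where dept_name = 'Biology'")
# Deleting a department deletes the courses that reference it.
conn.execute("delete from department where dept_name = 'Physics'")

courses = conn.execute("select course_id, dept_name from course").fetchall()
print(courses)
```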
Integrity Constraint Violation During Transactions

• create table person (
      ID char(10),
      name char(40),
      mother char(10),
      father char(10),
      primary key (ID),
      foreign key (father) references person,
      foreign key (mother) references person)
• How to insert a tuple without causing constraint violation?
  ◦ Insert father and mother of a person before inserting person
  ◦ OR, set father and mother to null initially, update after inserting all persons (not
    possible if father and mother attributes declared to be not null)
  ◦ OR, defer constraint checking (will discuss later)

Database Management Systems Partha Pratim Das 14.14

SQL Data Types and Schemas PPD


Database Management Systems Partha Pratim Das 14.15

Built-in Data Types in SQL


Privileges in SQL

• select: allows read access to relation, or the ability to query using the view
  ◦ Example: grant users U1, U2, and U3 select authorization on the instructor relation:
        grant select on instructor to U1, U2, U3
• insert: the ability to insert tuples
• update: the ability to update using the SQL update statement
• delete: the ability to delete tuples.
• all privileges: used as a short form for all the allowable privileges

Database Management Systems Partha Pratim Das 14.24

Revoking Authorization in SQL

• The revoke statement is used to revoke authorization
        revoke <privilege list>
        on <relation name or view name> from <user list>
• Example:
        revoke select on branch from U1, U2, U3
• <privilege list> may be all to revoke all privileges the revokee may hold
• If <user list> includes public, all users lose the privilege except those granted it
  explicitly
• If the same privilege was granted twice to the same user by different grantors, the user
  may retain the privilege after the revocation
• All privileges that depend on the privilege being revoked are also revoked

Database Management Systems Partha Pratim Das 14.25

Roles

• create role instructor;
  grant instructor to Amit;
• Privileges can be granted to roles:
        grant select on takes to instructor;
• Roles can be granted to users, as well as to other roles
        create role teaching_assistant;
        grant teaching_assistant to instructor;
  ◦ Instructor inherits all privileges of teaching_assistant
• Chain of roles
  ◦ create role dean;
  ◦ grant instructor to dean;
  ◦ grant dean to Satoshi;

Database Management Systems Partha Pratim Das 14.26

Authorization on Views

• create view geo_instructor as
      (select *
       from instructor
       where dept_name = 'Geology');
  grant select on geo_instructor to geo_staff
• Suppose that a geo_staff member issues
        select *
        from geo_instructor;
• What if
  ◦ geo_staff does not have permissions on instructor?
  ◦ creator of view did not have some permissions on instructor?

Database Management Systems Partha Pratim Das 14.27

Other Authorization Features

• references privilege to create foreign key
        grant references (dept_name) on department to Mariano;
  ◦ why is this required?
• Transfer of privileges
  ◦ grant select on department to Amit with grant option;
  ◦ revoke select on department from Amit, Satoshi cascade;
  ◦ revoke select on department from Amit, Satoshi restrict;

Database Management Systems Partha Pratim Das 14.28

Module Summary PPD

• Introduced transactions
• Learnt SQL expressions for integrity constraints
• Familiarized with more data types in SQL
• Discussed authorization in SQL

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems Partha Pratim Das 14.29

Database Management Systems
Module 15: Advanced SQL

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Database Management Systems Partha Pratim Das 15.1

Module Recap PPD

• Transactions
• Integrity Constraints
• More Data Types in SQL
• Authorization in SQL

Database Management Systems Partha Pratim Das 15.2

Module Objectives

• To familiarize with functions and procedures in SQL
• To understand the triggers and their performance issues

Database Management Systems Partha Pratim Das 15.3

Module Outline

• Functions and Procedural Constructs
• Triggers
  ◦ Functionality vs Performance

Database Management Systems Partha Pratim Das 15.4

Functions and Procedural Constructs

Database Management Systems Partha Pratim Das 15.5

Native Language← →Query Language


Database Management Systems Partha Pratim Das 15.6

Functions and Procedures

• Functions / Procedures and Control Flow Statements were added in SQL:1999
  ◦ Functions/Procedures can be written in SQL itself, or in an external
    programming language (like C, Java)
  ◦ Functions written in an external language are particularly useful with specialized
    data types such as images and geometric objects
    . Example: Functions to check if polygons overlap, or to compare images for
      similarity
  ◦ Some database systems support table-valued functions, which can return a
    relation as a result
• SQL:1999 also supports a rich set of imperative constructs, including loops,
  if-then-else, and assignment
• Many databases have proprietary procedural extensions to SQL that differ from
  SQL:1999

Database Management Systems Partha Pratim Das 15.7

SQL Functions

• Define a function that, given the name of a department, returns the count of the
  number of instructors in that department:
        create function dept_count (dept_name varchar(20))
        returns integer
        begin
            declare d_count integer;
            select count(*) into d_count
            from instructor
            where instructor.dept_name = dept_name;
            return d_count;
        end
• The function dept_count can be used to find the department names and budget of all
  departments with more than 12 instructors:
        select dept_name, budget
        from department
        where dept_count(dept_name) > 12
Database Management Systems Partha Pratim Das 15.8
SQL functions (2)

• Compound statement: begin . . . end
  May contain multiple SQL statements between begin and end.
• returns – indicates the variable-type that is returned (for example, integer)
• return – specifies the values that are to be returned as result of invoking the function
• SQL functions are in fact parameterized views that generalize the regular notion of
  views by allowing parameters

Database Management Systems Partha Pratim Das 15.9

Table Functions

• Functions that return a relation as a result were added in SQL:2003
• Return all instructors in a given department:
        create function instructor_of (dept_name char(20))
        returns table (
            ID varchar(5),
            name varchar(20),
            dept_name varchar(20),
            salary numeric(8, 2))
        return table
            (select ID, name, dept_name, salary
             from instructor
             where instructor.dept_name = instructor_of.dept_name)
• Usage
        select *
        from table (instructor_of('Music'))
Database Management Systems Partha Pratim Das 15.10
SQL Procedures

• The dept_count function could instead be written as a procedure:
        create procedure dept_count_proc (
            in dept_name varchar(20), out d_count integer)
        begin
            select count(*) into d_count
            from instructor
            where instructor.dept_name = dept_count_proc.dept_name
        end
• Procedures can be invoked either from an SQL procedure or from embedded SQL,
  using the call statement.
        declare d_count integer;
        call dept_count_proc('Physics', d_count);
• Procedures and functions can be invoked also from dynamic SQL
• SQL:1999 allows overloading: more than one function/procedure of the same name, as
  long as the number of arguments and / or the types of the arguments differ
Database Management Systems Partha Pratim Das 15.11
Language Constructs for Procedures and Functions

• SQL supports constructs that give it almost all the power of a general-purpose
  programming language.
  ◦ Warning: Most database systems implement their own variant of the
    standard syntax
• Compound statement: begin . . . end
  ◦ May contain multiple SQL statements between begin and end.
  ◦ Local variables can be declared within a compound statement

Database Management Systems Partha Pratim Das 15.12

Language Constructs (2): while and repeat

• while loop:
        while boolean expression do
            sequence of statements;
        end while;
• repeat loop:
        repeat
            sequence of statements;
        until boolean expression
        end repeat;

Database Management Systems Partha Pratim Das 15.13

Language Constructs (3): for

• for loop
  ◦ Permits iteration over all results of a query
• Find the total budget of all departments:
        declare n integer default 0;
        for r as
            select budget from department
        do
            set n = n + r.budget;
        end for;

Database Management Systems Partha Pratim Das 15.14

Language Constructs (4): if-then-else

• Conditional statements
  ◦ if-then-else
  ◦ case
• if-then-else statement
        if boolean expression then
            sequence of statements;
        elseif boolean expression then
            sequence of statements;
        ···
        else
            sequence of statements;
        end if;
• The if statement supports the use of optional elseif clauses and a default else clause.
• Example procedure: registers student after ensuring classroom capacity is not exceeded
  ◦ Returns 0 on success and -1 if capacity is exceeded
  ◦ See book (page 177) for details
Database Management Systems Partha Pratim Das 15.15
Language Constructs (5): Simple case

• Simple case statement
        case variable
            when value1 then
                sequence of statements;
            when value2 then
                sequence of statements;
            ···
            else
                sequence of statements;
        end case;
• The when clause of the case statement defines the value that when satisfied
  determines the flow of control

Database Management Systems Partha Pratim Das 15.16

Language Constructs (6): Searched case

• Searched case statement
        case
            when sql-expression = value1 then
                sequence of statements;
            when sql-expression = value2 then
                sequence of statements;
            ···
            else
                sequence of statements;
        end case;
• Any supported SQL expression can be used here. These expressions can contain
  references to variables, parameters, special registers, and more.

Database Management Systems Partha Pratim Das 15.17

Language Constructs (7): Exception

• Signaling of exception conditions, and declaring handlers for exceptions
        declare out_of_classroom_seats condition
        declare exit handler for out_of_classroom_seats
        begin
            ...
            signal out_of_classroom_seats
            ...
        end
  ◦ The handler here is exit: it causes the enclosing begin . . . end to be terminated and exited
  ◦ Other actions possible on exception

Database Management Systems Partha Pratim Das 15.18

External Language Routines*

• SQL:1999 allows the definition of functions / procedures in an imperative programming language (Java, C#, C or C++), which can be invoked from SQL queries
• Such functions can be more efficient than functions defined in SQL, and computations that cannot be carried out in SQL can be executed by these functions
• Declaring external language procedures and functions
    create procedure dept_count_proc(
        in dept_name varchar(20),
        out count integer)
    language C
    external name '/usr/avi/bin/dept_count_proc'

    create function dept_count(dept_name varchar(20))

Triggering Events and Actions in SQL

• Triggering event can be an insert, delete or update
• Triggers on update can be restricted to specific attributes
◦ For example, after update of grade on takes
• Values of attributes before and after an update can be referenced
◦ referencing old row as : for deletes and updates
◦ referencing new row as : for inserts and updates
• Triggers can be activated before an event, which can serve as extra constraints. For example, convert blank grades to null.
    create trigger setnull_trigger before update of takes
    referencing new row as nrow
    for each row
    when (nrow.grade = ' ')
    begin atomic
        set nrow.grade = null;
    end;
Database Management Systems Partha Pratim Das 15.27
Trigger to Maintain credits earned value

    create trigger credits_earned after update of grade on (takes)
    referencing new row as nrow
    referencing old row as orow
    for each row
    when nrow.grade <> 'F' and nrow.grade is not null
        and (orow.grade = 'F' or orow.grade is null)
    begin atomic
        update student
        set tot_cred = tot_cred +
            (select credits
             from course
             where course.course_id = nrow.course_id)
        where student.id = nrow.id;
    end;
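As a rough, runnable illustration of the credits_earned trigger above, the following sketch recreates it in SQLite through Python's sqlite3 module. This is not the SQL:1999 syntax of the slide: SQLite uses the implicit row aliases new and old in place of a referencing clause, and a plain begin ... end body. The table layouts and the sample rows are assumptions made only for this demo.

```python
import sqlite3


def credits_trigger_demo():
    """Return tot_cred after a null -> 'A' update and after an 'A' -> 'B' update."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table course  (course_id text primary key, credits integer);
        create table student (id text primary key, tot_cred integer);
        create table takes   (id text, course_id text, grade text);

        -- SQLite dialect of the slide's trigger: new/old stand in for nrow/orow.
        create trigger credits_earned after update of grade on takes
        for each row
        when new.grade <> 'F' and new.grade is not null
             and (old.grade = 'F' or old.grade is null)
        begin
            update student
            set tot_cred = tot_cred +
                (select credits from course
                 where course.course_id = new.course_id)
            where student.id = new.id;
        end;

        -- Hypothetical sample data.
        insert into course  values ('CS-101', 4);
        insert into student values ('S1', 0);
        insert into takes   values ('S1', 'CS-101', null);
    """)
    query = "select tot_cred from student where id = 'S1'"

    # First update: the grade goes from null to a passing grade, so the trigger fires.
    conn.execute("update takes set grade = 'A' where id = 'S1'")
    after_pass = conn.execute(query).fetchone()[0]

    # Second update: the old grade was already passing, so the when clause is false.
    conn.execute("update takes set grade = 'B' where id = 'S1'")
    after_regrade = conn.execute(query).fetchone()[0]

    conn.close()
    return after_pass, after_regrade
```

The second update is included to show why the when clause checks the old grade: without that check, every regrade would add the course credits again.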

Database Management Systems Partha Pratim Das 15.28

How to use triggers? PPD

• The optimal use of DML triggers is for short, simple, and easy to maintain write operations that act largely independent of an application's business logic.
• Typical and recommended uses of triggers include:
◦ Logging changes to a history table
◦ Auditing users and their actions against sensitive tables
◦ Adding additional values to a table that may not be available to an application (due to security restrictions or other limitations), such as:
. Login/user name
. Time an operation occurs
. Server/database name
◦ Simple validation
Source: SQL Server triggers: The good and the scary

Database Management Systems Partha Pratim Das 15.29

How not to use triggers? PPD

• Triggers are like Lays: Once you pop, you can’t stop
• One of the greatest challenges for architects and developers is to ensure that
◦ triggers are used only as needed, and
◦ they do not become a one-size-fits-all solution for any data needs that happen to come along
• Adding triggers is often seen as faster and easier than adding code to an application, but the cost of doing so is compounded over time with each added line of code
Source: SQL Server triggers: The good and the scary

Database Management Systems Partha Pratim Das 15.30

How to use triggers? (2) PPD

• Triggers can become dangerous when:
◦ There are too many
◦ Trigger code becomes complex
◦ Triggers go cross-server - across databases over network
◦ Triggers call triggers
◦ Recursive triggers are set to ON. This database-level setting is set to off by default
◦ Functions, stored procedures, or views are in triggers
◦ Iteration occurs
Source: SQL Server triggers: The good and the scary

Database Management Systems Partha Pratim Das 15.31

Module Summary PPD

• Familiarized with functions and procedures in SQL
• Understood the triggers
• Familiarized with some of the performance issues of triggers

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 15.32

Module 16

• Procedural language
• Six basic operators
◦ select: σ
◦ project: Π
◦ union: ∪
◦ set difference: −
◦ Cartesian product: ×
◦ rename: ρ
• The operators take one or two relations as inputs and produce a new relation as a result

Database Management Systems Partha Pratim Das 16.7

Select Operation PPD

• Notation: σ_p(r)
• p is called the selection predicate
• Defined as:
    σ_p(r) = {t | t ∈ r and p(t)}
  where p is a formula in propositional calculus consisting of terms connected by: ∧ (and), ∨ (or), ¬ (not)
  Each term is one of:
    <attribute> op <attribute> or <constant>
  where op is one of: =, ≠, >, ≥, <, ≤
• Example of selection:
    σ_{dept_name = ’Physics’}(instructor)
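The definition of selection above maps directly onto a set comprehension. The following is a minimal sketch, assuming a relation is modelled as a Python set of named tuples; the Instructor type and its sample rows are made up for illustration.

```python
from collections import namedtuple

# Hypothetical schema and rows for the instructor relation.
Instructor = namedtuple("Instructor", ["ID", "name", "dept_name", "salary"])

instructor = {
    Instructor("22222", "Einstein", "Physics", 95000),
    Instructor("33456", "Gold", "Physics", 87000),
    Instructor("10101", "Srinivasan", "Comp. Sci.", 65000),
}


def select(relation, predicate):
    """sigma_p(r): the set of tuples t in r for which p(t) is true."""
    return {t for t in relation if predicate(t)}


# sigma_{dept_name = 'Physics'}(instructor)
physics = select(instructor, lambda t: t.dept_name == "Physics")
physics_names = sorted(t.name for t in physics)
```

Connectives in the predicate correspond to Python's and, or and not, for example lambda t: t.dept_name == "Physics" and t.salary > 90000.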
Database Management Systems Partha Pratim Das 16.8
Project Operation PPD

• Notation: Π_{A1, A2, ..., Ak}(r)
  where A1, A2, ..., Ak are attribute names and r is a relation
• The result is defined as the relation of k columns obtained by erasing the columns that are not listed
• Duplicate rows removed from result, since relations are sets
• Example: To eliminate the dept_name attribute of instructor
    Π_{ID, name, salary}(instructor)
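Projection, including the removal of duplicate rows, can be sketched in a few lines. The example below assumes rows are given as Python dictionaries and returns a set of tuples, so duplicates disappear automatically; the sample rows are hypothetical.

```python
# Hypothetical rows of the instructor relation.
instructor_rows = [
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
]


def project(relation, attributes):
    """Pi_{A1..Ak}(r): keep only the listed attributes; the set drops duplicate rows."""
    return {tuple(row[a] for a in attributes) for row in relation}


# Pi_{ID, name, salary}(instructor): the dept_name column is erased.
without_dept = project(instructor_rows, ["ID", "name", "salary"])

# Pi_{dept_name}(instructor): two Physics rows collapse into one result tuple.
departments = project(instructor_rows, ["dept_name"])
```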

Database Management Systems Partha Pratim Das 16.9

Union Operation PPD

• Notation: r ∪ s
• Defined as: r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
  a) r, s must have the same arity (same number of attributes)
  b) The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)
• Example: to find all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or in both
    Π_{course_id}(σ_{semester=“Fall” ∧ year=2009}(section)) ∪ Π_{course_id}(σ_{semester=“Spring” ∧ year=2010}(section))
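The union example above can be traced with plain Python sets. This sketch assumes section is a set of (course_id, semester, year) tuples with made-up rows, and it checks only the arity condition; domain compatibility is not verified.

```python
# Hypothetical section relation: (course_id, semester, year).
section_rows = {
    ("CS-101", "Fall", 2009),
    ("CS-347", "Fall", 2009),
    ("CS-101", "Spring", 2010),
    ("CS-315", "Spring", 2010),
}


def courses_in(section, semester, year):
    """Pi_{course_id}(sigma_{semester, year}(section)) as a set of 1-tuples."""
    return {(cid,) for (cid, sem, yr) in section if sem == semester and yr == year}


def union(r, s):
    """r U s, rejecting operands whose tuples differ in arity."""
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("union requires relations of the same arity")
    return r | s


fall_or_spring = union(
    courses_in(section_rows, "Fall", 2009),
    courses_in(section_rows, "Spring", 2010),
)
```

CS-101 is taught in both semesters but appears once in the result, because the result is a set.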

Database Management Systems Partha Pratim Das 16.10

Difference Operation PPD

• Notation: r − s
• Defined as: r − s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible relations
◦ r and s must have the same arity
◦ attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester
    Π_{course_id}(σ_{semester=“Fall” ∧ year=2009}(section)) − Π_{course_id}(σ_{semester=“Spring” ∧ year=2010}(section))
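Set difference follows the same pattern. A minimal sketch, again assuming a made-up section relation of (course_id, semester, year) tuples:

```python
# Hypothetical section relation: (course_id, semester, year).
sections = {
    ("CS-101", "Fall", 2009),
    ("CS-347", "Fall", 2009),
    ("CS-101", "Spring", 2010),
    ("CS-315", "Spring", 2010),
}


def course_ids(section, semester, year):
    """Pi_{course_id}(sigma_{semester, year}(section)) as a set of 1-tuples."""
    return {(cid,) for (cid, sem, yr) in section if sem == semester and yr == year}


def difference(r, s):
    """r - s: the tuples that are in r and not in s."""
    return {t for t in r if t not in s}


fall_only = difference(
    course_ids(sections, "Fall", 2009),
    course_ids(sections, "Spring", 2010),
)
```

Unlike union, the operation is not symmetric: swapping the operands would return the courses taught only in Spring 2010.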

Database Management Systems Partha Pratim Das 16.11

Intersection Operation

• Notation: r ∩ s
• Defined as:

Division Examples PPD


Database Management Systems Partha Pratim Das 16.16

Division Examples (2) PPD


Database Management Systems Partha Pratim Das 16.17

Division Examples (3) PPD


Database Management Systems Partha Pratim Das 16.18

Division Example (4) PPD

• Relations r, s: [tables shown in the slide figure]
    e.g. A is customer name
         B is branch-name
         1 and 2 here show two specific branch-names
    (Find customers who have an account in all branches of the bank)
• r ÷ s: [result shown in the slide figure]
Database Management Systems Partha Pratim Das 16.19
Division Example (5) PPD

• Relations r, s: [tables shown in the slide figure]
    e.g. Students who have taken both “a” and “b” courses, with instructor “1”
    (Find students who have taken all courses given by instructor 1)
• r ÷ s: [result shown in the slide figure]

Source: [Link]/silberslides/Divsion
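The "all branches" style of query in the division examples can be sketched as follows. This assumes the common binary case, with r over attributes (A, B) and s over (B), and uses hypothetical customer and branch names in place of the tables from the figures.

```python
def divide(r, s):
    """r / s for r over (A, B) and s over (B): the A values that r pairs with every B value in s."""
    candidates = {a for (a, _) in r}
    return {(a,) for a in candidates if all((a, b) in r for (b,) in s)}


# Hypothetical data: which customer has an account at which branch.
accounts = {
    ("Ann", "Downtown"),
    ("Ann", "Uptown"),
    ("Bob", "Downtown"),
}
branches = {("Downtown",), ("Uptown",)}

# Customers who have an account in all branches of the bank.
in_all_branches = divide(accounts, branches)
```

Bob is excluded because the pair (Bob, Uptown) is missing from accounts; a single missing pairing is enough to drop a candidate.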
Database Management Systems Partha Pratim Das 16.20
Module Summary

• Discussed relational algebra with examples

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 16.21

Database Management Systems
Module 17: Formal Relational Query Languages/2

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 17.1

Module Recap PPD

• Relational Algebra and its Operations

Database Management Systems Partha Pratim Das 17.2

Module Objectives PPD

• To understand formal calculus-based query language through relational algebra

Database Management Systems Partha Pratim Das 17.5

Predicate Logic

Database Management Systems Partha Pratim Das 17.6

Predicate Logic

Predicate Logic or Predicate Calculus is an extension of Propositional Logic or Boolean Algebra.

It adds the concept of predicates and quantifiers to better capture the meaning of statements that cannot be adequately expressed by propositional logic.

Tuple Relational Calculus and Domain Relational Calculus are based on Predicate Calculus.

Database Management Systems Partha Pratim Das 17.7

Predicate

• Consider the statement, “x is greater than 3”. It has two parts. The first part, the variable x, is the subject of the statement. The second part, “is greater than 3”, is the predicate. It refers to a property that the subject of the statement can have.
• The statement “x is greater than 3” can be denoted by P(x) where P denotes the predicate “is greater than 3” and x is the variable.
• The predicate P can be considered as a function. It tells the truth value of the statement P(x) at x. Once a value has been assigned to the variable x, the statement P(x) becomes a proposition and has a truth value of true or false.
• In general, a statement involving n variables x1, x2, x3, ···, xn can be denoted by P(x1, x2, x3, ···, xn). Here P is also referred to as an n-place predicate or an n-ary predicate.

Database Management Systems Partha Pratim Das 17.8

Quantifiers

In predicate logic, predicates are used alongside quantifiers to express the extent to which a predicate is true over a range of elements. Using quantifiers to create such propositions is called quantification. There are two types of quantifiers:
• Universal Quantifier
• Existential Quantifier

Database Management Systems Partha Pratim Das 17.9

Universal Quantifier

Universal Quantification: Mathematical statements sometimes assert that a property is true for all the values of a variable in a particular domain, called the domain of discourse.
• Such a statement is expressed using universal quantification.
• The universal quantification of P(x) for a particular domain is the proposition that asserts that P(x) is true for all values of x in this domain
• The domain is very important here since it decides the possible values of x
• Formally, the universal quantification of P(x) is the statement “P(x) for all values of x in the domain”.
• The notation ∀x P(x) denotes the universal quantification of P(x). Here ∀ is called the universal quantifier. ∀x P(x) is read as “for all x, P(x)”.
• Example: Let P(x) be the statement “x + 2 > x”. What is the truth value of the statement ∀x P(x)?
  Solution: As x + 2 is greater than x for any real number, P(x) ≡ T for all x, so ∀x P(x) ≡ T
Database Management Systems Partha Pratim Das 17.10
Existential Quantifier

Existential Quantification: Some mathematical statements assert that there is an element with a certain property. Such statements are expressed by existential quantification. Existential quantification can be used to form a proposition that is true if and only if P(x) is true for at least one value of x in the domain.
• Formally, the existential quantification of P(x) is the statement “There exists an element x in the domain such that P(x)”.
• The notation ∃x P(x) denotes the existential quantification of P(x). Here ∃ is called the existential quantifier. ∃x P(x) is read as “There is at least one x such that P(x)”
• Example: Let P(x) be the statement “x > 5”. What is the truth value of the statement ∃x P(x)?
  Solution: P(x) is true for all real numbers greater than 5 and false for all real numbers less than or equal to 5. Since P(x) holds for at least one value, ∃x P(x) ≡ T
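Over a finite domain, the two quantifiers correspond to Python's all and any. The sketch below is only an illustration of that correspondence: it checks the two example predicates on a small, arbitrarily chosen range of integers and does not prove the statements over the real numbers.

```python
def for_all(domain, predicate):
    """Universal quantification over a finite domain: true if P(x) holds for every x."""
    return all(predicate(x) for x in domain)


def there_exists(domain, predicate):
    """Existential quantification over a finite domain: true if P(x) holds for some x."""
    return any(predicate(x) for x in domain)


# Assumed finite stand-in for the domain of discourse.
sample_domain = range(-100, 101)

universal_holds = for_all(sample_domain, lambda x: x + 2 > x)
existential_holds = there_exists(sample_domain, lambda x: x > 5)

# The domain matters: no element of 0..5 is greater than 5.
existential_on_small_domain = there_exists(range(0, 6), lambda x: x > 5)
```

The last line shows the point made about the domain of discourse: the same predicate gives a different truth value once the domain is restricted.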

Database Management Systems Partha Pratim Das 17.11

Tuple Relational Calculus

Database Management Systems Partha Pratim Das 17.12

Tuple Relational Calculus

TRC is a non-procedural query language, where each query is of the form
    {t | P(t)}
where t = resulting tuples,
P(t) = known as predicate and these are the conditions that are used to fetch t.
P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT (¬).

It also uses quantifiers:
    ∃t ∈ r (Q(t)) = “there exists” a tuple t in relation r such that predicate Q(t) is true.
    ∀t ∈ r (Q(t)) = Q(t) is true “for all” tuples t in relation r.

• {P | ∃S ∈ Students (S.CGPA > 8 ∧ P.name = S.name ∧ P.age = S.age)} :
  returns the name and age of students with a CGPA above 8.

Database Management Systems Partha Pratim Das 17.13

Predicate Calculus Formula

a) Set of attributes and constants
b) Set of comparison operators: (e.g., <, ≤, =, ≠, >, ≥)
c) Set of connectives: and (∧), or (∨), not (¬)
d) Implication (⇒): x ⇒ y, if x is true, then y is true
    x ⇒ y ≡ ¬x ∨ y
e) Set of quantifiers:
• ∃t ∈ r (Q(t)) ≡ “there exists” a tuple t in relation r such that predicate Q(t) is true
• ∀t ∈ r (Q(t)) ≡ Q is true “for all” tuples t in relation r

Database Management Systems Partha Pratim Das 17.14

TRC Example

Student
    Fname    Lname    Age   Course
    David    Sharma   27    DBMS
    Aaron    Lilly    17    JAVA
    Sahil    Khan     19    Python
    Sachin   Rao      20    DBMS
    Varun    George   23    JAVA
    Simi     Verma    22    JAVA

Q.1 Obtain the first name of students whose age is greater than 21.

Solution:
    {t.Fname | Student(t) ∧ t.age > 21}
    {t.Fname | t ∈ Student ∧ t.age > 21}
    {t | ∃s ∈ Student(s.age > 21 ∧ t.Fname = s.Fname)}

Result:
    Fname
    David
    Varun
    Simi
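A TRC expression of the form {t.Fname | t ∈ Student ∧ t.age > 21} reads almost word for word as a set comprehension. The following sketch evaluates Q.1 on the Student table from the slide; the StudentRow type and the Python field names are assumptions introduced for the example.

```python
from collections import namedtuple

# Assumed Python representation of the Student relation from the slide.
StudentRow = namedtuple("StudentRow", ["Fname", "Lname", "Age", "Course"])

student_table = {
    StudentRow("David", "Sharma", 27, "DBMS"),
    StudentRow("Aaron", "Lilly", 17, "JAVA"),
    StudentRow("Sahil", "Khan", 19, "Python"),
    StudentRow("Sachin", "Rao", 20, "DBMS"),
    StudentRow("Varun", "George", 23, "JAVA"),
    StudentRow("Simi", "Verma", 22, "JAVA"),
}

# {t.Fname | t in Student and t.Age > 21}
older_than_21 = {t.Fname for t in student_table if t.Age > 21}
```

The comprehension states what the result must satisfy and leaves the order of evaluation to the interpreter, which mirrors the non-procedural reading of TRC.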

Database Management Systems Partha Pratim Das 17.15

TRC Example (2)

Consider the relational schema
    student(rollNo, name, year, courseId)
    course(courseId, cname, teacher)

Q.2 Find out the names of all students who have taken the course name ‘DBMS’.
• {t | ∃s ∈ student ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’ ∧ t.name = s.name)}
• {s.name | s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’)}

Q.3 Find out the names of all students and their rollNo who have taken the course name ‘DBMS’.
• {s.name, s.rollNo | s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’)}
• {t | ∃s ∈ student ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’ ∧ t.name = s.name ∧ t.rollNo = s.rollNo)}
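The existential quantifier in Q.2 and Q.3 can be mimicked with any() inside a set comprehension. This is a sketch on made-up rows for the student and course schemas given above; the tuple type names and the sample data are hypothetical.

```python
from collections import namedtuple

StudentRec = namedtuple("StudentRec", ["rollNo", "name", "year", "courseId"])
CourseRec = namedtuple("CourseRec", ["courseId", "cname", "teacher"])

# Hypothetical sample data.
students = {
    StudentRec(1, "Asha", 2, "C1"),
    StudentRec(2, "Ravi", 1, "C2"),
    StudentRec(3, "Meera", 3, "C1"),
}
courses = {
    CourseRec("C1", "DBMS", "T1"),
    CourseRec("C2", "JAVA", "T2"),
}


def took_dbms(s):
    """There exists c in course with s.courseId = c.courseId and c.cname = 'DBMS'."""
    return any(s.courseId == c.courseId and c.cname == "DBMS" for c in courses)


# Q.2: {s.name | s in student and took_dbms(s)}
q2_names = {s.name for s in students if took_dbms(s)}

# Q.3: {s.name, s.rollNo | s in student and took_dbms(s)}
q3_names_and_rolls = {(s.name, s.rollNo) for s in students if took_dbms(s)}
```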

Database Management Systems Partha Pratim Das 17.16

TRC Example (3)

Consider the following relations:
    Flights(flno, from, to, distance, departs, arrives)
    Aircraft(aid, aname, cruisingrange)
    Certified(eid, aid)
    Employees(eid, ename, salary)

Q.4. Find the eids of pilots certified for Boeing aircraft.

RA
    Π_{eid}(σ_{aname=‘Boeing’}(Aircraft ⋈ Certified))
TRC
• {C.eid | C ∈ Certified ∧ ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = ‘Boeing’)}
• {T | ∃C ∈ Certified ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = ‘Boeing’ ∧ T.eid = C.eid)}

Database Management Systems Partha Pratim Das 17.17

TRC Example (4)

Q.5. Find the names and salaries of certified pilots working on Boeing aircrafts.

RA
    Π_{ename,salary}(σ_{aname=‘Boeing’}(Aircraft ⋈ Certified ⋈ Employees))
TRC
    {P | ∃E ∈ Employees ∃C ∈ Certified ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = ‘Boeing’ ∧ E.eid = C.eid ∧ P.ename = E.ename ∧ P.salary = E.salary)}

Database Management Systems Partha Pratim Das 17.18

TRC Example (5)

Q.6 Identify the flights that can be piloted by every pilot whose salary is more than $100,000.
(Hint: The pilot must be certified for at least one plane with a sufficiently large cruising range.)

• {F.flno | F ∈ Flights ∧ ∃A ∈ Aircraft ∃C ∈ Certified ∃E ∈ Employees(A.cruisingrange > F.distance ∧ A.aid = C.aid ∧ E.salary > 100,000 ∧ E.eid = C.eid)}

Database Management Systems Partha Pratim Das 17.19

Safety of Expressions

• It is possible to write tuple calculus expressions that generate infinite relations
• For example, {t | ¬(t ∈ r)} results in an infinite relation if the domain of any attribute of relation r is infinite
• To guard against the problem, we restrict the set of allowable expressions to safe expressions
• An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P.
◦ NOTE: this is more than just a syntax condition
◦ E.g. {t | t[A] = 5 ∨ true} is not safe: it defines an infinite set with attribute values that do not appear in any relation or tuples or constants in P

Database Management Systems Partha Pratim Das 17.20

Domain Relational Calculus

Database Management Systems Partha Pratim Das 17.21

Domain Relational Calculus

• A non-procedural query language equivalent in power to the tuple relational calculus
• Each query is an expression of the form:
    {<x1, x2, ..., xn> | P(x1, x2, ..., xn)}
◦ x1, x2, ..., xn represent domain variables
◦ P represents a formula similar to that of the predicate calculus

Equivalence of RA, TRC and DRC PPD


Source: [Link] [Link]

Database Management Systems Partha Pratim Das 17.31
Equivalence of RA, TRC and DRC PPD


Source: [Link] [Link]

Database Management Systems Partha Pratim Das 17.32

Module Summary

• Introduced tuple relational and domain relational calculus
• Illustrated equivalence of algebra and calculus

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 17.33

Database Management Systems
Module 18: Entity-Relationship Model/1

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 18.1

Module Recap PPD

• Predicate Calculus
• Tuple Relational and Domain Relational Calculus
• Equivalence of Relational Algebra and Relational Calculus

Database Management Systems Partha Pratim Das 18.2

Module Objectives PPD

• To understand the Design Process for Database Systems
• To study the E-R Model for real world representation

Database Management Systems Partha Pratim Das 18.6

Role of Abstraction

• Disorganized Complexity results from
◦ Storage (STM) limitations of human brain – an individual can simultaneously comprehend on the order of seven, plus or minus two chunks of information
◦ Speed limitations of human brain – it takes the mind about five seconds to accept a new chunk of information
• Abstraction provides the major tool to handle Disorganized Complexity by chunking information
• Ignore inessential details, deal only with the generalized, idealized model of the world

Consider: A binary number 110010101001

Hard to remember, right?

Try the octal form: (110)(010)(101)(001) ⇒ 6251

Or the hex form: (1100)(1010)(1001) ⇒ CA9
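The chunking in this example is mechanical enough to script. The sketch below groups the bit string into 3-bit or 4-bit chunks and writes one digit per chunk; the function names are illustrative only.

```python
def chunk(bits, size):
    """Split a bit string into groups of `size` bits, left-padding with zeros if needed."""
    groups = -(-len(bits) // size)  # ceiling division
    padded = bits.zfill(groups * size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def regroup(bits, size):
    """Replace each chunk by a single digit: size 3 gives octal, size 4 gives hex."""
    return "".join(format(int(group, 2), "X") for group in chunk(bits, size))


number = "110010101001"
octal_chunks = chunk(number, 3)   # ['110', '010', '101', '001']
octal_form = regroup(number, 3)   # '6251'
hex_form = regroup(number, 4)     # 'CA9'
```

Twelve symbols shrink to four or three, which is the point of the slide: each chunk hides detail behind one easier-to-hold symbol without losing information.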

Database Management Systems Partha Pratim Das 18.7
Model Building

• Physics
◦ Time-Distance Equation
◦ Quantum Mechanics
• Chemistry
◦ Valency-Bond Structures
• Geography
◦ Maps
◦ Projections
• Electrical Circuits
◦ Kirchhoff’s Loop Equations
◦ Time Series Signals and FFT
◦ Transistor Models
◦ Schematic Diagram
◦ Interconnect Routing
• Building & Bridges
◦ Drawings – Plan, Elevation, Side view
◦ Finite Element Models

• Models are common in all engineering disciplines
• Model building follows principles of decomposition, abstraction, and hierarchy
• Each model describes a specific aspect of the system
• Build new models upon old proven models

Database Management Systems Partha Pratim Das 18.8

Design Approach

• Requirement Analysis: Analyse the data needs of the prospective database users
◦ Planning
◦ System Definition
• Database Designing: Use a modeling framework to create abstraction of the real world
◦ Logical Model
◦ Physical Model
• Implementation
◦ Data Conversion and Loading
◦ Testing

Database Management Systems Partha Pratim Das 18.9

Design Approach (2): Database Designing

• Logical Model: Deciding on a good database schema
◦ Business Decision: What attributes should we record in the database?
◦ Computer Science Decision: What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
• Physical Model: Deciding on the physical layout of the database

Database Management Systems Partha Pratim Das 18.10

Design Approach (3): Database Designing: Logical Model

• Entity Relationship Model
◦ Models an enterprise as a collection of entities and relationships
. Entity: A distinguishable “thing” or “object” in the enterprise
− Described by a set of attributes
. Relationship: An association among multiple entities
◦ Represented by an Entity-Relationship or ER Diagram
• Database Normalization (Chapter 8)
◦ Formalize what designs are bad, and test for them

Database Management Systems Partha Pratim Das 18.11

Entity Relationship (ER) Model

Database Management Systems Partha Pratim Das 18.12

ER Model: Database Modeling

• The ER data model was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database
• The ER model is useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema
• The ER data model employs three basic concepts:
◦ Attributes
◦ Entity sets
◦ Relationship sets
• The ER model also has an associated diagrammatic representation, the ER diagram, which can express the overall logical structure of a database graphically

Database Management Systems Partha Pratim Das 18.13

Attributes

• An Attribute is a property associated with an entity / entity set. Based on the values of certain attributes, an entity can be identified uniquely
• Attribute types:
◦ Simple and Composite attributes
◦ Single-valued and Multivalued attributes
. Example: Multivalued attribute: phone numbers
◦ Derived attributes
. Can be computed from other attributes
. Example: age, given date of birth
• Domain: Set of permitted values for each attribute

Database Management Systems Partha Pratim Das 18.14

Attributes (2): Composite


Database Management Systems Partha Pratim Das 18.15

Entity Sets

• An entity is an object that exists and is distinguishable from other objects.
◦ Example: specific person, company, event, plant
• An entity set is a set of entities of the same type that share the same properties.
◦ Example: set of all persons, companies, trees, holidays
• An entity is represented by a set of attributes; i.e., descriptive properties possessed by all members of an entity set.
◦ Example:
    instructor = (ID, name, street, city, salary)
    course = (course_id, title, credits)
• A subset of the attributes forms a primary key of the entity set; that is, it uniquely identifies each member of the set.
◦ Primary key of an entity set is represented by underlining it

Database Management Systems Partha Pratim Das 18.16

Entity Sets – instructor and student


Database Management Systems Partha Pratim Das 18.17

Relationship Sets

• A relationship is an association among several entities
  Example:
    44553 (Peltier)      advisor             22222 (Einstein)
    student entity       relationship set    instructor entity
• A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets
    {(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}
  where (e1, e2, ..., en) is a relationship.
◦ Example: (44553, 22222) ∈ advisor

Database Management Systems Partha Pratim Das 18.18
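The set-theoretic definition of a relationship set can be checked directly with a small sketch. The IDs 44553 (Peltier) and 22222 (Einstein) come from the slide; the second student and instructor are made-up values added only so that the Cartesian product has more than one element.

```python
from itertools import product

# Entity sets keyed by primary key. Only Peltier and Einstein are from the
# slide; the other two entries are hypothetical.
student = {"44553": "Peltier", "12345": "Student X"}
instructor = {"22222": "Einstein", "99999": "Instructor Y"}

# A binary relationship set is a subset of student x instructor.
advisor = {("44553", "22222"), ("12345", "99999")}

all_pairs = set(product(student, instructor))
is_valid = advisor <= all_pairs          # every member is drawn from the entity sets
print(("44553", "22222") in advisor, is_valid)
```

Representing a relationship as a tuple of primary-key values is the same idea that later turns a relationship set into a relation schema.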

Relationship Set (2) advisor

  ◦ instructor, with attributes: ID, name, dept_name, salary
  ◦ department, with attributes: dept_name, building, budget
• We model the fact that each instructor has an associated department using a relationship set inst_dept
• The attribute dept_name appears in both entity sets. Since it is the primary key for the entity set department, it replicates information present in the relationship and is therefore redundant in the entity set instructor and needs to be removed
• BUT: When converting back to tables, in some cases the attribute gets reintroduced, as we will see later

Mapping Cardinality Constraints

• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types:
  ◦ One to one
  ◦ One to many
  ◦ Many to one
  ◦ Many to many
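As an illustration of the four types, the sketch below inspects one instance of a binary relationship set between entity sets A and B, given as (a, b) pairs, and reports the tightest mapping cardinality that the instance satisfies. The function name and the sample data are assumptions made for this example. A cardinality constraint is a property of the design, so an instance can only show which constraints are not violated.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Tightest mapping cardinality (from A to B) consistent with the pairs."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)   # number of B entities per A entity
    b_counts = Counter(b for _, b in pairs)   # number of A entities per B entity
    a_to_many = any(c > 1 for c in a_counts.values())
    b_to_many = any(c > 1 for c in b_counts.values())
    if a_to_many and b_to_many:
        return "many to many"
    if a_to_many:
        return "one to many"
    if b_to_many:
        return "many to one"
    return "one to one"

print(mapping_cardinality([("i1", "s1"), ("i1", "s2")]))
```

Entities that appear in no pair are simply absent from the input, which matches the remark that some elements of A and B may not be mapped at all.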

Mapping Cardinalities

Note: Some elements in A and B may not be mapped to any elements in the other set
Database Management Systems Partha Pratim Das 18.24
Mapping Cardinalities

Note: Some elements in A and B may not be mapped to any elements in the other set

Module Summary

• Introduced the Design Process for Database Systems
• Elucidated the E-R Model for real world representation with entities, entity sets, attributes, and relationships

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems
Module 19: Entity-Relationship Model/2

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Objectives & Outline
ER Diagram
  Entity Sets
  Relationship Sets
  Cardinality Constraints
  Participation
  Bounds
ER Model to Relational Schema
  Entity Sets
  Relationship
  Composite Attributes
  Multivalued Attributes
  Redundancy
Module Summary

Module Recap PPD

• Design Process for Database Systems

Relationship Sets

• Diamonds represent relationship sets.

Relationship Sets with Attributes


Roles

• Entity sets of a relationship need not be distinct
  ◦ Each occurrence of an entity set plays a “role” in the relationship
• The labels “course_id” and “prereq_id” are called roles.

Cardinality Constraints

Instructor can advise 0 or more students.
A student must have 1 advisor; cannot have multiple advisors

Notation to Express Entity with Complex Attributes

Reduction to Relation Schema

• Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the database
• A database which conforms to an ER diagram can be represented by a collection of schemas
• For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding entity set or relationship set
• Each schema has a number of columns (generally corresponding to attributes), which have unique names

Representing Entity Sets

• A strong entity set reduces to a schema with the same attributes
    student(ID, name, tot_cred)
• A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
    section(course_id, sec_id, sem, year)
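The reduction can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the column types, the sample rows, and the choice of SQLite are assumptions, and course is taken to be the identifying strong entity set of section with attributes (course_id, title, credits). The weak entity set section carries course_id as part of its primary key and as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT, tot_cred INTEGER);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT, credits INTEGER);
-- Weak entity set: its key includes the key of the identifying entity set.
CREATE TABLE section (
    course_id TEXT REFERENCES course(course_id),
    sec_id    TEXT,
    sem       TEXT,
    year      INTEGER,
    PRIMARY KEY (course_id, sec_id, sem, year)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO course VALUES ('C-101', 'Sample Course', 4)")
conn.execute("INSERT INTO section VALUES ('C-101', '1', 'Fall', 2020)")

# A section cannot exist without its identifying course.
try:
    conn.execute("INSERT INTO section VALUES ('C-999', '1', 'Fall', 2020)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```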

Representing Relationship Sets

• A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.
• Example: schema for relationship set advisor
    advisor = (s_id, i_id)

Representation of Entity Sets with Composite Attributes

• Discussed translation of ER Models to Relational Schema

Non-binary Relationship Sets

• Most relationship sets are binary
• There are occasions when it is more convenient to represent relationships as non-binary
• ER Diagram with a Ternary Relationship

Cardinality Constraints on Ternary Relationship

• We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint
• For example, an arrow from proj_guide to instructor indicates each student has at most one guide for a project
• If there is more than one arrow, there are two ways of defining the meaning.
  ◦ For example, a ternary relationship R between A, B and C with arrows to B and C could mean
    a) Each A entity is associated with a unique entity from B and C, or
    b) Each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B
  ◦ Each alternative has been used in different formalisms
  ◦ To avoid confusion we outlaw more than one arrow
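The single-arrow constraint on proj_guide amounts to saying that a (student, project) pair determines at most one instructor. A minimal sketch of that check follows; the tuple order (student, project, instructor) and the sample identifiers are assumptions made for illustration.

```python
def at_most_one_guide(proj_guide):
    """True if every (student, project) pair is linked to at most one instructor."""
    guide_of = {}
    for student, project, instructor in proj_guide:
        # setdefault records the first instructor seen for the pair and
        # returns it on later visits, so a different instructor is a violation.
        if guide_of.setdefault((student, project), instructor) != instructor:
            return False
    return True

ok = {("s1", "p1", "i1"), ("s1", "p2", "i2"), ("s2", "p1", "i1")}
bad = ok | {("s1", "p1", "i2")}          # s1 would have two guides on p1
print(at_most_one_guide(ok), at_most_one_guide(bad))
```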

Specialization: ISA


Design Constraints on a Specialization / Generalization

• Completeness constraint: Specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization
  ◦ total: an entity must belong to one of the lower-level entity sets
  ◦ partial: an entity need not belong to one of the lower-level entity sets
• Partial generalization is the default. We can specify total generalization in an ER diagram by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total generalization), or to the set of hollow arrow-heads to which it applies (for an overlapping generalization).
• The student generalization is total. All student entities must be either graduate or undergraduate. Because the higher-level entity set arrived at through generalization is generally composed of only those entities in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total.

Aggregation

• Consider the ternary relationship proj_guide, which we saw earlier
• Suppose we want to record evaluations of a student by a guide on a project

Aggregation (2)

• Relationship sets eval_for and proj_guide represent overlapping information
  ◦ Every eval_for relationship corresponds to a proj_guide relationship
  ◦ However, some proj_guide relationships may not correspond to any eval_for relationships
    . So we cannot discard the proj_guide relationship
• Eliminate this redundancy via aggregation
  ◦ Treat relationship as an abstract entity
  ◦ Allows relationships between relationships
  ◦ Abstraction of relationship into new entity

Aggregation (3)

• Using aggregation, and without introducing redundancy, the following diagram represents:
  ◦ A student is guided by a particular instructor on a particular project
  ◦ A student, instructor, project combination may have an associated evaluation

Representing Aggregation via Schema

• To represent aggregation, create a schema containing
  ◦ The primary key of the aggregated relationship
  ◦ The primary key of the associated entity set
  ◦ Any descriptive attributes
• In our example:
  ◦ The schema eval_for is:
    eval_for(s_ID, project_id, i_ID, evaluation_id)
  ◦ The schema proj_guide is redundant

Design Issues PPD


Entities vs. Attributes

• Use of entity sets vs. attributes
• Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)

Entities vs Relationship Sets

• Use of entity sets vs. relationship sets
  Possible guideline is to designate a relationship set to describe an action that occurs between entities
• Placement of relationship attributes
  For example, attribute date as attribute of advisor or as attribute of student

Binary vs Non-Binary Relationships

• Although it is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship
• Some relationships that appear to be non-binary may be better represented using binary relationships
  ◦ For example, a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother
    . Using two binary relationships allows partial information (e.g., only mother being known)
  ◦ But there are some relationships that are naturally non-binary
    . Example: proj_guide

Binary vs Non-Binary Relationships (2): Conversion

• In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.
  ◦ Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
    1. RA, relating E and A
    2. RB, relating E and B
    3. RC, relating E and C
  ◦ Create an identifying attribute for E and add any attributes of R to E
  ◦ For each relationship (ai, bi, ci) in R, create
    a) a new entity ei in the entity set E
    b) add (ei, ai) to RA
    c) add (ei, bi) to RB
    d) add (ei, ci) to RC
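The conversion steps can be sketched in a few lines. The function names, the generated identifiers e0, e1, ... and the sample data are assumptions made for this example. The reconstruct function joins the three binary relationship sets back together to show that no information is lost.

```python
def to_binary(R):
    """Replace a ternary relationship set R by E and the binary sets RA, RB, RC."""
    E, RA, RB, RC = [], set(), set(), set()
    for i, (a, b, c) in enumerate(sorted(R)):
        e = f"e{i}"            # artificial identifying attribute for the new entity
        E.append(e)
        RA.add((e, a))
        RB.add((e, b))
        RC.add((e, c))
    return E, RA, RB, RC

def reconstruct(E, RA, RB, RC):
    """Rebuild the ternary relationship set from the binary ones."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return {(a_of[e], b_of[e], c_of[e]) for e in E}

# Hypothetical proj_guide-style data: (student, project, instructor).
R = {("s1", "p1", "i1"), ("s2", "p1", "i1"), ("s1", "p2", "i2")}
E, RA, RB, RC = to_binary(R)
print(len(E), reconstruct(E, RA, RB, RC) == R)
```

Each new entity in E participates in exactly one pair of RA, RB and RC, which is why the dictionaries built in reconstruct are well defined.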

Binary vs Non-Binary Relationships (3): Conversion

Database Management Systems Partha Pratim Das 20.27

Symbols Used in ER Notation (4): Alternates

Chen        IDEF1X (Crow's feet notation)

Module Summary

• Discussed the extended features of ER Model
• Deliberated on various design issues

Database Management Systems
Module 41: Indexing and Hashing/1: Indexing/1

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Week Recap
Objectives & Outline
Indexing
  Metrics
Ordered Indices
  Dense Index Files
  Sparse Index Files
  Primary and Secondary Indices
  Multilevel Index
  Index Update
Module Summary

Week Recap PPD

• Need for algorithm analysis, Asymptotic complexity, and Worst-case, average-case and best-case analysis
• Reviewed Linear Data Structures: array, list, stack, queue; and linear and binary search
• Reviewed Non-linear Data Structures: graph, tree, hash table; Binary Search Tree; and compared Linear and Non-Linear Data Structures
• Understood the range of Physical Storage Media
• Studied about Magnetic Disks and Magnetic Tape
• Glimpsed through Other Storage and the Future of Storage
• Familiarized with the organization for database files
• Understood how records and relations are organized in files
• Learnt how databases keep their own information in Data-Dictionary Storage – the metadata database of a database
• Understood the mechanisms for fast access of a database store

Module Objectives PPD

• To understand the reasons for which we need to index database tables
• To learn about ordered indexes and the Indexed Sequential Access Mechanism

Module Outline PPD

• Basic Concepts of Indexing
• Ordered Indices

Concepts of Indexing PPD


Search Records PPD

• Consider a table: Faculty(Name, Phone)
• How to search on Name?
  ◦ Get the phone number for ‘Pabitra Mitra’
  ◦ Use “Name” Index – sorted on ‘Name’, search ‘Pabitra Mitra’ and navigate on pointer (rec #)
• How to search on Phone?
  ◦ Get the name of the faculty having phone number = 84772
  ◦ Use “Phone” Index – sorted on ‘Phone’, search ‘84772’ and navigate on pointer (rec #)
• We can keep the records sorted on ‘Name’ or on ‘Phone’ (called the primary index), but not on both

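The two lookups can be sketched with one sorted index per attribute, each entry holding a search-key value and a record number. In this illustration the rows of Faculty are placeholders: only the name ‘Pabitra Mitra’ and the phone number 84772 appear on the slide, and the remaining values, as well as the helper names, are made up.

```python
from bisect import bisect_left

# Records in storage order; the list position plays the role of rec #.
faculty = [
    ("Faculty B", 84772),        # placeholder name
    ("Pabitra Mitra", 80001),    # placeholder phone number
    ("Faculty A", 80002),        # placeholder row
]

def build_index(records, field):
    """Index entries (search-key value, rec #), sorted on the search key."""
    return sorted((rec[field], n) for n, rec in enumerate(records))

def lookup(index, key):
    """Binary search the index; return rec # or None if the key is absent."""
    i = bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

name_index = build_index(faculty, 0)
phone_index = build_index(faculty, 1)

print(faculty[lookup(name_index, "Pabitra Mitra")][1])   # phone for a given name
print(faculty[lookup(phone_index, 84772)][0])            # name for a given phone
```

The records themselves stay in one physical order, while any number of such indices can be kept alongside them.
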
Basic Concepts

• Indexing mechanisms are used to speed up access to desired data.
  ◦ For example:
    . Name in a faculty table
    . author catalog in library
• Search Key: attribute or set of attributes used to look up records in a file
• An index file consists of records (called index entries) of the form
    search-key | pointer
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
  ◦ Ordered indices: search keys are stored in sorted order
  ◦ Hash indices: search keys are distributed uniformly across buckets using a hash function

Index Evaluation Metrics

• Access types supported efficiently. For example,
  ◦ records with a specified value in the attribute, or
  ◦ records with an attribute value falling in a specified range of values
• Access time
• Insertion time
• Deletion time
• Space overhead

Ordered Indices


Ordered Indices

• In an ordered index, index entries are stored sorted on the search key value. For example, author catalog in library
• Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file
  ◦ Also called clustering index
  ◦ The search key of a primary index is usually but not necessarily the primary key
• Secondary index: an index whose search key specifies an order different from the sequential order of the file
  ◦ Also called non-clustering index
• Index-sequential file: ordered sequential file with a primary index

Dense Index Files

• Dense index: an index record appears for every search-key value in the file.
• For example, index on ID attribute of instructor relation

Dense Index Files (2)

• Dense index on dept_name, with instructor file sorted on dept_name

Sparse Index Files

• Sparse Index: contains index records for only some search-key values.
  ◦ Applicable when records are sequentially ordered on search-key
• To locate a record with search-key value K we:
  ◦ Find the index record with the largest search-key value ≤ K
  ◦ Search file sequentially starting at the record to which the index record points
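The lookup procedure can be sketched as follows. The block layout, the key values and the function name are assumptions made for illustration, with one index entry per block holding the least key of that block. The comparison is taken as less-than-or-equal so that a key which is itself an index entry is also found.

```python
from bisect import bisect_right

# File sorted on the search key and stored in blocks (illustrative keys).
blocks = [[10101, 12121, 15151], [22222, 32343, 33456], [45565, 58583, 76543]]

# Sparse index: (least search-key value in block, block number).
sparse_index = [(blk[0], n) for n, blk in enumerate(blocks)]
index_keys = [k for k, _ in sparse_index]

def find(key):
    """Return (block number, key) if the key is in the file, else None."""
    pos = bisect_right(index_keys, key) - 1      # largest index key <= key
    if pos < 0:
        return None                              # key precedes the whole file
    # Sequential scan starting at the block the index entry points to.
    for blk_no in range(sparse_index[pos][1], len(blocks)):
        for k in blocks[blk_no]:
            if k == key:
                return (blk_no, k)
            if k > key:
                return None                      # passed where it would be
    return None

print(find(32343), find(22222), find(20000))
```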

Sparse Index Files (2)

• Compared to dense indices:
  ◦ Less space and less maintenance overhead for insertions and deletions
  ◦ Generally slower than dense index for locating records
• Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block

Secondary Indices Example

Secondary index on salary field of instructor

• Index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
• Secondary indices have to be dense

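A secondary index with buckets can be sketched as a sorted list of (search-key value, bucket) entries, where each bucket lists the record numbers holding that value. The instructor rows below are illustrative values, not data from the slide, and the file is assumed to be stored sequentially by ID.

```python
from collections import defaultdict

# (ID, salary) records stored in ID order; the list position is the record pointer.
instructor = [("10101", 65000), ("12121", 90000), ("15151", 40000), ("22222", 90000)]

buckets = defaultdict(list)               # salary -> bucket of record pointers
for rec_no, (_, salary) in enumerate(instructor):
    buckets[salary].append(rec_no)

# Dense: one index entry for every distinct salary, sorted on salary.
salary_index = sorted(buckets.items())

def ids_with_salary_between(lo, hi):
    """IDs of instructors whose salary lies in [lo, hi], in salary order."""
    return [instructor[r][0]
            for salary, ptrs in salary_index if lo <= salary <= hi
            for r in ptrs]

print(ids_with_salary_between(90000, 90000))
```

Because the file is not ordered on salary, an entry is needed for every salary value; a sparse index would give no way to find the values it leaves out.
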
Primary and Secondary Indices

• Indices offer substantial benefits when searching for records
• BUT: Updating indices imposes overhead on database modification – when a file is modified, every index on the file must be updated
• Sequential scan using primary index is efficient, but a sequential scan using a secondary index is expensive
  ◦ Each record access may fetch a new block from disk
  ◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for memory access

Multilevel Index

• If primary index does not fit in memory, access becomes expensive
• Solution: treat primary index kept on disk as a sequential file and construct a sparse index on it
  ◦ outer index – a sparse index of primary index
  ◦ inner index – the primary index file
• If even outer index is too large to fit in main memory, yet another level of index can be created, and so on
• Indices at all levels must be updated on insertion or deletion from the file
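The idea of stacking sparse indices can be sketched with plain lists. Here the inner index is a sorted list of keys, each outer level keeps the first key of every block of the level below, and fanout stands for the number of index entries per block. The function names and the sample keys are assumptions made for this illustration.

```python
from bisect import bisect_right

def build_levels(keys, fanout):
    """levels[0] is the inner index; each later level is a sparse index on the previous."""
    levels = [list(keys)]
    while len(levels[-1]) > fanout:
        levels.append(levels[-1][::fanout])   # first key of every block below
    return levels

def search(levels, fanout, key):
    """Descend from the outermost level; return the position in levels[0] or None."""
    top = len(levels) - 1
    pos = 0
    for depth in range(top, -1, -1):
        level = levels[depth]
        lo = 0 if depth == top else pos * fanout       # block chosen one level up
        hi = len(level) if depth == top else min(lo + fanout, len(level))
        i = bisect_right(level[lo:hi], key) - 1         # largest entry <= key
        if i < 0:
            return None
        pos = lo + i
    return pos if levels[0][pos] == key else None

keys = list(range(0, 100, 2))        # 50 sorted search-key values
levels = build_levels(keys, fanout=4)
print([len(lv) for lv in levels], search(levels, 4, 42))
```

Only one block per level is examined during a search, which is the point of adding levels when the index itself is too large for memory.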

Multilevel Index (2)


Index Update: Deletion

• If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.
• Single-level index entry deletion:
  ◦ Dense indices – deletion of search-key is similar to file record deletion
  ◦ Sparse indices –
    . If an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order)
    . If the next search-key value already has an index entry, the entry is deleted instead of being replaced

Index Update (2): Insertion

• Single-level index insertion:
  ◦ Perform a lookup using the search-key value appearing in the record to be inserted
  ◦ Dense indices – if the search-key value does not appear in the index, insert it
  ◦ Sparse indices – if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created
    . If a new block is created, the first search-key value appearing in the new block is inserted into the index
• Multilevel insertion and deletion: algorithms are simple extensions of the single-level algorithms

Secondary Indices

• Frequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition
  ◦ Example 1: In the instructor relation stored sequentially by ID, we may want to find all instructors in a particular department
  ◦ Example 2: as above, but where we want to find all instructors with a specified salary or with salary in a specified range of values
• We can have a secondary index with an index record for each search-key value

Module Summary

• Appreciated the reasons for indexing database tables
• Understood the ordered indexes

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems
Module 42: Indexing and Hashing/2: Indexing/2

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Objectives & Outline
Balanced BST
2-3-4 Tree
  Search
  Insert
  Split
  Example
  Delete
Observations
Module Summary

Module Recap PPD

• Appreciated the reasons for indexing database tables
• Understood the ordered indexes

Search Data Structures PPD

Module 42
• How to search a key in a list of n data items?
Partha Pratim
◦ Linear Search: O(n): Find 28 ⇒ 16 comparisons
Das . Unordered items in an array – search sequentially
Objectives &
. Unordered / Ordered items in a list – search sequentially
Outline

Balanced BST

2-3-4 Tree
◦ Binary Search: O(lg n): Find 28 ⇒ 4 comparisons – 25, 36, 30, 28
Search . Ordered items in an array – search by divide-and-conquer
Insert
Split
Example
Delete
. Binary Search Tree – recursively on left / right
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.6
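The difference between the two search strategies can be seen by counting comparisons. The sketch below uses a hypothetical sorted list 1..32 rather than the data in the slide's figure, so its counts will not match the 16 and 4 quoted above.

```python
def linear_search(items, key):
    """Scan left to right; return (index, comparisons), index -1 if absent."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == key:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_items, key):
    """Halve the search range at every step; return (index, probes)."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == key:
            return mid, probes
        if sorted_items[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


data = list(range(1, 33))  # hypothetical sorted data
print(linear_search(data, 28))
print(binary_search(data, 28))
```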

Search Data Structures (2) PPD

Module 42

Partha Pratim • Worst case time (n data items in the data structure):
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

• Between an array and a list, there is a trade-off between search and insert/delete
complexity
• For a BST of n nodes, lg n ≤ h < n, where h is the height of the tree
• A BST is balanced if h ∼ O(lg n): this is what we desire

Database Management Systems Partha Pratim Das 42.7

Search Data Structures (3): BST PPD

Module 42
• In the worst case, searching a key in a BST is O(h), where h is the height of the tree
Partha Pratim
Das • Bad Tree: h ∼ O(n)
Objectives & ◦ The BST is a skewed binary search tree (all the nodes except the leaf would have
Outline
only one child)
Balanced BST
◦ This can happen if keys are inserted in sorted order
2-3-4 Tree
Search ◦ Height (h) of the BST having n elements becomes n − 1
Insert
Split
◦ Time complexity of search in BST becomes O(n)
Example
Delete
• Good Tree: h ∼ O(lg n)
Observations
◦ The BST is a balanced binary search tree
Module Summary
◦ This is possible if
. Keys are inserted in purely randomized order, or
. The tree is explicitly balanced after every insertion
◦ Height (h) of the binary search tree becomes O(lg n)
◦ Time complexity of search in BST becomes O(lg n)

Database Management Systems Partha Pratim Das 42.8
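The effect of insertion order on height can be checked with a plain, unbalanced BST. The class and function names here are illustrative assumptions; height is counted in edges, so a skewed tree on n keys has height n − 1.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Iterative insert into an unbalanced BST; returns the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right


def height(root):
    """Height in edges, computed level by level; an empty tree has height -1."""
    h = -1
    level = [root] if root else []
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


keys = list(range(1000))
print(height(build(keys)))           # sorted order gives a skewed tree
random.Random(0).shuffle(keys)
print(height(build(keys)))           # random order gives a much shorter tree
```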

Balanced Binary Search Trees PPD

Module 42
• A BST is balanced if h ∼ O(lg n)
Partha Pratim
Das • Balancing Guarantees may be of various types:
Objectives & ◦ Worst-case
Outline

Balanced BST
. AVL Tree: Self-balancing BST
− Named after inventors Adelson-Velsky and Landis
− Heights of the two child subtrees of any node differ by at most one: |hL − hR| ≤ 1
− If they differ by more than one, rebalancing is done by rotations
Split
Example ◦ Randomized
Delete
Observations . Randomized BST
− A BST on n keys is random if either it is empty (n = 0), or the probability that a given
key is at the root is 1/n, and the left and right subtrees are random
. Skip List
− A skip list is built (probabilistically) in layers of ordered linked lists
◦ Amortized
. Splay
− A BST where recently accessed elements are quick to access again
Database Management Systems Partha Pratim Das 42.9
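The AVL condition |hL − hR| ≤ 1 can be verified with one recursive pass. In this sketch a tree is written as nested tuples `(left, key, right)` with `None` for an empty subtree; that representation and the name `check_avl` are assumptions for the example, and the function checks only the height condition, not the BST ordering or any rotations.

```python
def check_avl(node):
    """Return (is_balanced, height) for a tree given as (left, key, right) tuples."""
    if node is None:
        return True, -1
    left, _key, right = node
    ok_left, h_left = check_avl(left)
    ok_right, h_right = check_avl(right)
    balanced = ok_left and ok_right and abs(h_left - h_right) <= 1
    return balanced, 1 + max(h_left, h_right)


balanced_tree = ((None, 1, None), 2, (None, 3, None))
skewed_tree = (None, 1, (None, 2, (None, 3, None)))
print(check_avl(balanced_tree))
print(check_avl(skewed_tree))
```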
Balanced Binary Search Trees (2) PPD

Module 42
• These data structures have optimal complexity for the required operations:
Partha Pratim
Das ◦ Search: O(lg n)
Objectives &
◦ Insert: Search + O(1): O(lg n)
Outline ◦ Delete: Search + O(1): O(lg n)
Balanced BST
• And they:
◦ Are good for in-memory operations
◦ Work well for small volumes of data
◦ Have complex rotation and/or similar operations
◦ Do not scale for external data structures
Module Summary

Database Management Systems Partha Pratim Das 42.10

2-3-4 Tree PPD

Module 42


2-3-4 Tree

Database Management Systems Partha Pratim Das 42.11

2-3-4 Trees PPD

Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.16

2-3-4 Trees: Insert PPD

Module 42

Partha Pratim • Splitting with 2 Node parent

Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.17

2-3-4 Trees: Insert PPD

Module 42 • Splitting with 3 Node parent

Partha Pratim
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.18

2-3-4 Trees: Insert

Module 42

2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.26
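The insertion sequence above can be replayed with a small top-down 2-3-4 tree, which splits every 4-node it meets on the way to the leaf. This is a sketch under assumptions: the names `Node234`, `split_child` and `insert` are invented here, keys are assumed distinct, and the intermediate shapes may differ from the slide's figures if those split at different moments.

```python
import bisect


class Node234:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # up to 3 sorted keys
        self.children = children or []  # empty for a leaf, else len(keys) + 1

    def is_leaf(self):
        return not self.children


def split_child(parent, i):
    """Split the 4-node parent.children[i]; its middle key moves up into parent."""
    full = parent.children[i]
    left = Node234(full.keys[:1], full.children[:2])
    right = Node234(full.keys[2:], full.children[2:])
    parent.keys.insert(i, full.keys[1])
    parent.children[i:i + 1] = [left, right]


def insert(root, key):
    """Top-down insert; returns the (possibly new) root."""
    if len(root.keys) == 3:             # a full root grows the tree by one level
        root = Node234(children=[root])
        split_child(root, 0)
    node = root
    while not node.is_leaf():
        i = bisect.bisect_left(node.keys, key)
        if len(node.children[i].keys) == 3:
            split_child(node, i)
            if key > node.keys[i]:      # pick the correct half after the split
                i += 1
        node = node.children[i]
    bisect.insort(node.keys, key)
    return root


def inorder(node):
    """Keys in sorted order, to confirm the search-tree property."""
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


root = Node234()
for k in [10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100]:
    root = insert(root, k)
print(root.keys, [c.keys for c in root.children])
print(inorder(root))
```

Because a 4-node is split before descending into it, the parent always has room for the key that moves up, so no split ever has to propagate back toward the root.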

2-3-4 Trees: Delete PPD

Module 42

Partha Pratim • Delete

Das
◦ Locate the node n that contains the item theItem
Objectives &
Outline ◦ Find theItem’s inorder successor and swap it with theItem (deletion will always be
Balanced BST at a leaf)
2-3-4 Tree ◦ If that leaf is a 3-node or a 4-node, remove theItem
Search
Insert
◦ To ensure that theItem does not occur in a 2-node
Split
Example
. Transform each 2-node encountered into a 3-node or a 4-node
Delete . Reverse different cases illustrated for splitting
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.27

2-3-4 Tree PPD

Module 42

Partha Pratim • Advantages

Das
◦ All leaves are at the same depth (the bottom level): Height, h ∼ O(lg n)
Objectives &
Outline ◦ Complexity of search, insert and delete: O(h) ∼ O(lg n)
Balanced BST ◦ All data is kept in sorted order
2-3-4 Tree ◦ Generalizes easily to larger nodes
Search
Insert
◦ Extends to external data structures
Split
Example
• Disadvantages
Delete
Observations
◦ Uses variety of node types – need to destruct and construct multiple nodes for
Module Summary converting a 2 Node to 3 Node, a 3 Node to 4 Node, for splitting etc.

Database Management Systems Partha Pratim Das 42.28

2-3-4 Trees PPD

Module 42
• Consider only one node type with space for 3 items and 4 links
Partha Pratim
Das ◦ Internal node (non-root) has 2 to 4 children (links)
Objectives & ◦ Leaf node has 1 to 3 items
Outline
◦ Wastes some space, but has several advantages for external data structure
Balanced BST

2-3-4 Tree • Generalizes easily to larger nodes

Search
Insert
◦ All paths from root to leaf are of the same length
◦ Each node that is not a root or a leaf has between ⌈n/2⌉ and n children
◦ A leaf node has between ⌈(n−1)/2⌉ and n − 1 values
Module Summary ◦ Special cases:
. If the root is not a leaf, it has at least 2 children.
. If the root is a leaf, it can have between 0 and (n − 1) values.
• Extends to external data structures
◦ B-Tree
◦ 2-3-4 Tree is a B-Tree where n = 4
Database Management Systems Partha Pratim Das 42.29
Module Summary

Module 42

Partha Pratim • Recapitulated the notions of Balanced Binary Search Trees as options for optimal
Das
in-memory search data structures
Objectives &
Outline • Understood the issues relating to external data structures for persistent data
Balanced BST
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data
2-3-4 Tree
Search
structure for database and index tables
Insert
Split
Example
Delete
Observations

Module Summary

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 42.30

Module 43

Partha Pratim
Das

Objectives &
Outline Database Management Systems
B+-Tree Index
Files Module 43: Indexing and Hashing/3: Indexing/3
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Partha Pratim Das
Updates
Insertion
Department of Computer Science and Engineering
Deletion
Indian Institute of Technology, Kharagpur
File Organization
Non-Unique Keys
Relocation and
ppd@[Link]
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.1
Module Recap PPD

Module 43

Partha Pratim • Recapitulated the notions of Balanced Binary Search Trees as options for optimal
Das
in-memory search data structures
Objectives &
Outline • Understood the issues relating to external data structures for persistent data
B+-Tree Index
Files
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data
Simple B
+
Tree structure for database and index tables
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.2
Module Objectives PPD

Module 43

Partha Pratim • To understand the design of B+ Tree Index Files as a generalization of 2-3-4 Tree
Das
• To understand the fundamentals of B-Tree Index Files
Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.3
Module Outline PPD

Module 43

Partha Pratim • B+ Tree Index Files

Das
• B-Tree Index Files
Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.4
B+ Tree Index Files PPD

Module 43


B+ Tree Index Files

Database Management Systems Partha Pratim Das 43.5
B+ Tree

Module 43 The B+ Tree

Partha Pratim
Das
• Is a balanced multi-way search tree (a node may have more than two children)
Objectives &
◦ Follows a multi-level index format like 2-3-4 Tree
Outline
• Has the leaf nodes denoting actual data pointers
B+-Tree Index
Files
Simple B
+
Tree
• Ensures that all leaf nodes remain at the same height (like 2-3-4 Tree)
Index Files
Nodes
• Has the leaf nodes linked using a linked list
Observations
◦ Can support random access as well as sequential access
Query
Duplicates
Updates
• Example:
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison
Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.6
B+ Tree (2)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
• Internal node contains
◦ At least n/2 child pointers, except the root node
◦ At most n pointers
• Leaf node contains
◦ At least n/2 record pointers and n/2 key values
◦ At most n record pointers and n key values
◦ One block pointer P to point to the next leaf node
Note: These are approximate values; we will discuss more precise values later in this lecture.
B-Tree Index
Files
Source: B+ Tree
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.7
B+ Tree (3): Search

Module 43
• Suppose we have to search for 55 in the B+ tree below
◦ First, we fetch the intermediary node, which directs us to the leaf node that
can contain a record for 55
• In the intermediary node, we follow the branch between the keys 50 and 75
◦ This leads us to the third leaf node
◦ There the DBMS performs a sequential search to find 55
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings Source: B+ Tree

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.8
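The search described above can be sketched on a small hand-built tree. The key values, the classes `Leaf` and `Internal`, and the convention that a key equal to a separator lives in the subtree to its right are all assumptions chosen so that 55 lands in the third leaf; the slide's figure may hold different values.

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # link to the next leaf, for sequential access


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # len(keys) + 1 subtrees


def find_leaf(node, key):
    """Walk from the root down to the leaf that may contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Random access: descend, then scan sequentially inside the leaf."""
    return key in find_leaf(root, key).keys


def range_scan(root, lo, hi):
    """Sequential access: find the first leaf, then follow the leaf links."""
    out, leaf = [], find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


leaves = [Leaf([5, 15, 20]), Leaf([25, 30, 35]),
          Leaf([50, 55, 65, 70]), Leaf([75, 80, 85, 90])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([25, 50, 75], leaves)

print(search(root, 55), find_leaf(root, 55) is leaves[2])
print(range_scan(root, 30, 75))
```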
B+ Tree (3): Insert

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
• Suppose we want to insert a record 60, which goes to the 3rd leaf node after 55
• That leaf node is already full, so we cannot insert 60 there
• So we have to split the leaf node, so that 60 can be inserted into the tree without affecting
the fill factor, balance and order
• With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and its key in the parent node is 50
• We split the leaf node in the middle so that the balance of the tree is not altered
Files
Comparison Source: B+ Tree

Module Summary
Database Management Systems Partha Pratim Das 43.9
B+ Tree (4): Insert

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
• So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes
• If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone
• It should have 60 added to it, and then we can have a pointer to the new leaf node
• This is how we insert an entry when there is an overflow. In the normal case, we simply
find the leaf node where the entry fits and place it there
Strings
Source: B+ Tree
B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.10
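The leaf split in this example can be written as a short helper. This is a minimal sketch: `split_leaf` is a hypothetical name, and it shows only the leaf-level step (the separator key is copied up, the parent update is not shown).

```python
import bisect


def split_leaf(keys):
    """Split an overfull sorted leaf in the middle.

    Returns (left, right, separator); the separator is the first key of the
    right half and is copied into the parent node.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


leaf = [50, 55, 65, 70]          # the full 3rd leaf before the insert
bisect.insort(leaf, 60)          # temporarily overfull: 50, 55, 60, 65, 70
left, right, separator = split_leaf(leaf)
print(left, right, separator)
```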
B+ Tree (5): Delete

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
• To delete 60, we have to remove 60 from the intermediate node as well as from the 4th leaf node
• If we simply remove it from the intermediate node, the tree will no longer be a valid B+ tree
• So, along with deleting 60, we rearrange the nodes:
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison
Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.11
B+ Tree Index Files

Module 43

• B+ tree indices are an alternative to indexed-sequential files
• Disadvantage of ISAM files
◦ Performance degrades as file grows, since many overflow blocks get created
◦ Periodic reorganization of entire file is required
• Advantage of B+ tree index files:
◦ Automatically reorganizes itself with small, local changes, in the face of insertions
and deletions
◦ Reorganization of entire file is not required to maintain performance
• (Minor) disadvantage of B+ trees:
◦ Extra insertion and deletion overhead, space overhead
• Advantages of B+ trees outweigh disadvantages
◦ B+ trees are used extensively
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.12
B+ Tree Index Files (2): Example

Module 43

Database Management Systems Partha Pratim Das 43.13
B+ Tree Index Files (3): Structure

Module 43

Partha Pratim
A B+ tree is a rooted tree satisfying the following properties:
Das
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children
• A leaf node has between ⌈(n−1)/2⌉ and n − 1 values
+
Simple B
Index Files
Tree
• Special cases:
Nodes
Observations
◦ If the root is not a leaf, it has at least 2 children.
Query ◦ If the root is a leaf (that is, there are no other nodes in the tree), it can have
Duplicates
Updates between 0 and (n − 1) values.
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.14
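The occupancy bounds above are easy to tabulate for a given n. The function name `bplus_bounds` and the dictionary layout are assumptions for illustration; the formulas are the ones stated on the slide, written with integer arithmetic.

```python
def bplus_bounds(n):
    """Min/max occupancy of non-root B+ tree nodes with n pointers per node."""
    return {
        "internal_children": ((n + 1) // 2, n),   # ceil(n / 2) .. n
        "leaf_values": (n // 2, n - 1),           # ceil((n - 1) / 2) .. n - 1
    }


for n in (4, 6):
    print(n, bplus_bounds(n))
```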
B+ Tree Index Files (4): Node Structure

Module 43

Partha Pratim • Typical node

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.30
Updates on B+ Trees: Deletion Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Before and after deleting “Srinivasan”
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
Deleting “Srinivasan” causes merging of under-full leaves
B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.31
Updates on B+ Trees: Deletion Example (2)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Deletion of “Singh” and “Wu” from result of previous example
Observations
Query
Duplicates • Leaf containing Singh and Wu became underfull, and borrowed a value Kim from its
Updates
Insertion left sibling
Deletion
File Organization • Search-key value in the parent changes as a result
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.32
Updates on B+ Trees: Deletion Example (3)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization Before and after deletion of “Gold” from earlier example
Non-Unique Keys
Relocation and
Secondary Indices • Node with “Gold” and “Katz” became underfull, and was merged with its sibling
Strings

B-Tree Index
• Parent node becomes underfull, and is merged with its sibling
Files
Comparison
◦ Value separating two nodes (at the parent) is pulled down when merging
• Root node then has only one child, and is deleted
Database Management Systems Partha Pratim Das 43.33
B+ Tree File Organization

Module 43

Partha Pratim • Index file degradation problem is solved by using B+ Tree indices
Das
• Data file degradation problem is solved by using B+ Tree File Organization
Objectives &
Outline • The leaf nodes in a B+ tree file organization store records, instead of pointers
B+-Tree Index
Files • Leaf nodes are still required to be half full
+
Simple B Tree
Index Files ◦ Since records are larger than pointers, the maximum number of records that can be
Nodes
Observations
stored in a leaf node is less than the number of pointers in a non-leaf node
Query
Duplicates
• Insertion and deletion are handled in the same way as insertion and deletion of entries
Updates in a B+ tree index
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.34
B+ Tree File Organization: Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Example of B+ tree File Organization
Insertion
Deletion
File Organization • Good space utilization important since records use more space than pointers.
Non-Unique Keys
Relocation and
Secondary Indices
• To improve space utilization, involve more sibling nodes in redistribution during splits
Strings and merges
B-Tree Index
Files
◦ Involving 2 siblings in redistribution (to avoid split / merge where possible) results
in each node having at least 2n/3 entries
Module Summary
Database Management Systems Partha Pratim Das 43.35
Non-Unique Search Keys

Module 43

Partha Pratim • Alternatives to scheme described earlier

Das
◦ Buckets on separate block (bad idea)
Objectives &
Outline ◦ List of tuple pointers with each key
B+-Tree Index
Files
. Extra code to handle long lists
Simple B
+
Tree . Deletion of a tuple can be expensive if there are many duplicates on search key
Index Files
Nodes
(why?)
Observations . Low space overhead, no extra cost for queries
Query
Duplicates ◦ Make search key unique by adding a record-identifier
Updates
Insertion . Extra storage overhead for keys
Deletion
File Organization
. Simpler code for insertion/deletion
Non-Unique Keys
Relocation and
. Widely used
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.36
Record Relocation and Secondary Indices

Module 43

Partha Pratim • If a record moves, all secondary indices that store record pointers have to be updated
Das
• Node splits in B+ tree file organizations become very expensive
Objectives &
Outline • Solution: Use primary-index search key instead of record pointer in secondary index
B+-Tree Index
Files ◦ Extra traversal of primary index to locate record
+
Simple B
Index Files
Tree
– Higher cost for queries, but node splits are cheap
Nodes ◦ Add record-id if primary-index search key is non-unique
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.37
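The trade-off described above can be sketched with two dictionaries. Everything here is illustrative: the sample IDs, the `(block, slot)` locations, and the names `primary`, `by_dept`, `lookup_dept` and `relocate` are assumptions, and plain dictionaries stand in for the actual index structures.

```python
# Primary index: primary key -> physical location (hypothetical block, slot)
primary = {"10101": (0, 0), "45565": (0, 1), "76766": (1, 0)}

# Secondary index on dept_name stores primary keys, not record pointers
by_dept = {"Comp. Sci.": ["10101", "45565"], "Biology": ["76766"]}


def lookup_dept(dept):
    """Secondary lookup needs an extra traversal of the primary index."""
    return [primary[pk] for pk in by_dept.get(dept, [])]


def relocate(pk, new_location):
    """A record move (e.g. after a node split) touches only the primary index."""
    primary[pk] = new_location


relocate("45565", (2, 0))
print(lookup_dept("Comp. Sci."))
```

The secondary index `by_dept` is never modified by `relocate`, which is the point of storing the primary-index search key instead of a record pointer.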
Indexing Strings

Module 43

Partha Pratim • Variable length strings as keys

Das
◦ Variable fanout
Objectives &
Outline ◦ Use space utilization as criterion for splitting, not number of pointers
B+-Tree Index
Files
• Prefix compression
Simple B
Index Files
+
Tree
◦ Key values at internal nodes can be prefixes of full key
Nodes
. Keep enough characters to distinguish entries in the subtrees separated by the
Observations
Query key value
Duplicates
Updates − For example, “Silas” and “Silberschatz” can be separated by “Silb”
Insertion
Deletion ◦ Keys in leaf node can be compressed by sharing common prefixes
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.38
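Choosing a separator prefix can be sketched as follows. The function name `shortest_separator` is hypothetical, and the sketch assumes plain string comparison and `left < right`; it picks the shortest prefix of the right-hand key that still sorts strictly after the left-hand key.

```python
def shortest_separator(left, right):
    """Shortest prefix p of right with left < p <= right (requires left < right)."""
    for length in range(1, len(right) + 1):
        prefix = right[:length]
        if left < prefix:
            return prefix
    return right


print(shortest_separator("Silas", "Silberschatz"))
```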
B-Tree Index Files PPD

Module 43

B-Tree Index Files
Module Summary
Database Management Systems Partha Pratim Das 43.39
B-Tree Index Files

Module 43

Partha Pratim • Similar to B+ tree, but B-tree allows search-key values to appear only once; eliminates
Das
redundant storage of search keys
Objectives &
Outline • Search keys in non-leaf nodes appear nowhere else in the B-tree; an additional pointer
B+-Tree Index field for each search key in a non-leaf node must be included
Files
Simple B
+
Tree • Generalized B-tree leaf node
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings • Non-leaf node - pointers Bi are the bucket or file record pointers
B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.40
B-Tree Index File (2): Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
B-tree (above) and B+ tree (below) on same data
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.41
Comparison of B-Tree and B+ Tree Index Files

Module 43

Partha Pratim • Advantages of B-Tree indices:

Das
◦ May use fewer tree nodes than a corresponding B+ Tree
Objectives &
Outline ◦ Sometimes possible to find search-key value before reaching leaf node
B+-Tree Index
Files
• Disadvantages of B-Tree indices:
Simple B
Index Files
+
Tree
◦ Only small fraction of all search-key values are found early
Nodes ◦ Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have
Observations
Query
greater depth than corresponding B+ Tree
Duplicates
◦ Insertion and deletion more complicated than in B+ Trees
Updates
Insertion ◦ Implementation is harder than B+ Trees
Deletion
File Organization • Typically, advantages of B-Trees do not outweigh disadvantages
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.42
Module Summary

Module 43

Partha Pratim • Understood the design of B+ Tree Index Files in depth for database persistent store
Das
• Familiarized with B-Tree Index Files
Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.43
Module 44

Partha Pratim
Das

Objectives &
Outline Database Management Systems
Static Hashing
Hash Function
Module 44: Indexing and Hashing/4: Hashing
Example
Bucket Overflow

Dynamic Hashing
Example
Partha Pratim Das
Comparison
Schemes

Bitmap Indices Department of Computer Science and Engineering

Module Summary
Indian Institute of Technology, Kharagpur

ppd@[Link]

Database Management Systems Partha Pratim Das 44.1

Module Recap PPD

Module 44

Partha Pratim • Understood the design of B+ Tree Index Files in depth for database persistent store
Das
• Familiarized with B-Tree Index Files
Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.2

Module Objectives PPD

Module 44

Partha Pratim • To explore various hashing schemes – Static and Dynamic Hashing
Das
• To compare Ordered Indexing and Hashing
Objectives &
Outline • To understand the Bitmap Indices
Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.3

Module Outline PPD

Module 44

Partha Pratim • Static Hashing

Das
• Dynamic Hashing
Objectives &
Outline • Comparison of Ordered Indexing and Hashing
Static Hashing
Hash Function • Bitmap Indices
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.4

Static Hashing PPD

Module 44

Static Hashing

Database Management Systems Partha Pratim Das 44.5

Hash Function PPD

Database Management Systems Partha Pratim Das 44.13

Example of Hash Index

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

• Hash index on instructor, on attribute ID

• Computed by adding the digits modulo 8
Database Management Systems Partha Pratim Das 44.14
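The hash function used in this example, adding the digits of the ID modulo 8, can be written directly. The function name `h` follows the slides' notation; the sample IDs are illustrative and assumed to be digit strings.

```python
def h(instructor_id, buckets=8):
    """Sum the decimal digits of the ID and reduce modulo the bucket count."""
    return sum(int(d) for d in instructor_id) % buckets


for sample_id in ("10101", "45565", "76766"):
    print(sample_id, "-> bucket", h(sample_id))
```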
Deficiencies of Static Hashing

Module 44

• In static hashing, function h maps search-key values to a fixed set B of bucket
addresses. Databases grow or shrink with time
◦ If initial number of buckets is too small, and file grows, performance will degrade
due to too many overflows
Hash Function
Example
◦ If space is allocated for anticipated growth, a significant amount of space will be
Bucket Overflow wasted initially (and buckets will be underfull).
Dynamic Hashing ◦ If database shrinks, again space will be wasted
Example

Comparison • One solution: periodic re-organization of the file with a new hash function
Schemes

Bitmap Indices
◦ Expensive, disrupts normal operations
Module Summary • Better solution: allow the number of buckets to be modified dynamically

Database Management Systems Partha Pratim Das 44.15

Dynamic Hashing PPD

Module 44

Dynamic Hashing

Database Management Systems Partha Pratim Das 44.16

Dynamic Hashing

Module 44

Partha Pratim • Good for database that grows and shrinks in size
Das
• Allows the hash function to be modified dynamically
Objectives &
Outline • Extendable hashing – one form of dynamic hashing
Static Hashing
Hash Function
◦ Hash function generates values over a large range — typically b-bit integers, with
Example
Bucket Overflow
b = 32
Dynamic Hashing
◦ At any time use only a prefix of the hash function to index into a table of bucket
Example addresses
Comparison
Schemes
◦ Let the length of the prefix be i bits, 0 ≤ i ≤ 32
. Bucket address table size = 2^i. Initially i = 0
. Value of i grows and shrinks as the size of the database grows and shrinks
◦ Multiple entries in the bucket address table may point to a bucket (why?)
◦ Thus, actual number of buckets is ≤ 2^i
. The number of buckets also changes dynamically due to coalescing and splitting
of buckets
Database Management Systems Partha Pratim Das 44.17
General Extendable Hash Structure PPD

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

In this structure, i_2 = i_3 = i, whereas i_1 = i − 1

Decode i_j bits to find the record in bucket j, where i_j ≤ i.
Database Management Systems Partha Pratim Das 44.18
Use of Extendable Hash Structure

Module 44

• Each bucket j stores a value i_j
◦ All the entries that point to the same bucket have the same values on the first i_j bits
• To locate the bucket containing search-key K_j
◦ Compute h(K_j) = X
◦ Use the first i high-order bits of X as a displacement into the bucket address table, and
follow the pointer to the appropriate bucket
• To insert a record with search-key value K_j
◦ Follow the same procedure as look-up and locate the bucket, say j
◦ If there is room in bucket j, insert the record in the bucket
◦ Else the bucket must be split and insertion re-attempted (next slide)
. Overflow buckets used instead in some cases (will see shortly)

Database Management Systems Partha Pratim Das 44.19

Insertion in Extendable Hash Structure

Module 44
To split a bucket j when inserting record with search-key value K_j
• If i > i_j (more than one pointer to bucket j)
◦ Allocate a new bucket z, and set i_j = i_z = (i_j + 1)
◦ Update the second half of the bucket address table entries originally pointing to j,
to point to z
◦ Remove each record in bucket j and reinsert (in j or z)
◦ Recompute new bucket for K_j and insert record in the bucket (further splitting is
required if the bucket is still full)
• If i = i_j (only one pointer to bucket j)
◦ If i reaches some limit b, or too many splits have happened in this insertion, create
an overflow bucket
◦ Else
. Increment i and double the size of the bucket address table
. Replace each entry in the table by two entries that point to the same bucket
. Recompute new bucket address table entry for K_j. Now i > i_j so use the first
case above
Database Management Systems Partha Pratim Das 44.20
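The look-up and insertion steps for extendable hashing can be sketched compactly. This is an illustration under assumptions, not the slides' implementation: the class and method names are invented, a bucket at the maximum depth simply grows past its capacity as a stand-in for an overflow bucket, the "too many splits" limit is not modelled, and the example uses a 4-bit identity hash so the bit patterns are easy to follow.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # i_j: number of prefix bits shared by this bucket's keys
        self.keys = []


class ExtendableHash:
    def __init__(self, bucket_size=2, hash_bits=32, hash_fn=hash):
        self.bucket_size = bucket_size
        self.b = hash_bits
        self.hash_fn = hash_fn
        self.i = 0                    # current prefix length
        self.table = [Bucket(0)]      # bucket address table with 2**i entries

    def _index(self, key):
        x = self.hash_fn(key) & ((1 << self.b) - 1)   # b-bit hash value X
        return x >> (self.b - self.i)                 # first i high-order bits

    def lookup(self, key):
        return key in self.table[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.table[self._index(key)]
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(key)
                return
            if bucket.depth == self.i:           # only one pointer to the bucket
                if self.i == self.b:
                    bucket.keys.append(key)      # stand-in for an overflow bucket
                    return
                # Double the table: each entry becomes two entries
                self.table = [bkt for bkt in self.table for _ in range(2)]
                self.i += 1
            self._split(bucket)                  # now i > i_j, then retry

    def _split(self, bucket):
        bucket.depth += 1
        new = Bucket(bucket.depth)
        entries = [k for k, bkt in enumerate(self.table) if bkt is bucket]
        for k in entries[len(entries) // 2:]:    # second half points to the new bucket
            self.table[k] = new
        old_keys, bucket.keys = bucket.keys, []
        for key in old_keys:                     # reinsert into the old or new bucket
            self.table[self._index(key)].keys.append(key)


eh = ExtendableHash(bucket_size=2, hash_bits=4, hash_fn=lambda k: k)
for k in [1, 2, 9, 3]:
    eh.insert(k)
print(eh.i, len(eh.table), len({id(bkt) for bkt in eh.table}))
```

In this run the keys 1, 2 and 3 share their two high-order bits, so the table has to double more than once before the full bucket can actually be divided; the distinct bucket count stays well below the table size because several entries point to the same bucket.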
Deletion in Extendable Hash Structure

• To delete a key value,
  ◦ Locate it in its bucket and remove it
  ◦ The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table)
  ◦ Coalescing of buckets can be done (can coalesce only with a “buddy” bucket having the same value of i_j and the same i_j − 1 prefix, if it is present)
  ◦ Decreasing bucket address table size is also possible
    . Note: decreasing bucket address table size is an expensive operation and should be done only if the number of buckets becomes much smaller than the size of the table

Database Management Systems Partha Pratim Das 44.21

Use of Extendable Hash Structure: Example

Database Management Systems Partha Pratim Das 44.22

Example (2)

• Initial Hash structure; bucket size = 2
• Insert “Mozart”, “Srinivasan”, and “Wu” records

Database Management Systems Partha Pratim Das 44.23

Example (3)

• Hash structure after insertion of “Mozart”, “Srinivasan”, and “Wu” records
• Insert “Einstein” record

Database Management Systems Partha Pratim Das 44.24

Example (4)

• Hash structure after insertion of “Einstein” record
• Insert “Gold” and “El Said” records

Database Management Systems Partha Pratim Das 44.25

Example (5)

• Hash structure after insertion of “Gold” and “El Said” records
• Insert “Katz” record

Database Management Systems Partha Pratim Das 44.26

Example (6)
• Hash structure after insertion of “Katz” record
• Insert “Singh”, “Califieri”, “Crick”, “Brandt” records

Bitmap Indices (5): Efficient Bitmap Operations

• Bitmaps are packed into words; a single word AND (a basic CPU instruction) computes the AND of 32 or 64 bits at once
  ◦ For example, 1-million-bit maps can be AND-ed with just 31,250 instructions (using 32-bit words)
• Counting the number of 1s can be done fast by a trick:
  ◦ Use each byte to index into a precomputed array of 256 elements, each storing the count of 1s in the binary representation
    . Can use pairs of bytes to speed up further at a higher memory cost
  ◦ Add up the retrieved counts
• Bitmaps can be used instead of Tuple-ID lists at leaf levels of B+-trees, for values that have a large number of matching records
  ◦ Worthwhile if > 1/64 of the records have that value, assuming a tuple-id is 64 bits
  ◦ Above technique merges benefits of bitmap and B+-tree indices
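The arithmetic and the byte-lookup trick above can be checked with a short Python sketch. The function names are hypothetical, and Python integers stand in for machine words, so this shows the logic rather than the CPU-level speed.

```python
# ONES[b] is the number of 1 bits in the byte value b (256 precomputed entries).
ONES = [bin(b).count("1") for b in range(256)]


def words_needed(bits: int, word_size: int) -> int:
    """Number of word-sized AND instructions needed for a bitmap of this length."""
    return -(-bits // word_size)  # ceiling division


def bitmap_and(a: bytes, b: bytes) -> bytes:
    """AND two bitmaps of equal length."""
    n = len(a)
    return (int.from_bytes(a, "big") & int.from_bytes(b, "big")).to_bytes(n, "big")


def count_ones(bitmap: bytes) -> int:
    """Count 1 bits by indexing the precomputed table with each byte."""
    return sum(ONES[byte] for byte in bitmap)


def bitmap_is_smaller(n_records: int, n_matching: int, tid_bits: int = 64) -> bool:
    """A bitmap costs 1 bit per record; a tuple-id list costs tid_bits per match."""
    return n_matching * tid_bits > n_records
```

For instance, `words_needed(1_000_000, 32)` gives the 31,250 figure quoted above, and `bitmap_is_smaller` turns true once more than 1/64 of the records match.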

Database Management Systems Partha Pratim Das 44.38

Module Summary

• Explored various hashing schemes – Static and Dynamic Hashing
• Compared Ordered Indexing and Hashing
• Studied the use of Bitmap Indices for fast access of columns with limited number of distinct values

Database Management Systems
Module 45: Indexing and Hashing/5: Index Design

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Objectives & Outline
Index Definition in SQL
Multiple-Key Access
Privileges
Guidelines for Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary

Database Management Systems Partha Pratim Das 45.1

Module Recap PPD

Index in SQL: Examples PPD

• Create an index for a single column, to speed up queries that test that column:
  ◦ CREATE INDEX emp_ename ON emp_tab(ename);
• Specify several storage settings explicitly for the index:
  ◦ CREATE INDEX emp_ename ON emp_tab(ename)
      TABLESPACE users     -- Allocation of space in the database to contain schema objects
      STORAGE (            -- Specify how the database should store a database object
        INITIAL 20K        -- Specify the size of the 1st extent of the object
        NEXT 20K           -- Specify the size of the next (2nd) extent to be allocated to the object
        PCTINCREASE 75)    -- Specify the percent by which later extents grow over the preceding extent
      PCTFREE 0            -- Keep 0% of each data block free for updates
      COMPUTE STATISTICS;
• Create an index on two columns, to speed up queries that test either the first column or both columns:
  ◦ CREATE INDEX emp_ename ON emp_tab(ename, empno) COMPUTE STATISTICS;
• If a query is going to sort on the function UPPER(ENAME), an index on the ENAME column itself would not speed up this operation, and it might be slow to call the function for each result row
  ◦ A function-based index precomputes the result of the function for each column value, speeding up queries that use the function for searching or sorting:
    CREATE INDEX emp_upper_ename ON emp_tab(UPPER(ename)) COMPUTE STATISTICS;
Source: Selecting an Index Strategy
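The statements above use Oracle syntax. As a rough way to experiment without an Oracle instance, the sketch below uses Python's built-in sqlite3 module. SQLite has no TABLESPACE, STORAGE, PCTFREE or COMPUTE STATISTICS clauses, so only the portable part of each statement is kept; the sample rows and the `plan` helper are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_tab (empno INTEGER PRIMARY KEY, ename TEXT)")
# Hypothetical sample rows, only to have something to query.
conn.executemany("INSERT INTO emp_tab VALUES (?, ?)",
                 [(1, "Katz"), (2, "Singh"), (3, "Wu")])

# Single-column index and function-based (expression) index.
conn.execute("CREATE INDEX emp_ename ON emp_tab(ename)")
conn.execute("CREATE INDEX emp_upper_ename ON emp_tab(UPPER(ename))")

plan_eq = plan(conn, "SELECT * FROM emp_tab WHERE ename = 'Katz'")
plan_upper = plan(conn, "SELECT * FROM emp_tab WHERE UPPER(ename) = 'KATZ'")
rows_upper = conn.execute(
    "SELECT empno FROM emp_tab WHERE UPPER(ename) = 'KATZ'").fetchall()
```

Inspecting `plan_eq` and `plan_upper` shows which index the SQLite planner chose for each query; other database systems report this through their own EXPLAIN facilities.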
Database Management Systems Partha Pratim Das 45.7
Index in SQL: Bitmap PPD

• create bitmap index <index-name> on <relation-name>(<attribute-list>)
• Example:
  ◦ Student (Student_ID, Name, Address, Age, Gender, Semester)
  ◦ CREATE BITMAP INDEX Idx_Gender ON Student (Gender);
  ◦ CREATE BITMAP INDEX Idx_Semester ON Student (Semester);
• SELECT * FROM Student WHERE Gender = 'F' AND Semester = 4;
  ◦ AND the two bitmaps 0 1 1 1 and 0 0 0 1 to get the result 0 0 0 1
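A small Python sketch of how such a query is answered from bitmaps. The four Student rows are hypothetical, chosen only so that the two bitmaps come out as 0 1 1 1 and 0 0 0 1; the helper name `build_bitmaps` is also an assumption.

```python
def build_bitmaps(values):
    """Map each distinct value to a list of 0/1 flags, one flag per row."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps.setdefault(value, [0] * len(values))[row] = 1
    return bitmaps


# Hypothetical column contents for four Student rows.
gender = ["M", "F", "F", "F"]
semester = [1, 2, 3, 4]

gender_bitmaps = build_bitmaps(gender)      # gender_bitmaps["F"] is [0, 1, 1, 1]
semester_bitmaps = build_bitmaps(semester)  # semester_bitmaps[4] is [0, 0, 0, 1]

# WHERE Gender = 'F' AND Semester = 4: AND the two bitmaps position by position.
result = [g & s for g, s in zip(gender_bitmaps["F"], semester_bitmaps[4])]
matching_rows = [row for row, bit in enumerate(result) if bit]
```

Only the rows whose bit survives the AND need to be fetched from the table.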

Database Management Systems Partha Pratim Das 45.8

Multiple-Key Access

• Use multiple indices for certain types of queries
• Example:
    select ID
    from instructor
    where dept_name = 'Finance' and salary = 80000
• Possible strategies for processing query using indices on single attributes:
  ◦ Use index on dept_name to find instructors with department name Finance; test salary = 80000
  ◦ Use index on salary to find instructors with a salary of 80000; test dept_name = 'Finance'
  ◦ Use dept_name index to find pointers to all records pertaining to the “Finance” department. Similarly use index on salary. Take intersection of both sets of pointers obtained
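The three strategies can be compared in a short Python sketch in which each single-attribute index is modelled as a dictionary from value to a set of record pointers. The instructor rows and all function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical instructor records: pointer -> (ID, dept_name, salary)
records = {
    0: ("10101", "Comp. Sci.", 65000),
    1: ("12121", "Finance", 90000),
    2: ("76543", "Finance", 80000),
    3: ("98345", "Elec. Eng.", 80000),
}


def build_index(records, column):
    """Single-attribute index: value -> set of record pointers."""
    index = defaultdict(set)
    for pointer, row in records.items():
        index[row[column]].add(pointer)
    return index


dept_index = build_index(records, 1)
salary_index = build_index(records, 2)


def via_dept_index():
    # Strategy 1: fetch Finance records, then test salary on each.
    return sorted(records[p][0] for p in dept_index["Finance"]
                  if records[p][2] == 80000)


def via_salary_index():
    # Strategy 2: fetch records with salary 80000, then test dept_name.
    return sorted(records[p][0] for p in salary_index[80000]
                  if records[p][1] == "Finance")


def via_intersection():
    # Strategy 3: intersect the two pointer sets, then fetch only the matches.
    return sorted(records[p][0] for p in dept_index["Finance"] & salary_index[80000])
```

All three return the same IDs; they differ in how many records must be fetched before the second condition is applied.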

Database Management Systems Partha Pratim Das 45.9

Multiple-Key Access (2): Indices

• Composite Search Keys are search keys containing more than one attribute
  ◦ For example, (dept_name, salary)
• Lexicographic ordering: (a1, a2) < (b1, b2) if either
  ◦ a1 < b1, or
  ◦ a1 = b1 and a2 < b2
• Hence, the order of the attributes in the key is important

Database Management Systems Partha Pratim Das 45.10

Multiple-Key Access (3): Indices on Multiple Attributes

Suppose we have an index on combined search-key: (dept_name, salary)

• With the where clause
    where dept_name = 'Finance' and salary = 80000
  the index on (dept_name, salary) can be used to fetch only records that satisfy both conditions.
  ◦ Using separate indices is less efficient - we may fetch many records (or pointers) that satisfy only one of the conditions
  ◦ Can also efficiently handle
      where dept_name = 'Finance' and salary < 80000
  ◦ But cannot efficiently handle
      where dept_name < 'Finance' and salary = 80000
    . May fetch many records that satisfy the first but not the second condition
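To see why the first two predicates are efficient and the third is not, a composite index can be modelled as a list of (dept_name, salary) tuples kept in lexicographic order. This is a sketch with hypothetical entries; Python's tuple comparison follows the same lexicographic rule as the composite search key.

```python
import bisect

# Hypothetical index entries, sorted lexicographically on (dept_name, salary).
entries = sorted([
    ("Biology", 72000), ("Elec. Eng.", 80000), ("Finance", 60000),
    ("Finance", 80000), ("Finance", 90000), ("History", 80000),
])

# dept_name = 'Finance' and salary = 80000: one contiguous run of equal keys.
lo = bisect.bisect_left(entries, ("Finance", 80000))
hi = bisect.bisect_right(entries, ("Finance", 80000))
equal_both = entries[lo:hi]

# dept_name = 'Finance' and salary < 80000: still one contiguous range.
# The 1-tuple ("Finance",) sorts before every ("Finance", salary) entry.
start = bisect.bisect_left(entries, ("Finance",))
less_salary = entries[start:lo]

# dept_name < 'Finance' and salary = 80000: matches are scattered, so every
# entry with dept_name < 'Finance' is scanned and filtered on salary.
scanned = entries[:start]
scattered = [e for e in scanned if e[1] == 80000]
```

In the last case the number of entries scanned depends on how many departments sort before 'Finance', not on how many rows actually have the requested salary.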

Database Management Systems Partha Pratim Das 45.11

Privileges Required to Create an Index PPD

• When using indexes in an application, you might need to request that the DBA grant privileges or make changes to initialization parameters
• To create a new index
  ◦ You must own, or have the INDEX object privilege for, the corresponding table
  ◦ The schema that contains the index must also have a quota for the tablespace intended to contain the index, or the UNLIMITED TABLESPACE system privilege
  ◦ To create an index in another user’s schema, you must have the CREATE ANY INDEX system privilege
• Function-based indexes also require the QUERY REWRITE privilege, and the QUERY_REWRITE_ENABLED initialization parameter must be set to TRUE

Database Management Systems Partha Pratim Das 45.12

Guidelines for Indexing PPD

Guidelines for Indexing: Rule 1 PPD

• Rule 1: Index the Correct Tables
  ◦ Create an index if you frequently want to retrieve less than 15% of the rows in a large table
    . The percentage varies greatly according to the relative speed of a table scan and how clustered the row data is about the index key
      − The faster the table scan, the lower the percentage
      − The more clustered the row data, the higher the percentage
• Index columns used for joins to improve performance on joins of multiple tables
• Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key
• Small tables do not require indexes
  ◦ If a query is taking too long, then the table might have grown from small to large

Database Management Systems Partha Pratim Das 45.18

Guidelines for Indexing: Rule 2 PPD

• Rule 2: Index the Correct Columns
  ◦ Columns with the following characteristics are candidates for indexing:
    . Values are relatively unique in the column
    . There is a wide range of values (good for regular indexes)
    . There is a small range of values (good for bitmap indexes)
    . The column contains many nulls, but queries often select all rows having a value. In this case, a comparison that matches all the non-null values, such as:
      − WHERE COL_X > -9.99 * power(10, 125) is preferable to WHERE COL_X IS NOT NULL
      − This is because the first uses an index on COL_X (if COL_X is a numeric column)
  ◦ Columns with the following characteristics are less suitable for indexing:
    . There are many nulls in the column and you do not search on the non-null values
    . LONG and LONG RAW columns cannot be indexed
  ◦ The size of a single index entry cannot exceed roughly one-half (minus some overhead) of the available space in the data block
Database Management Systems Partha Pratim Das 45.19
Guidelines for Indexing: Rule 3 PPD

• Rule 3: Limit the Number of Indexes for Each Table
  ◦ The more indexes, the more overhead is incurred as the table is altered
    . When rows are inserted or deleted, all indexes on the table must be updated
    . When a column is updated, all indexes on the column must be updated
  ◦ You must weigh the performance benefit of indexes for queries against the performance overhead of updates
    . If a table is primarily read-only, you might use more indexes; but, if a table is heavily updated, you might use fewer indexes

Database Management Systems Partha Pratim Das 45.20

Guidelines for Indexing: Rule 4 PPD

• Rule 4: Choose the Order of Columns in Composite Indexes
  ◦ The order of columns in the CREATE INDEX statement can affect performance
    . Put the column used most often first in the index
    . You can create a composite index (using several columns), and the same index can be used for queries that reference all of these columns, or just some of them
• For the VENDOR_PARTS table, assume that there are 5 vendors, and each vendor has about 1000 parts. Suppose VENDOR_PARTS is commonly queried as:
  ◦ SELECT * FROM vendor_parts WHERE part_no = 457 AND vendor_id = 1012;
  ◦ Create a composite index with the most selective (with most values) column first
    . CREATE INDEX ind_vendor_id ON vendor_parts (part_no, vendor_id);
• Composite indexes speed up queries that use the leading portion of the index:
  ◦ So queries with WHERE clauses using only the PART_NO column also run faster
  ◦ With only 5 distinct values, a separate index on VENDOR_ID does not help
Database Management Systems Partha Pratim Das 45.21
Guidelines for Indexing: Rule 5 PPD

• Rule 5: Gather Statistics to Make Index Usage More Accurate
  ◦ The database can use indexes more effectively when it has statistical information about the tables involved in the queries
    . Gather statistics when the indexes are created by including the keywords COMPUTE STATISTICS in the CREATE INDEX statement
    . As data is updated and the distribution of values changes, periodically refresh the statistics by calling procedures like (in Oracle):
      − DBMS [Link] TABLE STATISTICS and
      − DBMS [Link] SCHEMA STATISTICS

Database Management Systems Partha Pratim Das 45.22

Guidelines for Indexing: Rule 6 PPD

• Rule 6: Drop Indexes That Are No Longer Required
  ◦ You might drop an index if:
    . It does not speed up queries. The table might be very small, or there might be many rows in the table but very few index entries
    . The queries in your applications do not use the index
    . The index must be dropped before being rebuilt
  ◦ When you drop an index, all extents of the index’s segment are returned to the containing tablespace and become available for other objects in the tablespace
  ◦ Use the SQL command DROP INDEX to drop an index. For example, the following statement drops a specific named index:
    . DROP INDEX Emp_ename;
  ◦ If you drop a table, then all associated indexes are dropped
  ◦ To drop an index, the index must be contained in your schema or you must have the DROP ANY INDEX system privilege
Database Management Systems Partha Pratim Das 45.23
Module Summary

• Learnt to create Indexes in SQL
• Introduced the set of Ground Rules for Indexing

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 45.24

Database Management Systems
Module 46: Transactions/1

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Week Recap
Objectives & Outline
Transaction Concept
ACID
Transaction States
State Transition Diagram
Concurrent Executions
Schedules
Example
Module Summary

Database Management Systems Partha Pratim Das 46.1

Week Recap PPD

• Need for indexing database tables
• Understood the ordered indexes
• Recap of Balanced BST for optimal in-memory search data structures
• Issues of external search data structures for persistent data
• Explored 2-3-4 Tree as a precursor to B/B+-Tree
• Understood the B+ Tree and B Tree for Index files and data files
• Explored Static and Dynamic Hashing
• Compared Ordered Indexing and Hashing
• Studied the use of Bitmap Indices
• Learnt to create indexes in SQL
• Learnt a set of Ground Rules for Indexing

Database Management Systems Partha Pratim Das 46.2

Module Objectives PPD

• To understand the concept of transaction – ‘doing a task in a database’ and its state
• To explore issues in concurrent execution of transactions

Database Management Systems Partha Pratim Das 46.3

Module Outline PPD

• Transaction Concept
• Transaction State
• Concurrent Executions

Database Management Systems Partha Pratim Das 46.4

Transaction Concept PPD

Database Management Systems Partha Pratim Das 46.5

Transaction Concept PPD

• A transaction is a unit of program execution that accesses and possibly updates various data items
• For example, a transaction to transfer $50 from account A to account B:
    1. read(A)
    2. A := A − 50
    3. write(A)
    4. read(B)
    5. B := B + 50
    6. write(B)
• Two main issues to deal with:
  ◦ Failures of various kinds, such as hardware failures and system crashes
  ◦ Concurrent execution of multiple transactions

Database Management Systems Partha Pratim Das 46.6

Required Properties of a Transaction: ACID: Atomicity

• Atomicity Requirement
  ◦ If the transaction fails after step 3 and before step 6, money will be “lost”, leading to an inconsistent database state
    . Failure could be due to software or hardware
  ◦ The system should ensure that updates of a partially executed transaction are not reflected in the database

Transaction to transfer $50 from account A to account B:
    1. read(A)
    2. A := A − 50
    3. write(A)
    4. read(B)
    5. B := B + 50
    6. write(B)
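Atomicity can be observed with Python's built-in sqlite3 module. The sketch below is an illustration under assumptions: the `account` table, the opening balances of 1000 and 2000, and the `fail_after_debit` switch that simulates a failure between write(A) and read(B) are all hypothetical.

```python
import sqlite3


def open_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_after_debit=False):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the two updates take effect together or not at all.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_after_debit:
            raise RuntimeError("simulated failure after write(A)")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling `transfer(conn, "A", "B", 50, fail_after_debit=True)` raises the simulated error and leaves both balances unchanged, while the same call without the flag moves the $50.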

Database Management Systems Partha Pratim Das 46.7

Required Properties of a Transaction: ACID: Consistency

• Consistency Requirement
  ◦ A + B must be unchanged by the execution of the transaction
  ◦ In general, consistency requirements include
    . Explicitly specified integrity constraints
      − primary keys and foreign keys
    . Implicit integrity constraints
      − sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand
  ◦ A transaction, when starting to execute, must see a consistent database
  ◦ During transaction execution the database may be temporarily inconsistent
  ◦ When the transaction completes successfully the database must be consistent
    . Erroneous transaction logic can lead to inconsistency

Transaction to transfer $50 from account A to account B:
    1. read(A)
    2. A := A − 50
    3. write(A)
    4. read(B)
    5. B := B + 50
    6. write(B)
Database Management Systems Partha Pratim Das 46.8
Required Properties of a Transaction: ACID: Isolation

• Isolation Requirement
  ◦ If between steps 3 and 6 (of the fund transfer transaction), another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be)

    T1                  T2
    1. read(A)
    2. A := A − 50
    3. write(A)
                        read(A), read(B), print(A + B)
    4. read(B)
    5. B := B + 50
    6. write(B)

  ◦ Isolation can be ensured trivially by running transactions serially
    . That is, one after the other
  ◦ However, executing multiple transactions concurrently has significant benefits
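The interleaving in the table can be replayed in a few lines of Python. The starting balances of 1000 and 2000 are assumed for illustration, and a plain dictionary stands in for the database.

```python
def run_interleaved(db):
    """Replay T1 with T2's read placed between steps 3 and 4."""
    a = db["A"]                   # T1 step 1: read(A)
    a = a - 50                    # T1 step 2: A := A - 50
    db["A"] = a                   # T1 step 3: write(A)

    observed = db["A"] + db["B"]  # T2: read(A), read(B), print(A + B)

    b = db["B"]                   # T1 step 4: read(B)
    b = b + 50                    # T1 step 5: B := B + 50
    db["B"] = b                   # T1 step 6: write(B)
    return observed


db = {"A": 1000, "B": 2000}       # assumed starting balances
observed_sum = run_interleaved(db)
final_sum = db["A"] + db["B"]
```

T2 observes a sum that is $50 short, even though the sum is correct again once T1 finishes.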
Database Management Systems Partha Pratim Das 46.9
Required Properties of a Transaction: ACID: Durability

• Durability Requirement
  ◦ Once the user has been notified that the transaction has completed (that is, the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures

Transaction to transfer $50 from account A to account B:
    1. read(A)
    2. A := A − 50
    3. write(A)
    4. read(B)
    5. B := B + 50
    6. write(B)

Database Management Systems Partha Pratim Das 46.10

ACID Properties

A transaction is a unit of program execution that accesses and possibly updates various data items:
• Atomicity: Atomicity guarantees that each transaction is treated as a single unit, which either succeeds completely, or fails completely
  ◦ If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged
  ◦ Atomicity must be guaranteed in every situation, including power failures, errors and crashes
• Consistency: Consistency ensures that a transaction can only bring the database from one valid state to another, maintaining database invariants
  ◦ Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof
• Isolation: Transactions are often executed concurrently (multiple transactions reading and writing to a table at the same time)
  ◦ Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
• Durability: Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (like power outage or crash)
  ◦ This usually means that completed transactions (or their effects) are recorded in non-volatile memory
Database Management Systems Partha Pratim Das 46.11
ACID Properties: Quick Reckoner PPD

Database Management Systems Partha Pratim Das 46.12

Transaction States PPD

Database Management Systems Partha Pratim Das 46.13

Transaction States PPD

• Every transaction can be in one of the following states (like Process States in OS)
  ◦ Active
    . The initial state; the transaction stays in this state while it is executing
  ◦ Partially committed
    . After the final statement has been executed
  ◦ Failed
    . After the discovery that normal execution can no longer proceed
  ◦ Aborted
    . After the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
      − Restart the transaction: Can be done only if no internal logical error
      − Kill the transaction
  ◦ Committed
    . After successful completion
  ◦ Terminated
    . After it has been committed or aborted (killed)

Database Management Systems Partha Pratim Das 46.14

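The states listed above can be modelled as a small state machine. This is a sketch, not the slides' diagram: the allowed transitions are inferred from the state descriptions, and the edge from partially committed to failed is an assumption taken from the usual textbook treatment. A restart after an abort is modelled as creating a new `Transaction` object.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"
    TERMINATED = "terminated"


# Assumed transition relation between states.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: {State.TERMINATED},    # the transaction is killed
    State.COMMITTED: {State.TERMINATED},
    State.TERMINATED: set(),
}


class Transaction:
    def __init__(self):
        self.state = State.ACTIVE  # the initial state

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
```

A committed transaction can never move to aborted in this model, which reflects the durability requirement discussed earlier.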
Transitions for Transaction States PPD

Database Management Systems Partha Pratim Das 46.15

Concurrent Executions PPD

Database Management Systems Partha Pratim Das 46.16

Concurrent Executions

• Multiple transactions are allowed to run concurrently in the system. Advantages are:
  ◦ Increased processor and disk utilization, leading to better transaction throughput
    . For example, one transaction can be using the CPU while another is reading from or writing to the disk
  ◦ Reduced average response time for transactions: short transactions need not wait behind long ones
• Concurrency Control Schemes: Mechanisms to achieve isolation
  ◦ To control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database

Database Management Systems Partha Pratim Das 46.17

Schedules

• Schedule: A sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed
  ◦ A schedule for a set of transactions must consist of all instructions of those transactions
  ◦ Must preserve the order in which the instructions appear in each individual transaction
• A transaction that successfully completes its execution will have a commit instruction as the last statement
  ◦ By default, a transaction is assumed to execute a commit instruction as its last step
• A transaction that fails to successfully complete its execution will have an abort instruction as the last statement

Database Management Systems Partha Pratim Das 46.18

Schedule 1 PPD

• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B
• An example of a serial schedule in which T1 is followed by T2:

Database Management Systems Partha Pratim Das 46.19

Schedule 2 PPD

• A serial schedule in which T2 is followed by T1:

Values of A & B are different from Schedule 1 – yet consistent

Database Management Systems Partha Pratim Das 46.20

Schedule 3 PPD

• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1

[Figure: Schedule 3 shown side by side with Schedule 1]

Note – In schedules 1, 2 and 3, the sum “A + B” is preserved

Database Management Systems Partha Pratim Das 46.21
Schedule 4 PPD

• The following concurrent schedule does not preserve the sum of “A + B”
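The effect of different schedules on A + B can be explored with a small simulator. The schedule figures themselves are not reproduced in this text, so everything concrete here is an assumption: starting balances of 1000 and 2000, integer arithmetic for the 10%, and the particular bad interleaving, which is one lost-update ordering chosen to show the effect. Each transaction is written as a generator so that a schedule can advance it one instruction at a time.

```python
def t1(db):
    """T1: transfer 50 from A to B, one instruction per step."""
    a = db["A"]; yield      # read(A)
    a = a - 50; yield       # A := A - 50
    db["A"] = a; yield      # write(A)
    b = db["B"]; yield      # read(B)
    b = b + 50; yield       # B := B + 50
    db["B"] = b; yield      # write(B)


def t2(db):
    """T2: transfer 10% of the balance of A to B, one instruction per step."""
    a = db["A"]; yield      # read(A)
    temp = a // 10; yield   # temp := 10% of A (integer arithmetic assumed)
    a = a - temp; yield     # A := A - temp
    db["A"] = a; yield      # write(A)
    b = db["B"]; yield      # read(B)
    b = b + temp; yield     # B := B + temp
    db["B"] = b; yield      # write(B)


def run(order, a=1000, b=2000):
    """Execute one instruction of the named transaction per schedule entry."""
    db = {"A": a, "B": b}
    txns = {"T1": t1(db), "T2": t2(db)}
    for name in order:
        next(txns[name])
    return db


serial_t1_t2 = run(["T1"] * 6 + ["T2"] * 7)
serial_t2_t1 = run(["T2"] * 7 + ["T1"] * 6)
# Assumed bad interleaving: T2 reads A before T1 writes it, and T1 later
# overwrites T2's write of A; T2 then overwrites T1's write of B.
lost_update = run(["T1"] * 2 + ["T2"] * 5 + ["T1"] * 4 + ["T2"] * 2)
```

Both serial orders end with A + B = 3000 although the individual balances differ, while the interleaved run ends with a sum of 3050.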

Database Management Systems Partha Pratim Das 46.22

Module Summary

• A task in a database is done as a transaction that passes through several states
• Transactions are executed in concurrent fashion for better throughput
• Concurrent execution of transactions raises serializability issues that need to be addressed
• Not all schedules satisfy ACID properties

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 46.23

Database Management Systems
Module 47: Transactions/2: Serializability

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]

Objectives & Outline
Serializability
Conflicting Instructions
Conflict Serializability
Examples
Precedence Graph
Tests
Module Summary

Database Management Systems Partha Pratim Das 47.1

Module Recap PPD

• A task in a database is done as a transaction that passes through several states
• Transactions are executed in concurrent fashion for better throughput
• Concurrent execution of transactions raises serializability issues that need to be addressed
• Not all schedules satisfy ACID properties

Database Management Systems Partha Pratim Das 47.2

Module Objectives PPD

• To understand the issues that arise when two or more transactions work concurrently
• To introduce the notions of Serializability that ensure schedules for transactions that may run in concurrent fashion but still guarantee a serial behavior
• To analyze the conditions, called conflicts, that need to be honored to attain Serializable schedules

Database Management Systems Partha Pratim Das 47.3

Module Outline PPD

• Serializability
• Conflict Serializability

Database Management Systems Partha Pratim Das 47.4

Serializability PPD


Database Management Systems Partha Pratim Das 47.5

Serializability

• Assumption: Each transaction preserves database consistency
• Thus, serial execution of a set of transactions preserves database consistency
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule
• Different forms of schedule equivalence give rise to the notions of:
  a) Conflict Serializability
  b) View Serializability

Database Management Systems Partha Pratim Das 47.6

Recap Schedule 3: Serializable PPD

• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1

[Figure: Schedule 3 and Schedule 1]

Note: In Schedules 1, 2 and 3, the sum “A + B” is preserved

Database Management Systems Partha Pratim Das 47.7
Recap Schedule 4: Not Serializable PPD

• The following concurrent schedule does not preserve the sum “A + B”

[Figure: Schedule 4]

Database Management Systems Partha Pratim Das 47.8

Simplified View of Transactions

• We ignore operations other than read and write instructions
  ◦ Other operations happen in memory (are temporary in nature) and (mostly) do not affect the state of the database
  ◦ This is a simplifying assumption for analysis
• We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes
• Our simplified schedules consist of only read and write instructions

Database Management Systems Partha Pratim Das 47.9

Conflicting Instructions

• Let li and lj be two instructions from transactions Ti and Tj respectively
• Instructions li and lj conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions writes to Q
  a) li = read(Q), lj = read(Q). li and lj don’t conflict
  b) li = read(Q), lj = write(Q). They conflict
  c) li = write(Q), lj = read(Q). They conflict
  d) li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical) temporal order between them
  ◦ If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule
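The conflict rule can be written as a one-line predicate. The sketch below is only an illustration: the `Op` record and the function name `conflicts` are hypothetical names introduced here, not part of the course material.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One instruction of a schedule (hypothetical representation)."""
    txn: int    # transaction number, e.g. 1 for T1
    kind: str   # "r" for read, "w" for write
    item: str   # data item, e.g. "Q"


def conflicts(a: Op, b: Op) -> bool:
    """True if a and b come from different transactions, access the
    same item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "w" in (a.kind, b.kind)


# Cases a) to d) for li from T1 and lj from T2, both on item Q
case_a = conflicts(Op(1, "r", "Q"), Op(2, "r", "Q"))  # read / read
case_b = conflicts(Op(1, "r", "Q"), Op(2, "w", "Q"))  # read / write
case_c = conflicts(Op(1, "w", "Q"), Op(2, "r", "Q"))  # write / read
case_d = conflicts(Op(1, "w", "Q"), Op(2, "w", "Q"))  # write / write
```

Only case a) evaluates to false; accesses to different items never conflict, whatever their kind.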

Database Management Systems Partha Pratim Das 47.10

Conflict Serializability PPD


Database Management Systems Partha Pratim Das 47.11

Conflict Serializability

• If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting instructions, we say that S and S’ are conflict equivalent
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule

Database Management Systems Partha Pratim Das 47.12

Conflict Serializability (2) PPD

• Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions:

[Figure: Schedule 3, Schedule 5, Schedule 6]

Database Management Systems Partha Pratim Das 47.13
Conflict Serializability (3)

• Example of a schedule that is not conflict serializable:

[Figure: schedule of T3 and T4]

• We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >

Database Management Systems Partha Pratim Das 47.14

Example: Bad Schedule PPD

Consider two transactions:

Transaction 1:
UPDATE accounts
SET balance = balance - 100
WHERE acct_id = 31414

Transaction 2:
UPDATE accounts
SET balance = balance * 1.005

• In terms of read / write we can write these as:
  Transaction 1: r1(A), w1(A) // A is the balance for acct_id = 31414
  Transaction 2: r2(A), w2(A), r2(B), w2(B) // B is the balance of the other accounts
• Consider schedule S:
  ◦ Schedule S: r1(A), r2(A), w1(A), w2(A), r2(B), w2(B)
  ◦ Suppose: A starts with $200, and account B starts with $100

[Figure: Schedule S]

• Schedule S is very bad! (At least, it’s bad if you’re the bank!) We withdrew $100 from account A, but somehow the database has recorded that our account now holds $201!
Database Management Systems Partha Pratim Das 47.15
Example: Bad Schedule (2)

• Ideal schedule is serial:
  Serial schedule 1: r1(A), w1(A), r2(A), w2(A), r2(B), w2(B)
  Serial schedule 2: r2(A), w2(A), r2(B), w2(B), r1(A), w1(A)
• We call a schedule serializable if it has the same effect as some serial schedule regardless of the specific information in the database.
• As an example, consider Schedule T, which has swapped the third and fourth operations from S:
  ◦ Schedule S: r1(A), r2(A), w1(A), w2(A), r2(B), w2(B)
  ◦ Schedule T: r1(A), r2(A), w2(A), w1(A), r2(B), w2(B)

[Figure: Schedule T, with the two examples referred to below]

• In the first example, the outcome is the same as Serial schedule 1. But that’s just a peculiarity of the data, as revealed by the second example, where the final value of A can’t be the consequence of either of the possible serial schedules.
• So neither S nor T is serializable
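The arithmetic behind schedules S and T can be checked with a small simulation. This is a minimal sketch under stated assumptions: the `run` helper and the string encoding of schedules are hypothetical, `Decimal` is used only to keep the interest calculation exact, and the starting balance A = 100 in `lucky` is an assumed value chosen here to show how particular data can hide the problem (it is not taken from the slides).

```python
from decimal import Decimal

# What each transaction computes between its read and its write
UPDATES = {
    "1": lambda v: v - 100,               # Transaction 1: withdraw 100
    "2": lambda v: v * Decimal("1.005"),  # Transaction 2: credit 0.5% interest
}


def run(schedule, initial):
    """Execute a schedule such as "r1(A) w1(A)" and return the final state.
    A read copies the item into the transaction's local buffer; a write
    applies the transaction's update to the buffered value and stores it."""
    db = dict(initial)
    local = {}  # (transaction, item) -> value last read
    for op in schedule.split():
        kind, txn, item = op[0], op[1], op[3]
        if kind == "r":
            local[(txn, item)] = db[item]
        else:
            db[item] = UPDATES[txn](local[(txn, item)])
    return db


S = "r1(A) r2(A) w1(A) w2(A) r2(B) w2(B)"
T = "r1(A) r2(A) w2(A) w1(A) r2(B) w2(B)"
SERIAL_1 = "r1(A) w1(A) r2(A) w2(A) r2(B) w2(B)"
SERIAL_2 = "r2(A) w2(A) r2(B) w2(B) r1(A) w1(A)"

given = {"A": Decimal(200), "B": Decimal(100)}  # balances from the slide
lucky = {"A": Decimal(100), "B": Decimal(100)}  # assumed balances

serial_outcomes = [run(SERIAL_1, given), run(SERIAL_2, given)]
s_is_bad = run(S, given) not in serial_outcomes      # S leaves A at 201
t_is_bad = run(T, given) not in serial_outcomes      # T leaves A at 100
t_looks_fine = run(T, lucky) == run(SERIAL_1, lucky)  # coincidence of data
```

With the balances from the slide, neither S nor T matches a serial outcome; with the assumed balance A = 100, T happens to agree with Serial schedule 1, which is why serializability must hold regardless of the data.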
Database Management Systems Partha Pratim Das 47.16
Example: Good Schedule PPD

• What’s a non-serial example of a serializable schedule?
  ◦ We could credit interest to A first, then withdraw the money, then credit interest to B:
  ◦ Schedule U: r2(A), w2(A), r1(A), w1(A), r2(B), w2(B)
    . Initial: A = 200, B = 100
    . Final: A = 101, B = 100.50
• Schedule U is conflict equivalent to Serial schedule 2:
  Schedule U: r2(A), w2(A), r1(A), w1(A), r2(B), w2(B)
  swap w1(A) and r2(B): r2(A), w2(A), r1(A), r2(B), w1(A), w2(B)
  swap w1(A) and w2(B): r2(A), w2(A), r1(A), r2(B), w2(B), w1(A)
  swap r1(A) and r2(B): r2(A), w2(A), r2(B), r1(A), w2(B), w1(A)
  swap r1(A) and w2(B): r2(A), w2(A), r2(B), w2(B), r1(A), w1(A) : Serial schedule 2
Source: Serializability
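The four swaps that turn Schedule U into Serial schedule 2 can be replayed mechanically, refusing any swap that is not allowed. The helper names (`parse`, `swap_adjacent`) are hypothetical, and the parser assumes single-digit transaction numbers and single-letter items.

```python
def parse(text):
    """Turn "r2(A), w2(A)" into [("r", 2, "A"), ("w", 2, "A")]."""
    return [(op[0], int(op[1]), op[3]) for op in text.replace(",", " ").split()]


def conflicts(a, b):
    """Different transactions, same item, at least one write."""
    return a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])


def swap_adjacent(schedule, i):
    """Swap positions i and i + 1. Refuse if the two operations conflict
    or belong to the same transaction (program order must be kept)."""
    a, b = schedule[i], schedule[i + 1]
    if a[1] == b[1] or conflicts(a, b):
        raise ValueError(f"cannot swap {a} and {b}")
    return schedule[:i] + [b, a] + schedule[i + 2:]


u = parse("r2(A), w2(A), r1(A), w1(A), r2(B), w2(B)")
serial_2 = parse("r2(A), w2(A), r2(B), w2(B), r1(A), w1(A)")

result = u
for position in (3, 4, 2, 3):  # the four swaps, as 0-based positions
    result = swap_adjacent(result, position)
```

After the loop, `result` equals `serial_2`. Trying `swap_adjacent(u, 1)` raises `ValueError`, because w2(A) and r1(A) conflict.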

Database Management Systems Partha Pratim Das 47.17

Serializability

• Are all serializable schedules conflict-serializable? No.
• Consider the following schedule for a set of three transactions.
  ◦ w1(A), w2(A), w2(B), w1(B), w3(B)
• We can perform no swaps to this:
  ◦ The first two operations are both on A and at least one is a write;
  ◦ The second and third operations are by the same transaction;
  ◦ The third and fourth are both on B and at least one is a write; and
  ◦ So are the fourth and fifth.
  ◦ So this schedule is not conflict-equivalent to any other schedule, and certainly not to any serial schedule.
• However, since nobody ever reads the values written by the w1(A), w2(B), and w1(B) operations, the schedule has the same outcome as the serial schedule:
  ◦ w1(A), w1(B), w2(A), w2(B), w3(B)
Source: Serializability
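That the two write-only schedules leave the database in the same state can be seen by tagging every write with the transaction that made it. A small sketch, with a hypothetical helper `final_state` and the assumption that transaction numbers are single digits:

```python
def final_state(schedule):
    """Apply write-only operations such as "w1(A)". Each write stores a
    token naming its transaction, so the result records which
    transaction wrote each item last."""
    db = {}
    for op in schedule.replace(",", " ").split():
        txn, item = op[1], op[3]
        db[item] = f"value written by T{txn}"
    return db


concurrent = "w1(A), w2(A), w2(B), w1(B), w3(B)"
serial = "w1(A), w1(B), w2(A), w2(B), w3(B)"

same_outcome = final_state(concurrent) == final_state(serial)
```

In both schedules A ends with the value written by T2 and B with the value written by T3; the overwritten values are never read, so the difference in order is invisible.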

Database Management Systems Partha Pratim Das 47.18

Precedence Graph

Testing for Conflict Serializability (3) PPD

• Consider the following schedule:
  ◦ w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)
• We start with an empty graph with five vertices labeled T1, T2, T3, T4, T5.
• We go through each operation in the schedule:
  w1(A): A is subsequently read by T2, so add edge T1 → T2
  r2(A): no subsequent writes to A, so no new edges
  w1(B): B is subsequently read by T4, so add edge T1 → T4
  w3(C): C is subsequently read by T2, so add edge T3 → T2
  r2(C): no subsequent writes to C, so no new edges
  r4(B): no subsequent writes to B, so no new edges
  w2(D): D is subsequently read by T5, so add edge T2 → T5
  w4(E): E is subsequently written by T5, so add edge T4 → T5
  r5(D): no subsequent writes to D, so no new edges
  w5(E): no subsequent operations on E, so no new edges
• We end up with the precedence graph with edges T1 → T2, T1 → T4, T3 → T2, T2 → T5, T4 → T5
• This graph has no cycles, so the original schedule must be conflict serializable. Moreover, since one way to topologically sort the graph is T3 − T1 − T4 − T2 − T5, one serial schedule that is conflict-equivalent is
  ◦ w3(C), w1(A), w1(B), r4(B), w4(E), r2(A), r2(C), w2(D), r5(D), w5(E)
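The precedence-graph test can be sketched in a few lines. The function names below are hypothetical; the topological sort uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later), and the parser assumes single-digit transaction numbers. Any topological order is a valid answer, so the order returned need not be T3, T1, T4, T2, T5.

```python
from graphlib import CycleError, TopologicalSorter


def parse(text):
    """Turn "w1(A), r2(A)" into [("w", 1, "A"), ("r", 2, "A")]."""
    return [(op[0], int(op[1]), op[3]) for op in text.replace(",", " ").split()]


def precedence_graph(ops):
    """Edges (Ti, Tj): some operation of Ti conflicts with a later
    operation of Tj (same item, different transactions, one is a write)."""
    edges = set()
    for i, (kind_a, txn_a, item_a) in enumerate(ops):
        for kind_b, txn_b, item_b in ops[i + 1:]:
            if txn_a != txn_b and item_a == item_b and "w" in (kind_a, kind_b):
                edges.add((txn_a, txn_b))
    return edges


def serial_order(ops):
    """One conflict-equivalent serial order, or None if the graph is cyclic."""
    sorter = TopologicalSorter({txn: set() for _, txn, _ in ops})
    for before, after in precedence_graph(ops):
        sorter.add(after, before)  # "after" must come later than "before"
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


schedule = parse("w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)")
edges = precedence_graph(schedule)
order = serial_order(schedule)

# A schedule whose graph contains the cycle T1 -> T2 -> T1
cyclic = parse("w1(A), w2(A), w2(B), w1(B), w3(B)")
no_order = serial_order(cyclic)
```

For the ten-operation schedule the edge set is {T1→T2, T1→T4, T3→T2, T2→T5, T4→T5}; for the cyclic schedule `serial_order` returns `None`.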
Database Management Systems Partha Pratim Das 47.22
Module Summary

• Understood the issues that arise when two or more transactions work concurrently
• Learnt the forms of serializability in terms of conflict and view serializability
• An acyclic precedence graph ensures conflict serializability

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Module Outline PPD

Module 48

• Recoverability
• Transaction Definition in SQL
• View Serializability
• Complex Notions of Serializability

Database Management Systems Partha Pratim Das 48.4

Recovery PPD


Database Management Systems Partha Pratim Das 48.5

What is Recovery? PPD

• Serializability helps to ensure Isolation and Consistency of a schedule
• Yet, Atomicity and Consistency may be compromised in the face of system failures
• Consider a schedule comprising a single transaction (obviously serial):
  1. read(A)
  2. A := A − 50
  3. write(A)
  4. read(B)
  5. B := B + 50
  6. write(B)
  7. commit // Make the changes permanent; show the results to the user
• What if the system fails after Step 3 and before Step 6?
  ◦ Leads to an inconsistent state
  ◦ Need to roll back the update of A
• This is known as Recovery
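One way to picture the rollback of A is an undo log that remembers each old value before it is overwritten. The sketch below is purely illustrative and is not how any particular DBMS implements recovery: the `transfer` function, the `CrashError` exception standing in for a system failure, and the starting balances are all assumptions made for this example, and a real system would keep the log on stable storage.

```python
class CrashError(Exception):
    """Stands in for a system failure (hypothetical)."""


def transfer(db, amount, crash_after_step=None):
    """Move amount from A to B. Old values are logged before each write so
    that a failure before commit can be rolled back."""
    undo_log = []  # (item, old value), appended before each write

    def write(item, value):
        undo_log.append((item, db[item]))
        db[item] = value

    try:
        a = db["A"]              # 1. read(A)
        a = a - amount           # 2. A := A - amount
        write("A", a)            # 3. write(A)
        if crash_after_step == 3:
            raise CrashError("failure after step 3")
        b = db["B"]              # 4. read(B)
        b = b + amount           # 5. B := B + amount
        write("B", b)            # 6. write(B)
        return True              # 7. commit
    except CrashError:
        # Recovery: restore old values in reverse order of the writes
        for item, old in reversed(undo_log):
            db[item] = old
        return False


db = {"A": 1000, "B": 2000}                       # assumed balances
committed_first = transfer(db, 50, crash_after_step=3)
state_after_failure = dict(db)
committed_second = transfer(db, 50)
state_after_commit = dict(db)
```

After the simulated failure the balances are back to their starting values; only the second, uninterrupted run changes them.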
Database Management Systems Partha Pratim Das 48.6
Recoverable Schedules

• A schedule is recoverable if, whenever a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj.
• The following schedule is not recoverable if T9 commits immediately after the read(A) operation

[Figure: schedule of T8 and T9]

• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable

Database Management Systems Partha Pratim Das 48.7

Cascading Rollbacks

• Cascading rollback: A single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)

[Figure: schedule of T10, T11 and T12]

• If T10 fails, T11 and T12 must also be rolled back
• Can lead to the undoing of a significant amount of work
Database Management Systems Partha Pratim Das 48.8
Cascadeless Schedules

• Cascadeless schedules: For each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj
• Every cascadeless schedule is also recoverable
• It is desirable to restrict the schedules to those that are cascadeless
• Example of a schedule that is NOT cascadeless

[Figure: example schedule]
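The definitions of recoverable and cascadeless schedules differ only in what the writer's commit must precede: the reader's commit, or the read itself. A minimal sketch follows, assuming schedules are lists of `(kind, transaction, item)` tuples with kind `"r"`, `"w"` or `"c"` (commit); aborts are ignored. The function names and the three example schedules are hypothetical, written to match the textual descriptions of the slides rather than copied from their figures.

```python
def reads_from(schedule):
    """Yield (reader, writer, position of the read) for every read of a
    value last written by a different transaction."""
    last_writer = {}
    for pos, (kind, txn, item) in enumerate(schedule):
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r" and last_writer.get(item, txn) != txn:
            yield txn, last_writer[item], pos


def commit_position(schedule, txn):
    """Index of txn's commit, or len(schedule) if it has not committed."""
    for pos, (kind, t, _) in enumerate(schedule):
        if kind == "c" and t == txn:
            return pos
    return len(schedule)


def is_recoverable(schedule):
    """Every reader that has committed did so after its writer committed."""
    end = len(schedule)
    return all(
        commit_position(schedule, reader) == end
        or commit_position(schedule, writer) < commit_position(schedule, reader)
        for reader, writer, _ in reads_from(schedule)
    )


def is_cascadeless(schedule):
    """Every writer committed before the read of its value took place."""
    return all(
        commit_position(schedule, writer) < pos
        for _, writer, pos in reads_from(schedule)
    )


# T9 reads A written by T8 and commits while T8 is still active
not_recoverable = [("r", 8, "A"), ("w", 8, "A"), ("r", 9, "A"), ("c", 9, None), ("r", 8, "B")]
# Nobody has committed yet, but T11 and T12 read uncommitted values
cascading = [("w", 10, "A"), ("r", 11, "A"), ("w", 11, "A"), ("r", 12, "A")]
# T10 commits before T11 reads its value
clean = [("w", 10, "A"), ("c", 10, None), ("r", 11, "A"), ("c", 11, None)]
```

Here `not_recoverable` fails both checks, `cascading` is recoverable but not cascadeless, and `clean` passes both, in line with the statement that every cascadeless schedule is also recoverable.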

Database Management Systems Partha Pratim Das 48.9

Example: Irrecoverable Schedule PPD

[Figure: schedule of T1 and T2]

Rollback is possible only till the end (commit) of T2. So the computation of A (4000) and write in T1 is lost.
Database Management Systems Partha Pratim Das 48.10
Example: Recoverable Schedule with Cascading Rollback PPD

[Figure: schedule of T1 and T2]

Rollback is possible as T2 has not committed yet. But T2 also needs to be rolled back in order to roll back T1.
Database Management Systems Partha Pratim Das 48.11
Example: Recoverable Schedule without Cascading Rollback PPD

[Figure: example schedule]

Rollback is possible without cascading, wherever the failure occurs.

Database Management Systems Partha Pratim Das 48.12

Transaction Definition in SQL PPD


Database Management Systems Partha Pratim Das 48.13

Transaction Definition in SQL

• Data manipulation language must include a construct for specifying the set of actions that comprise a transaction
  ◦ In SQL, a transaction begins implicitly
  ◦ A transaction in SQL ends by:
    . Commit work
      − Commits the current transaction and begins a new one
    . Rollback work
      − Causes current transaction to abort
  ◦ In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully
    . Implicit commit can be turned off by a database directive
      − For example in JDBC, connection.setAutoCommit(false);
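The implicit begin and the explicit commit or rollback can be tried out with Python's standard `sqlite3` module, used here instead of JDBC only because it needs no server. This is a sketch under stated assumptions: the `accounts` table and its balances are made up for the example, and the behaviour shown (a transaction opened implicitly before the first UPDATE) is that of the module's default transaction handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 1000), (2, 2000)")
conn.commit()


def balances():
    """Current balances as {acct_id: balance}."""
    return dict(conn.execute("SELECT acct_id, balance FROM accounts"))


# A transaction begins implicitly with the first UPDATE
conn.execute("UPDATE accounts SET balance = balance - 50 WHERE acct_id = 1")
opened_implicitly = conn.in_transaction
conn.rollback()  # abort: the debit is undone
after_rollback = balances()

# The same transfer, this time completed and committed
conn.execute("UPDATE accounts SET balance = balance - 50 WHERE acct_id = 1")
conn.execute("UPDATE accounts SET balance = balance + 50 WHERE acct_id = 2")
conn.commit()
after_commit = balances()
```

After the rollback both balances are unchanged; after the commit the transfer of 50 is permanent.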

Database Management Systems Partha Pratim Das 48.14

Transaction Control Language (TCL) PPD

• The following commands are used to control transactions
  ◦ COMMIT
    . To save the changes
  ◦ ROLLBACK
    . To roll back the changes
  ◦ SAVEPOINT
    . Creates points within a transaction to which you can later ROLLBACK
  ◦ SET TRANSACTION
    . Places a name on a transaction
• Transactional control commands are only used with the DML commands INSERT, UPDATE and DELETE
  ◦ They cannot be used while creating tables or dropping them because, in many database systems, these operations are automatically committed in the database
Source: SQL - Transactions

Database Management Systems Partha Pratim Das 48.15

TCL: COMMIT Command PPD

• COMMIT is the transactional command used to save changes invoked by a transaction to the database
• COMMIT saves all the changes made to the database since the last COMMIT or ROLLBACK command
• The syntax for the COMMIT command is COMMIT; for example:
  ◦ SQL> DELETE FROM Customers WHERE AGE = 25;
  ◦ SQL> COMMIT;

[Figure: two result tables of SQL> SELECT * FROM Customers;]

Source: SQL - Transactions

Database Management Systems Partha Pratim Das 48.16
TCL: ROLLBACK Command PPD

• ROLLBACK is the command used to undo changes that have not already been saved to the database
• This can only be used to undo changes made since the last COMMIT or ROLLBACK command was issued
• The syntax for the ROLLBACK command is ROLLBACK; for example:
  ◦ SQL> DELETE FROM Customers WHERE AGE = 25;
  ◦ SQL> ROLLBACK;

[Figure: two result tables of SQL> SELECT * FROM Customers;]

Source: SQL - Transactions

Database Management Systems Partha Pratim Das 48.17
TCL: SAVEPOINT / ROLLBACK Command PPD

• A SAVEPOINT is a point in a transaction to which you can roll the transaction back without rolling back the entire transaction
• The syntax for a SAVEPOINT command is:
  ◦ SAVEPOINT SAVEPOINT_NAME;
• This command serves only in the creation of a SAVEPOINT among all the transactional statements.
• The ROLLBACK command is used to undo a group of changes
• The syntax for rolling back to a SAVEPOINT is:
  ◦ ROLLBACK TO SAVEPOINT_NAME;

Example:
• SQL> SAVEPOINT SP1;
  ◦ Savepoint created.
• SQL> DELETE FROM Customers WHERE ID=1;
  ◦ 1 row deleted.
• SQL> SAVEPOINT SP2;
  ◦ Savepoint created.
• SQL> DELETE FROM Customers WHERE ID=2;
  ◦ 1 row deleted.
• SQL> SAVEPOINT SP3;
  ◦ Savepoint created.
• SQL> DELETE FROM Customers WHERE ID=3;
  ◦ 1 row deleted.

Source: SQL - Transactions

Database Management Systems Partha Pratim Das 48.18

TCL: SAVEPOINT / ROLLBACK Command PPD

• Three records deleted
  SQL> SAVEPOINT SP1;
  SQL> DELETE FROM Customers WHERE ID=1;
  SQL> SAVEPOINT SP2;
  SQL> DELETE FROM Customers WHERE ID=2;
  SQL> SAVEPOINT SP3;
  SQL> DELETE FROM Customers WHERE ID=3;
• Undo the deletion of last two
• SQL> ROLLBACK TO SP2;
  ◦ Rollback complete

[Figure: two result tables of SQL> SELECT * FROM Customers;]

Source: SQL - Transactions
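The savepoint sequence can be reproduced with Python's standard `sqlite3` module, since SQLite also supports SAVEPOINT and ROLLBACK TO. This is a sketch, not the environment used in the slides: the four rows and the NAME column are assumed sample data, and `isolation_level=None` is passed so that the transaction is controlled only by the explicit SQL statements.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN or COMMIT
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D")],  # assumed sample rows
)


def remaining_ids():
    return [row[0] for row in conn.execute("SELECT ID FROM Customers ORDER BY ID")]


conn.execute("BEGIN")
for n in (1, 2, 3):
    conn.execute(f"SAVEPOINT SP{n}")
    conn.execute("DELETE FROM Customers WHERE ID = ?", (n,))
ids_after_deletes = remaining_ids()

conn.execute("ROLLBACK TO SP2")  # undo the deletions made after SP2
ids_after_rollback = remaining_ids()
conn.execute("COMMIT")
```

Rolling back to SP2 restores rows 2 and 3 but not row 1, whose deletion happened before SP2 was created.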

Database Management Systems Partha Pratim Das 48.19

TCL: RELEASE SAVEPOINT Command PPD

• The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that you have created
• The syntax for a RELEASE SAVEPOINT command is as follows
  ◦ RELEASE SAVEPOINT SAVEPOINT_NAME;
• Once a SAVEPOINT has been released, you can no longer use the ROLLBACK command to return to it and undo the changes performed since that SAVEPOINT

Source: SQL - Transactions

Database Management Systems Partha Pratim Das 48.20

TCL: SET TRANSACTION Command PPD

• The SET TRANSACTION command can be used to initiate a database transaction
• This command is used to specify characteristics for the transaction that follows
  ◦ For example, you can specify a transaction to be read only or read write
• The syntax for a SET TRANSACTION command is as follows:
  ◦ SET TRANSACTION [ READ WRITE | READ ONLY ];

Source: SQL - Transactions

Database Management Systems Partha Pratim Das 48.21

View Serializability PPD


Database Management Systems Partha Pratim Das 48.22

View Serializability

• Let S and S’ be two schedules with the same set of transactions. S and S’ are view equivalent if the following three conditions are met, for each data item Q,
  ◦ Initial Read: If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction Ti must read the initial value of Q
  ◦ Write-Read Pair: If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S’ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj
  ◦ Final Write: The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S’
• As can be seen, view equivalence is also based purely on reads and writes alone

Database Management Systems Partha Pratim Das 48.23

View Serializability (2)

• A schedule S is view serializable if it is view equivalent to a serial schedule
• Every conflict serializable schedule is also view serializable
• Below is a schedule which is view-serializable but not conflict serializable

[Figure: schedule of T27, T28 and T29]

• What serial schedule is above equivalent to?
  ◦ T27 − T28 − T29
  ◦ The one read(Q) instruction reads the initial value of Q in both schedules and
  ◦ T29 performs the final write of Q in both schedules
• T28 and T29 perform write(Q) operations called blind writes, without having performed a read(Q) operation
• Every view serializable schedule that is not conflict serializable has blind writes
Database Management Systems Partha Pratim Das 48.24
Test for View Serializability

• The precedence graph test for conflict serializability cannot be used directly to test for view serializability
  ◦ Extension to test for view serializability has cost exponential in the size of the precedence graph
• The problem of checking if a schedule is view serializable falls in the class of NP-complete problems
  ◦ Thus, existence of an efficient algorithm is extremely unlikely
• However, practical algorithms that just check some sufficient conditions for view serializability can still be used

Database Management Systems Partha Pratim Das 48.25

View Serializability: Example 1 PPD

• Check whether the schedule is view serializable or not?
  ◦ S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);
• Solution:
  ◦ With 3 transactions, total number of serial schedules possible = 3! = 6
    . < T1 T2 T3 >
    . < T1 T3 T2 >
    . < T2 T3 T1 >
    . < T2 T1 T3 >
    . < T3 T1 T2 >
    . < T3 T2 T1 >

Source: http://www.edugrabs.com/how-to-check-for-view-serializable-schedule/ (Accessed 12-Feb-18)

Database Management Systems Partha Pratim Das 48.26

View Serializability: Example 1 (2) PPD

• Check whether the schedule is view serializable or not?
  ◦ S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);
• Solution:
  ◦ Final update on data items:
    . A : − (No write on A)
    . B : T1, T2, T3 (All 3 transactions write B)
    . As the final update on B is made by T3, (T1, T2) → T3. Now, removing those schedules in which T3 does not execute last leaves:
      − < T1 T2 T3 >
      − < T2 T1 T3 >

Source: http://www.edugrabs.com/how-to-check-for-view-serializable-schedule/ (Accessed 12-Feb-18)

Database Management Systems Partha Pratim Das 48.27

View Serializability: Example 1 (3) PPD

• Check whether the schedule is view serializable or not?
  ◦ S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);
• Solution:
  ◦ Initial Read + Which transaction updates after read?
    . A : T2, T1, T3 (initial read)
    . B : T2 (initial read); T1 (update after read)
    . The transaction T2 reads B initially, and B is then updated by T1. So T2 must execute before T1. Hence, T2 → T1. So only one schedule survives:
    . < T2 T1 T3 >
  ◦ Write Read Sequence (WR)
    . No need to check here: no transaction reads a value written by another transaction
  ◦ Hence, view equivalent serial schedule is:
    . T2 → T1 → T3

Source: http://www.edugrabs.com/how-to-check-for-view-serializable-schedule/ (Accessed 12-Feb-18)
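The answer to Example 1 can be confirmed by brute force: summarise what view equivalence compares (the source of every read and the final write of every item) and try all serial orders. The helper names are hypothetical, and the parser assumes single-digit transaction numbers. Trying every permutation takes factorial time, which is consistent with the remark that no efficient general test is expected.

```python
from itertools import permutations


def parse(text):
    """Turn "R2(B); W1(B);" into [("R", 2, "B"), ("W", 1, "B")]."""
    return [(op[0], int(op[1]), op[3]) for op in text.replace(";", " ").split()]


def view(schedule):
    """Return (reads, final writes). Each read is recorded with the write it
    reads from (None means the initial value); writes are identified as
    (transaction, n) for the n-th write of that item by that transaction."""
    count = {}       # (kind, txn, item) -> occurrences so far
    last_write = {}  # item -> (txn, n)
    reads = set()
    for kind, txn, item in schedule:
        n = count[(kind, txn, item)] = count.get((kind, txn, item), 0) + 1
        if kind == "W":
            last_write[item] = (txn, n)
        else:
            reads.add((txn, item, n, last_write.get(item)))
    return reads, last_write


def view_equivalent_serial_orders(schedule):
    """All serial orders whose schedule is view equivalent to the input."""
    txns = sorted({txn for _, txn, _ in schedule})
    target = view(schedule)
    result = []
    for order in permutations(txns):
        serial = [op for t in order for op in schedule if op[1] == t]
        if view(serial) == target:
            result.append(order)
    return result


s = parse("R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);")
orders = view_equivalent_serial_orders(s)
```

For this schedule the only order returned is `(2, 1, 3)`, that is T2 → T1 → T3.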
Database Management Systems Partha Pratim Das 48.28
View Serializability: Example 2 PPD

• Check whether S is conflict serializable and / or view serializable or not?
  ◦ S: R1(A); R2(A); R3(A); R4(A); W1(B); W2(B); W3(B); W4(B)
• Solution is given in the next slide (hidden). First try to solve this and then check the solution.

Source: Given in solution slides

Database Management Systems Partha Pratim Das 48.29

Complex Notions of Serializability PPD


Database Management Systems Partha Pratim Das 48.30

More Complex Notions of Serializability

• The schedule below produces the same outcome as the serial schedule < T1, T5 >, yet is not conflict equivalent or view equivalent to it

[Figure: schedule of T1 and T5]

• If we start with A = 1000 and B = 2000, the final result is A = 960 and B = 2040
• Determining such equivalence requires analysis of operations other than read and write
Database Management Systems Partha Pratim Das 48.31
Module Summary

• With proper planning, a database can be recovered back to a consistent state from an inconsistent state in the face of system failures. Such a recovery is done via cascaded or cascadeless rollback
• View serializability is a weaker notion of serializability that allows more concurrency. However, testing for view serializability is NP-complete

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 48.32

Module 49

Database Management Systems
Module 49: Concurrency Control/1

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur

ppd@[Link]
Database Management Systems Partha Pratim Das 49.1
Module Recap PPD

• With proper planning, a database can be recovered back to a consistent state from an inconsistent state in the face of system failures. Such a recovery is done via cascaded or cascadeless rollback
• View serializability is a weaker notion of serializability that allows more concurrency. However, testing for view serializability is NP-complete
Database Management Systems Partha Pratim Das 49.2
Module Objectives PPD

• Concurrency control through the design of serializable schedules is difficult in general. Hence we take a look at the locking mechanism and Lock-Based Protocols
• We need to understand how locks may be implemented
Database Management Systems Partha Pratim Das 49.3
Module Outline PPD

• Concurrency Control
• Lock-Based Protocols
• Implementing Locking

Concurrency Control

• A database must provide a mechanism that will ensure that all possible schedules are both:
  ◦ Conflict serializable
  ◦ Recoverable and, preferably, cascadeless
• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency
• Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur
• Testing a schedule for serializability after it has executed is a little too late!
  ◦ Tests for serializability help us understand why a concurrency control protocol is correct
• Goal: To develop concurrency control protocols that will assure serializability

Concurrency Control (2) PPD

• One way to ensure isolation is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item
  ◦ Should a transaction hold a lock on the whole database?
    . That would lead to strictly serial schedules, and hence very poor performance
• The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item

Lock-Based Protocols

• A lock is a mechanism to control concurrent access to a data item
• Data items can be locked in two modes:
  a) exclusive (X) mode:
    ◦ Data item can be both read as well as written
    ◦ X-lock is requested using lock-X instruction
  b) shared (S) mode:
    ◦ Data item can only be read
    ◦ S-lock is requested using lock-S instruction
• A transaction can unlock a data item Q by the unlock(Q) instruction
• Lock requests are made to the concurrency-control manager by the programmer
• Transaction can proceed only after request is granted

Lock-Based Protocols (2): Lock Compatibility Matrix

• Lock-Compatibility Matrix: A lock compatibility matrix is used which states whether a data item can be locked by two transactions at the same time
• Full compatibility matrix
[Figure: full compatibility matrix]
• Abbreviated compatibility matrix
[Figure: abbreviated compatibility matrix]

Lock-Based Protocols (3)

• Requesting for / Granting of a Lock
  ◦ A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions
• Sharing a Lock
  ◦ Any number of transactions can hold shared locks on an item
  ◦ But if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item
• Waiting for a Lock
  ◦ If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released
• Holding a Lock
  ◦ A transaction must hold a lock on a data item as long as it accesses that item
• Unlocking / Releasing a Lock
  ◦ Transaction Ti may unlock a data item that it had locked at some earlier point
  ◦ It is not necessarily desirable for a transaction to unlock a data item immediately after its final access of that data item, since serializability may not be ensured

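The granting and sharing rules above can be sketched as a small compatibility check. This is only an illustration: the names COMPATIBLE and can_grant are hypothetical, and the table is derived from the sharing rules stated on the slide (shared locks coexist, an exclusive lock excludes everything else).

```python
# Hypothetical sketch of S/X lock compatibility, derived from the sharing rules.
# Keys are (mode already held by another transaction, mode being requested).
COMPATIBLE = {
    ("S", "S"): True,   # any number of shared locks may coexist
    ("S", "X"): False,  # an exclusive request conflicts with a held shared lock
    ("X", "S"): False,  # nothing is compatible with a held exclusive lock
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every mode held by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # shared request against shared holders
print(can_grant("X", ["S"]))       # exclusive request against a shared holder
```

When the check fails, the requesting transaction would wait until the incompatible locks are released.
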
Lock-Based Protocols: Example: Serial Schedule

• Let A and B be two accounts that are accessed by transactions T1 and T2.
  ◦ Transaction T1 transfers $50 from account B to account A
  ◦ Transaction T2 displays the total amount of money in accounts A and B, that is, the sum A + B
• Suppose that the values of accounts A and B are $100 and $200, respectively
• If these transactions are executed serially, either in the order T1, T2 or the order T2, T1, then transaction T2 will display the value $300

Lock-Based Protocols: Example (2): Concurrent Schedule: Bad

• If, however, these transactions are executed concurrently, then Schedule 1 is possible
• In this case, transaction T2 displays $250, which is incorrect. The reason for this mistake is that
  ◦ the transaction T1 unlocked data item B too early, as a result of which T2 saw an inconsistent state
• Suppose we delay unlocking till the end
[Figure: Schedule 1]

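The $250 result can be reproduced with a short replay. The exact interleaving of Schedule 1 is in a figure that is not reproduced here, so the order of steps below is an assumption consistent with the text: T1 updates B and unlocks it, T2 then reads both accounts, and only afterwards does T1 update A. The function name run_schedule_1 is hypothetical.

```python
def run_schedule_1(a=100, b=200):
    """Replay an interleaving in which T1 releases B before it updates A (assumed order)."""
    db = {"A": a, "B": b}

    # T1: lock-X(B); read(B); B := B - 50; write(B); unlock(B)   <- unlocks too early
    db["B"] = db["B"] - 50

    # T2: lock-S(A); read(A); unlock(A); lock-S(B); read(B); unlock(B); display(A + B)
    displayed = db["A"] + db["B"]

    # T1: lock-X(A); read(A); A := A + 50; write(A); unlock(A)
    db["A"] = db["A"] + 50

    return displayed, db


displayed, final_db = run_schedule_1()
print(displayed)                       # value shown by T2
print(final_db["A"] + final_db["B"])   # true total once T1 has finished
```

T2 sees B after the debit but A before the credit, so its sum is $50 short even though the final total is unchanged.
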
Lock-Based Protocols: Example (3): Concurrent Schedule: Good

• Delaying unlocking till the end, T1 becomes T3 and T2 becomes T4
• Hence, the sequence of reads and writes as in Schedule 1 is no longer possible
• T4 will correctly display $300

Lock-Based Protocols: Example (4): Concurrent Schedule: Deadlock

• Given T3 and T4, consider Schedule 2 (partial)
• Since T3 is holding an exclusive-mode lock on B and T4 is requesting a shared-mode lock on B, T4 is waiting for T3 to unlock B
• Similarly, since T4 is holding a shared-mode lock on A and T3 is requesting an exclusive-mode lock on A, T3 is waiting for T4 to unlock A
• Thus, we have arrived at a state where neither of these transactions can ever proceed with its normal execution
• This situation is called deadlock
• When deadlock occurs, the system must roll back one of the two transactions.
• Once a transaction has been rolled back, the data items that were locked by that transaction are unlocked.
• These data items are then available to the other transaction, which can continue with its execution.
[Figure: Schedule 2]

Lock-Based Protocols

• If we do not use locking, or if we unlock data items too soon after reading or writing them, we may get inconsistent states
• On the other hand, if we do not unlock a data item before requesting a lock on another data item, deadlocks may occur
• Deadlocks are a necessary evil associated with locking, if we want to avoid inconsistent states
• Deadlocks are definitely preferable to inconsistent states, since they can be handled by rolling back transactions, whereas inconsistent states may lead to real-world problems that cannot be handled by the database system
• A locking protocol is a set of rules followed by all transactions while requesting and releasing locks
• Locking protocols restrict the set of possible schedules
• The set of all such schedules is a proper subset of all possible serializable schedules
• We present locking protocols that allow only conflict-serializable schedules, and thereby ensure isolation

Two-Phase Locking Protocol

• This protocol ensures conflict-serializable schedules
• Phase 1: Growing Phase
  ◦ Transaction may obtain locks
  ◦ Transaction may not release locks
• Phase 2: Shrinking Phase
  ◦ Transaction may release locks
  ◦ Transaction may not obtain locks
• The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points
  ◦ That is, the point where a transaction acquired its final lock

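The two-phase rule can be checked mechanically on a transaction's sequence of lock and unlock steps. The sketch below is illustrative only: the representation of a transaction as a list of ("lock" | "unlock", item) pairs and the names is_two_phase and lock_point are assumptions, not part of the slides.

```python
def is_two_phase(ops):
    """True if no lock is acquired after the first unlock (growing phase, then shrinking phase)."""
    released = False
    for action, _item in ops:
        if action == "unlock":
            released = True
        elif action == "lock" and released:
            return False  # a lock was requested in the shrinking phase
    return True


def lock_point(ops):
    """Index of the final lock acquisition, or None if the transaction never locks anything."""
    indexes = [i for i, (action, _item) in enumerate(ops) if action == "lock"]
    return indexes[-1] if indexes else None


# T1 from the earlier example unlocks B before locking A, so it is not two-phase.
t1 = [("lock", "B"), ("unlock", "B"), ("lock", "A"), ("unlock", "A")]
# T3 delays unlocking till the end, so it is two-phase.
t3 = [("lock", "B"), ("lock", "A"), ("unlock", "B"), ("unlock", "A")]

print(is_two_phase(t1), is_two_phase(t3))
print(lock_point(t3))
```

For two-phase transactions, ordering them by the time at which each reaches its lock point gives a serialization order.
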
Two-Phase Locking Protocol (2)

• There can be conflict serializable schedules that cannot be obtained if two-phase locking is used
• However, in the absence of extra information (that is, ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
  ◦ Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable

Lock Conversions

• Two-phase locking with lock conversions:
  – First Phase:
    . can acquire a lock-S on item
    . can acquire a lock-X on item
    . can convert a lock-S to a lock-X (upgrade)
  – Second Phase:
    . can release a lock-S
    . can release a lock-X
    . can convert a lock-X to a lock-S (downgrade)
• This protocol assures serializability, but it still relies on the programmer to insert the various locking instructions

Automatic Acquisition of Locks: Read

• A transaction Ti issues the standard read/write instruction, without explicit locking calls
• The operation read(D) is processed as:
    if Ti has a lock on D
      then
        read(D)
      else begin
        if necessary, wait until no other transaction has a lock-X on D
        grant Ti a lock-S on D;
        read(D)
      end

Automatic Acquisition of Locks: Write

• write(D) is processed as:
    if Ti has a lock-X on D
      then
        write(D)
      else begin
        if necessary, wait until no other transaction has any lock on D,
        if Ti has a lock-S on D
          then
            upgrade lock on D to lock-X
          else
            grant Ti a lock-X on D
        write(D)
      end;
• All locks are released after commit or abort

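The read(D) and write(D) procedures can be sketched in a single-threaded form. This is a simplification under stated assumptions: the class AutoLockStore, the exception WouldBlock and the method finish are hypothetical names, and where the pseudocode says "wait" the sketch raises WouldBlock instead of blocking, so the caller decides how to retry.

```python
class WouldBlock(Exception):
    """Raised at the point where a real system would make the transaction wait."""


class AutoLockStore:
    def __init__(self, data):
        self.data = dict(data)
        self.locks = {}  # item -> {transaction: "S" or "X"}

    def _others(self, txn, item):
        # Locks on `item` held by transactions other than `txn`
        return {t: m for t, m in self.locks.get(item, {}).items() if t != txn}

    def read(self, txn, item):
        held = self.locks.setdefault(item, {})
        if txn not in held:
            # Wait until no other transaction has a lock-X, then take a lock-S
            if "X" in self._others(txn, item).values():
                raise WouldBlock(item)
            held[txn] = "S"
        return self.data[item]

    def write(self, txn, item, value):
        held = self.locks.setdefault(item, {})
        if held.get(txn) != "X":
            # Wait until no other transaction has any lock on the item
            if self._others(txn, item):
                raise WouldBlock(item)
            held[txn] = "X"  # fresh grant, or upgrade from lock-S
        self.data[item] = value

    def finish(self, txn):
        """Release every lock of `txn`; call this at commit or abort."""
        for held in self.locks.values():
            held.pop(txn, None)


store = AutoLockStore({"D": 1})
store.read("T1", "D")        # T1 is granted lock-S
store.read("T2", "D")        # T2 shares the lock-S
try:
    store.write("T1", "D", 2)
except WouldBlock:
    print("T1 must wait: T2 still holds lock-S on D")
store.finish("T2")
store.write("T1", "D", 2)    # upgrade to lock-X now succeeds
```

Releasing locks only in finish matches the rule that all locks are released after commit or abort.
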
Deadlocks

• Two-phase locking does not ensure freedom from deadlocks
[Figure: partial schedule of T3 and T4]
• Observe that transactions T3 and T4 are two phase, but in deadlock

Starvation

• In addition to deadlocks, there is a possibility of Starvation
• Starvation occurs if the concurrency control manager is badly designed. For example:
  ◦ A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item
  ◦ The same transaction is repeatedly rolled back due to deadlocks
• Concurrency control manager can be designed to prevent starvation
• Starvation is also loosely referred to as Livelock

Cascading Rollback

• The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil
• When a deadlock occurs there is a possibility of cascading roll-backs
• Cascading roll-back is possible under two-phase locking
• In the schedule here, each transaction observes the two-phase locking protocol, but the failure of T5 after the read(A) step of T7 leads to cascading rollback of T6 and T7.
[Figure: schedule of T5, T6 and T7]

More Two Phase Locking Protocols

• To avoid cascading roll-back, follow a modified protocol called strict two-phase locking
  ◦ A transaction must hold all its exclusive locks till it commits/aborts
• Rigorous two-phase locking is even stricter
  ◦ All locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit
• Note that concurrency goes down as we move to stricter locking protocols

Implementation of Locking

• A lock manager can be implemented as a separate process to which transactions send lock and unlock requests
• The lock manager replies to a lock request by sending a lock-grant message (or a message asking the transaction to roll back, in case of a deadlock)
• The requesting transaction waits until its request is answered
• The lock manager maintains a data structure called a lock table to record granted locks and pending requests
• The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked

Lock Table

[Figure: lock table]
• Dark blue rectangles indicate granted locks; light blue indicate waiting requests
• Lock table also records the type of lock granted or requested
• New request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks
• Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted
• If transaction aborts, all waiting or granted requests of the transaction are deleted
  ◦ lock manager may keep a list of locks held by each transaction, to implement this efficiently

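The queue behaviour described above can be sketched with a dictionary of per-item queues. This is a minimal illustration, not a real lock manager: the class LockTable and its methods are hypothetical names, lock upgrades and the per-transaction lock list are left out, and a new request is treated as grantable only if it is compatible with every earlier entry in the queue, including waiting ones (one reading of "all earlier locks", which also keeps a waiting X request from being overtaken).

```python
from collections import defaultdict, deque


class LockTable:
    def __init__(self):
        # item name -> queue of [transaction, mode, granted] entries, in arrival order
        self.queues = defaultdict(deque)

    def request(self, txn, item, mode):
        """Append the request; return True if it is granted immediately."""
        queue = self.queues[item]
        granted = all(m == "S" and mode == "S" for _t, m, _g in queue)
        queue.append([txn, mode, granted])
        return granted

    def release(self, txn, item):
        """Delete txn's entry for the item; return transactions that become granted."""
        self.queues[item] = deque(e for e in self.queues[item] if e[0] != txn)
        return self._regrant(item)

    def abort(self, txn):
        """Delete every waiting or granted request of txn."""
        for item in list(self.queues):
            self.release(txn, item)

    def _regrant(self, item):
        earlier_modes, newly_granted = [], []
        for entry in self.queues[item]:
            _txn, mode, granted = entry
            if not all(m == "S" and mode == "S" for m in earlier_modes):
                break  # this request and all later ones keep waiting
            if not granted:
                entry[2] = True
                newly_granted.append(entry[0])
            earlier_modes.append(mode)
        return newly_granted


table = LockTable()
table.request("T1", "A", "S")
table.request("T2", "A", "S")
table.request("T3", "A", "X")        # waits behind the two shared locks
table.release("T1", "A")
print(table.release("T2", "A"))      # T3 is granted once both readers are gone
```

A production lock manager would also index entries by transaction so that abort does not scan every queue.
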
Module Summary

• Understood the locking mechanism and protocols
• Realized that deadlock is a peril of locking and needs to be handled through rollback

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Module 50

Partha Pratim
Das

Objectives &
Outline Database Management Systems
Deadlock
Handling Module 50: Concurrency Control/2
Prevention
Detection
Recovery

Timestamp-
Based Partha Pratim Das
Protocols
Correctness

Module Summary Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur

ppd@[Link]

Database Management Systems Partha Pratim Das 50.1

Module Recap PPD

• Understood the locking mechanism and protocols
• Realized that deadlock is a peril of locking and needs to be handled through rollback

Module Objectives PPD

• Deadlocks are perils of locking. We need to understand how to detect, prevent and recover from deadlock
• To introduce a simple time-based protocol that avoids deadlocks

Module Outline PPD

• Deadlock Handling
• Timestamp-Based Protocols

Deadlock Handling

• A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set
• Deadlock Prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:
  ◦ Require that each transaction locks all its data items before it begins execution (pre-declaration)
  ◦ Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order

Deadlock Prevention

• Transaction Timestamp: A timestamp is a unique identifier created by the DBMS to identify the relative starting time of a transaction. Timestamping is a method of concurrency control in which each transaction is assigned a transaction timestamp
• The following schemes use transaction timestamps for the sake of deadlock prevention alone
  ◦ wait-die scheme: non-preemptive
    . Older transaction may wait for younger one to release data item (older means smaller timestamp)
      − Younger transactions never wait for older ones; they are rolled back instead
    . A transaction may die several times before acquiring needed data item
  ◦ wound-wait scheme: preemptive
    . Older transaction wounds (forces rollback of) younger transaction instead of waiting for it
      − Younger transactions may wait for older ones
    . May be fewer rollbacks than wait-die scheme

Deadlock Prevention (2): Wait-Die Scheme

• It is a non-preemptive technique for deadlock prevention
• When transaction Tn requests a data item currently held by Tk, Tn is allowed to wait only if it has a timestamp smaller than that of Tk (that is, Tn is older than Tk); otherwise Tn is killed ("dies")
• If a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, then one of two possibilities may occur:
  ◦ Timestamp(Tn) < Timestamp(Tk): Tn, which is requesting a conflicting lock, is older than Tk, so Tn is allowed to "wait" until the data item is available.
  ◦ Timestamp(Tn) > Timestamp(Tk): Tn is younger than Tk, so Tn is killed ("dies"). Tn is restarted later with a random delay but with the same timestamp(n)
• This scheme allows the older transaction to "wait" but kills the younger one ("die")
• Example
  ◦ Suppose that transactions T5, T10, T15 have timestamps 5, 10 and 15 respectively
  ◦ If T5 requests a data item held by T10, then T5 will "wait"
  ◦ If T15 requests a data item held by T10, then T15 will be killed ("die")
Source: What is the difference between “wait-die” and “wound-wait” deadlock prevention algorithms?

Deadlock Prevention (3): Wound-Wait Scheme

• It is a preemptive technique for deadlock prevention
• When transaction Tn requests a data item currently held by Tk, Tn is allowed to wait only if it has a timestamp larger than that of Tk; otherwise Tk is killed (wounded by Tn)
• If a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, then one of two possibilities may occur:
  ◦ Timestamp(Tn) < Timestamp(Tk): Tn forces Tk to be killed ("wounds"). Tk is restarted later with a random delay but with the same timestamp(k)
  ◦ Timestamp(Tn) > Timestamp(Tk): Tn "waits" until the resource is free
• This scheme allows the younger transaction requesting a lock to "wait" if the older transaction already holds a lock, but forces the younger one to be rolled back ("wounded") if the older transaction requests a lock on an item already held by the younger one
• Example
  ◦ Suppose that transactions T5, T10, T15 have timestamps 5, 10 and 15 respectively
  ◦ If T5 requests a data item held by T10, then the item will be preempted from T10 and T10 will be rolled back ("wounded")
  ◦ If T15 requests a data item held by T10, then T15 will "wait"
Source: What is the difference between “wait-die” and “wound-wait” deadlock prevention algorithms?

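The two schemes differ only in what happens for each outcome of the timestamp comparison, which a pair of small functions makes explicit. The function names and the returned strings are hypothetical; timestamps are assumed to be unique, as stated for transaction timestamps, so the equal case is not handled.

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    if ts_requester < ts_holder:
        return "requester waits"
    return "requester is rolled back"


def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds the holder, a younger requester waits."""
    if ts_requester < ts_holder:
        return "holder is rolled back"
    return "requester waits"


# The examples from the slides: T5, T10, T15 with timestamps 5, 10, 15; T10 holds the item.
for requester in (5, 15):
    print(requester, wait_die(requester, 10), "|", wound_wait(requester, 10))
```

In both schemes the transaction that is rolled back is always the younger of the two, which is why restarting with the original timestamp prevents starvation.
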
Deadlock Prevention

• Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided
• Timeout-Based Schemes
  ◦ A transaction waits for a lock only for a specified amount of time. If the lock has not been granted within that time, the transaction is rolled back and restarted
  ◦ Thus, deadlocks are not possible
  ◦ Simple to implement, but starvation is possible. It is also difficult to determine a good value of the timeout interval

Deadlock Detection

• Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E), where
  ◦ V is a set of vertices (all the transactions in the system)
  ◦ E is a set of edges; each element is an ordered pair Ti → Tj.
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item
• When Ti requests a data item currently being held by Tj, then the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti
• The system is in a deadlock state if and only if the wait-for graph has a cycle
• Must invoke a deadlock-detection algorithm periodically to look for cycles
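Looking for a cycle in the wait-for graph can be done with a depth-first search. The sketch below is one possible detection routine, not the one prescribed by the slides: the function name has_deadlock and the representation of the graph as a dictionary from each transaction to the set of transactions it waits for are assumptions.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    `wait_for` maps a transaction to the set of transactions it is waiting for,
    i.e. an edge Ti -> Tj is stored as wait_for[Ti] containing Tj.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, fully explored
    color = {}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:
                return True        # back edge: u is on the current path, so there is a cycle
            if state == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


# T3 waits for T4 (lock on A) and T4 waits for T3 (lock on B): a deadlock.
print(has_deadlock({"T3": {"T4"}, "T4": {"T3"}}))
# A chain of waits without a cycle is not a deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}}))
```

The recursion depth grows with the length of the longest wait chain, so a very large graph would call for an iterative traversal.
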

Deadlock Detection: Example


Timestamp-Based Protocols

• Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the timestamps determine the serializability order
• In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
  ◦ W-timestamp(Q) is the largest timestamp of any transaction that executed write(Q) successfully
  ◦ R-timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully

Timestamp-Based Protocols (2)

• The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order
• Suppose a transaction Ti issues a read(Q)
  a) If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten
    ◦ Hence, the read operation is rejected, and Ti is rolled back.
  b) If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
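The read rule can be written as a few lines of code. This sketch covers only the read(Q) case stated above; the class Item, the exception Rollback and the function to_read are hypothetical names, and the initial timestamps of 0 assume that every transaction timestamp is positive.

```python
class Rollback(Exception):
    """Signals that the issuing transaction must be rolled back."""


class Item:
    def __init__(self, value):
        self.value = value
        self.w_ts = 0  # W-timestamp(Q): assumed 0 before any write
        self.r_ts = 0  # R-timestamp(Q): assumed 0 before any read


def to_read(ts, item):
    """Apply the timestamp-ordering rule for read(Q) issued by a transaction with timestamp ts."""
    if ts < item.w_ts:
        # The value this transaction should have seen was already overwritten
        raise Rollback(f"read rejected: TS={ts} < W-timestamp={item.w_ts}")
    item.r_ts = max(item.r_ts, ts)
    return item.value


q = Item(10)
q.w_ts = 5               # pretend a transaction with timestamp 5 wrote Q
print(to_read(7, q))     # allowed, and R-timestamp(Q) becomes 7
try:
    to_read(3, q)        # older than the last writer, so rejected
except Rollback as err:
    print(err)
```

Taking the max keeps R-timestamp(Q) from moving backwards when an older transaction reads after a younger one.
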

Module Summary
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 50.20

Module 51

Partha Pratim
Das

Week Recap

Objectives &
Database Management Systems
Outline
Module 51: Backup & Recovery/1: Backup/1
What is Backup
and Recovery?

Why Backup?

Backup Data:
Types Partha Pratim Das
Backup
Strategies
Full Backup Department of Computer Science and Engineering
Incremental Backup Indian Institute of Technology, Kharagpur
Differential Backup
Example ppd@[Link]
Case: Monthly
Schedule

Hot Backup
Transactional
Logging

Module Summary
Database Management Systems Partha Pratim Das 51.1
Week Recap PPD

• Concurrent transactions, serializability issues, and ACID properties were discussed
• Learnt the forms of serializability: conflict and view
• Conflict serializability can be ensured by an acyclic precedence graph
• View Serializability is a weaker notion of serializability that permits more concurrency. However, testing for view serializability is NP-complete
• With proper planning, a database can be recovered to a consistent state from an inconsistent state in the face of system failures. Such a recovery is done via cascaded or cascadeless rollback
• Understood the locking mechanism and protocols
• Realized that deadlock is a peril of locking and needs to be handled through rollback
• Explained how to detect, prevent and recover from deadlock
• Introduced a time-based protocol that avoids deadlocks

Module Objectives PPD

• To understand the need for having a backup
• To learn about different backup strategies and their suitability

Module Outline PPD

• Need for backup and recovery
• Different strategies of backup with examples

References:
• Enterprise Systems Backup and Recovery: A Corporate Insurance Policy by Preston De Guise (Accessed 21-Aug-2021)
• Data Backup Recovery: The Essential Guide for Businesses (Accessed 19-Aug-2021)

What is Backup and Recovery? PPD

• A Backup of a database is a representative copy of data containing all necessary contents of a database, such as data files and control files
◦ Unexpected database failures, especially those due to factors beyond our control, are unavoidable. Hence, it is important to keep a backup of the entire database
◦ There are two major types of backup:
▷ Physical Backup: A copy of physical database files such as data files, control files, log files, and archived redo logs
▷ Logical Backup: A copy of logical data extracted from a database, consisting of tables, procedures, views, functions, etc.
• Recovery is the process of restoring the database to its latest known consistent state after a system failure occurs
◦ A Database Log records all transactions in sequence. Recovery using logs is quite popular in databases
◦ A typical log file contains information about transactions to execute, transaction states, and modified values

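The difference between a physical and a logical backup can be illustrated with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: SQLite stands in for a full DBMS, the `account` table and file names are made up for the example, and the physical copy is taken while the database is closed.

```python
import os
import shutil
import sqlite3
import tempfile

QUERY = "SELECT name, balance FROM account ORDER BY name"

with tempfile.TemporaryDirectory() as workdir:
    # Build a small database file (hypothetical schema and data).
    db_path = os.path.join(workdir, "bank.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()

    # Logical backup: SQL statements that recreate the schema and the rows.
    logical_dump = "\n".join(conn.iterdump())
    conn.close()

    # Physical backup: a byte-for-byte copy of the database file.
    physical_copy = os.path.join(workdir, "bank.db.bak")
    shutil.copyfile(db_path, physical_copy)

    # Restore from the logical backup by replaying the SQL into a fresh database.
    from_logical = sqlite3.connect(":memory:")
    from_logical.executescript(logical_dump)
    rows_logical = from_logical.execute(QUERY).fetchall()
    from_logical.close()

    # Restore from the physical backup by simply opening the copied file.
    from_physical = sqlite3.connect(physical_copy)
    rows_physical = from_physical.execute(QUERY).fetchall()
    from_physical.close()
```

Both restores yield the same rows; the logical dump is portable text, while the physical copy is tied to the storage format of the engine that wrote it.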
Why Backup? PPD

Why is backup necessary? PPD

• Disaster Recovery
◦ Data loss can occur due to various reasons such as hardware failures, malware attacks, environmental and physical factors, or simple human error
• Client-Side Changes
◦ Clients may want to modify the existing application to serve their business's dynamic needs
◦ Developers might need to restore a previous version of the database in order to address such requirements
• Auditing
◦ From an auditing perspective, you need to know what your data or schema looked like at some point in the past
◦ For instance, if your organization gets involved in a lawsuit, it may want to look at an earlier snapshot of the database
• Downtime
◦ Without backup, system failures lead to data loss, which in turn results in application downtime
◦ This damages business reputation

Backup Data: Types PPD

Types of Backup Data PPD

• Business Data includes personal information of clients, employees, contractors, etc., along with details about places, things, events and rules related to the business.
• System Data includes the specific environment/configuration of the system used for specialised development purposes, log files, software dependency data, and disk images.
• Media files such as photographs, videos, sounds, and graphics need backing up. Media files are typically much larger in size.

Backup Strategies PPD

Types of Backup Strategies: Full Backup PPD

• Full Backup backs up everything. This is a complete copy, which stores all the objects of the database: tables, procedures, functions, views, indexes, etc. A full backup can restore all components of the database system as they were at the time the backup was taken.
• A full backup must be done at least once before any of the other types of backup
• The frequency of full backups depends on the type of application. For instance, a full backup is done on a daily basis for applications in which one or more of the following is true:
◦ Either 24/7 availability is not a requirement, or system availability is not affected as a consequence of backups.
◦ A complete backup takes a minimal amount of media, i.e. the backup data is not too large.
◦ Backup/system administrators may not be available on a daily basis, and therefore a primary goal is to reduce to a bare minimum the amount of media required to complete a restore.

Types of Backup Strategies: Full Backup (2) PPD

• Full Backup: Advantages
◦ Recovery from a full backup involves a consolidated read from a single backup
◦ Generally, there is no dependency between two consecutive backups
◦ Effectively, the loss of a single day's backup does not affect the ability to recover other backups
◦ It is relatively easy to set up, configure and maintain
• Full Backup: Disadvantages
◦ The backup takes the largest amount of time among all types of backups
◦ This results in the longest system downtime during the backup process
◦ It uses the largest amount of storage media per backup

Types of Backup Strategies: Incremental Backup PPD

• Incremental backup targets only those files or items that have changed since the last backup. This often results in smaller backups that take less time to complete.
• For instance, a 2 TB database may only have a 5% change during the day. With incremental database backups, the amount backed up is typically only a little more than the actual amount of changed data in the database.
• For most organizations, a full backup is done once a week, and incremental backups are done for the rest of the time. This might mean a backup schedule as shown below
[Figure: weekly backup schedule]
• This ensures a minimum backup window during peak activity times, with a longer backup window during non-peak activity times.

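The saving can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that the database stays at 2 TB (taken as 2000 GB), that exactly 5% of it changes every day, and that an incremental backup copies only the changed data.

```python
DB_SIZE_GB = 2000          # 2 TB database, in decimal gigabytes (assumption)
DAILY_CHANGE_PERCENT = 5   # share of the database that changes per day
DAYS_PER_WEEK = 7

# Size of one incremental backup: only the data changed since the last backup.
incremental_gb = DB_SIZE_GB * DAILY_CHANGE_PERCENT // 100

# Policy 1: a full backup every day of the week.
weekly_full_only_gb = DAYS_PER_WEEK * DB_SIZE_GB

# Policy 2: one full backup per week, incrementals on the other six days.
weekly_mixed_gb = DB_SIZE_GB + (DAYS_PER_WEEK - 1) * incremental_gb

print(f"one incremental:        {incremental_gb} GB")
print(f"daily full backups:     {weekly_full_only_gb} GB per week")
print(f"full + 6 incrementals:  {weekly_mixed_gb} GB per week")
```

Under these assumptions the mixed policy writes 2600 GB per week instead of 14000 GB.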
Types of Backup Strategies: Incremental Backup (2) PPD

• Incremental Backup: Advantages
◦ Less storage is used per backup
◦ The downtime due to backup is minimized
◦ It provides considerable cost reductions over full backups
• Incremental Backup: Disadvantages
◦ It requires more effort and time during recovery
◦ A complete system recovery needs a full backup to start with
◦ Recovery cannot be done without the full backup and all the incremental backups taken after it
◦ If any of the intermediate incremental backups is lost, the recovery cannot be complete

Types of Backup Strategies: Differential Backup PPD

• Differential backup backs up all the changes that have occurred since the most recent full backup, regardless of what backups have occurred in between
• This “rolls up” multiple changes into a single backup job, which sets the basis for the next incremental backup
◦ As a differential backup does not back up everything, this backup process usually runs quicker than a full backup
◦ The longer the time since the last full backup, the larger the backup window of the differential backup

Types of Backup Strategies: Differential Backup (2) PPD

• To evaluate how differential backups might work within an environment, consider the sample backup schedule shown in the figure below.
[Figure: sample backup schedule]
a) The incremental backup on Saturday backs up all files that have changed since the full backup on Friday. Likewise, all changes since Saturday and Sunday are backed up in Sunday's and Monday's incremental backups, respectively.
b) On Tuesday, a differential backup is performed. This backs up all files that have changed since the full backup on Friday. A recovery on Wednesday should only require data from the full and differential backups, skipping the Saturday/Sunday/Monday incremental backups.
Recovery on any given day only needs the data from the full backup and the most recent differential backup, plus any incremental backups taken after that differential

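The rule for choosing which backup sets a recovery needs can be sketched as a backward walk over the backup history. The function name `restore_chain` and the tuple layout are assumptions made for this illustration; the schedule mirrors the Friday-to-Tuesday example described above.

```python
def restore_chain(backups):
    """Return the backup sets needed to restore the state after the last backup.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "differential" or "incremental".
    """
    chain = []
    differential_seen = False
    for label, kind in reversed(backups):
        if kind == "full":
            chain.append((label, kind))
            return list(reversed(chain))  # oldest first, the order of restore
        if differential_seen:
            continue  # already covered by the differential found later in time
        chain.append((label, kind))
        if kind == "differential":
            differential_seen = True
    raise ValueError("no full backup in the history")


week = [
    ("Fri", "full"),
    ("Sat", "incremental"),
    ("Sun", "incremental"),
    ("Mon", "incremental"),
    ("Tue", "differential"),
]
print(restore_chain(week))       # recovery on Wednesday
print(restore_chain(week[:4]))   # recovery on Tuesday, before the differential
```

With the differential in place the chain is just the Friday full backup and the Tuesday differential; without it, all three incrementals are required.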
Types of Backup strategies: Differential Backup (3) PPD

• Differential Backup: Advantages
◦ Recoveries require fewer backup sets.
◦ Provides better recovery options when full backups are run rarely (for example, only monthly)
• Differential Backup: Disadvantages
◦ Although fewer backup sets are required for recovery, the amount of storage media required by differential backups may exceed that required by incremental backups
◦ If done a long time after the last full backup, a differential backup can even reach the size of a full backup

Types of Backup Strategies: Illustrative Example PPD

• The figure below depicts which of the updated files of the database will be backed up in each respective type of backup throughout a span of 5 days as indicated.
[Figure: Backup Types]

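Since the figure itself is not reproduced here, the comparison can be approximated with a small simulation. Everything in it is hypothetical: the file names, the daily change sets, and the assumption that day 1 is a full backup under every strategy and that a backup runs once per day.

```python
def backup_contents(all_files, daily_changes):
    """Return, per strategy, the set of files copied on each day.

    Day 1 is a full backup in every strategy; daily_changes[i] holds the
    files modified on day i + 2.
    """
    full = [set(all_files)]
    incremental = [set(all_files)]
    differential = [set(all_files)]
    changed_since_full = set()
    for changed in daily_changes:
        changed_since_full |= changed
        full.append(set(all_files))                      # everything, every day
        incremental.append(set(changed))                 # since the last backup
        differential.append(set(changed_since_full))     # since the full backup
    return {"full": full, "incremental": incremental, "differential": differential}


files = {"f1", "f2", "f3", "f4", "f5"}
changes = [{"f1"}, {"f2"}, {"f1", "f3"}, {"f4"}]   # days 2 to 5
plan = backup_contents(files, changes)

for day in range(5):
    print(day + 1, {kind: sorted(sets[day]) for kind, sets in plan.items()})
```

The incremental sets stay small, while the differential sets grow each day until the next full backup.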
Case: Monthly Schedule PPD

Case: Monthly Data Backup Schedule PPD

Consider the following backup schedule for a month:
[Figure: monthly backup schedule]
• Inference
◦ Here full backups are performed once per month, but with differentials being performed weekly, the maximum number of backups required for a complete system recovery at any point will be one full backup, one differential backup, and six incremental backups
◦ A full system recovery will never need more than the full backup from the start of the month, the differential backup at the start of the relevant week, and the incremental backups performed during the week
◦ If a policy were used whereby full backups were done on the first of the month, and incrementals for the rest of the month, a complete system recovery on the last day of the month could need as many as 31 backup sets
◦ Thus differential backups can improve the efficiency of recovery when planned properly

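The two counts in the inference can be checked with a short simulation. The schedule used here is an assumption, since the original figure is not available: a full backup on day 1, a differential on days 8, 15, 22 and 29, and incrementals on every other day of a 31-day month.

```python
def sets_needed(kinds):
    """Count the backup sets needed to restore the state after the last backup."""
    count = 0
    differential_seen = False
    for kind in reversed(kinds):
        if kind == "full":
            return count + 1
        if differential_seen:
            continue  # older sets are covered by the differential
        count += 1
        if kind == "differential":
            differential_seen = True
    raise ValueError("no full backup in the history")


def month(with_differentials, days=31):
    """Build the assumed schedule: full on day 1, weekly differentials if enabled."""
    kinds = []
    for day in range(1, days + 1):
        if day == 1:
            kinds.append("full")
        elif with_differentials and day % 7 == 1:
            kinds.append("differential")
        else:
            kinds.append("incremental")
    return kinds


with_diff = month(with_differentials=True)
without_diff = month(with_differentials=False)

worst_with_diff = max(sets_needed(with_diff[:d]) for d in range(1, 32))
worst_without_diff = max(sets_needed(without_diff[:d]) for d in range(1, 32))
print(worst_with_diff, worst_without_diff)
```

Under this assumed schedule the worst case is 8 sets with weekly differentials (one full, one differential, six incrementals) and 31 sets with incrementals alone.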
Hot Backup PPD

• So far, we have learnt about backup strategies that cannot run concurrently with a running application
• In systems where high availability is a requirement, Hot backup is preferable wherever possible
• Hot backup refers to keeping a database up and running while the backup is performed concurrently
◦ Such a system usually has a module or plug-in that allows the database to be backed up while staying available to end users
◦ Databases that store transactions of asset management companies, hedge funds, high frequency trading companies, etc. try to implement Hot backups, as these data are highly dynamic and the operations run 24x7
◦ Real-time systems, such as those handling sensor and actuator data in embedded devices or satellite transmissions, also use Hot backup

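As a small-scale illustration of backing up a database that stays open, the sketch below uses the online backup call `Connection.backup` from Python's `sqlite3` module (available since Python 3.7). SQLite is only a stand-in for the 24x7 systems mentioned above, and the `trades` table is a made-up example.

```python
import sqlite3

# A live database that keeps serving the application (hypothetical schema).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, qty INTEGER)")
live.executemany("INSERT INTO trades (qty) VALUES (?)", [(10,), (20,)])
live.commit()

# Take the backup without closing or stopping the live connection.
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# The live database keeps accepting work after the backup was taken.
live.execute("INSERT INTO trades (qty) VALUES (30)")
live.commit()

live_count = live.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
snapshot_count = snapshot.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
print(live_count, snapshot_count)

snapshot.close()
live.close()
```

The snapshot reflects the state at backup time, while the live database has already moved on.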
Hot Backup (2) PPD

• Hot Backup: Advantages
◦ The database is always available to the end user.
◦ Point-in-time recovery is easier to achieve in Hot backup systems.
◦ Most efficient when dealing with dynamic and modularized data.
• Hot Backup: Disadvantages
◦ May not be feasible when the data set is huge and monolithic.
◦ Fault tolerance is lower. Any error that occurs on the fly can terminate the whole backup process.
◦ Maintenance and setup costs are high.

Transactional Logging as Hot Backup PPD

• In regular database systems, hot backup is mainly used for Transaction Log Backup.
• Cold backup strategies such as Differential and Incremental are preferred for Data backup. The reason is evident from the disadvantages of Hot backup.
• Transactional Logging is used in circumstances where a possibly inconsistent backup is taken, but another file generated and backed up (after the database file has been fully backed up) can be used to restore consistency.
• The data backup versions needed for recovery to a given point can be inferred from the Transactional Log backup set.
• Thus, transaction logs play a vital role in database recovery.

Module Summary PPD

• Learnt why having backup is essential
• Analysed different backup strategies and respective schedules
• Learnt how Hot backup of the transaction log helps in recovering a consistent database

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems
Module 52: Backup & Recovery/2: Recovery/1

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Module Recap PPD

• Learnt why having backup is essential
• Analysed different backup strategies and respective schedules
• Learnt how Hot backup of the transaction log helps in recovering a consistent database

Module Objectives PPD

• To understand the possible sources of failure for transactions in a database
• To explore the types of storage used for recovery from failures to ensure Atomicity, Consistency and Durability
• To understand a recovery scheme based on logging
• To focus on single transactions only

Module Outline PPD

• Failure Classification
• Storage Structure
• Log-Based Recovery

Failure Classification PPD


Database System Recovery PPD

• All database reads/writes are within a transaction
• Transactions have the “ACID” properties
◦ Atomicity - all or nothing
◦ Consistency - preserves database integrity
◦ Isolation - execute as if they were run alone
◦ Durability - results are not lost by a failure
• Concurrency Control guarantees I, contributes to C
• Application program guarantees C
• Recovery subsystem guarantees A & D, contributes to C

Failure Classification

• Transaction failure:
◦ Logical errors: transaction cannot complete due to some internal error condition
◦ System errors: the database system must terminate an active transaction due to an error condition (for example, deadlock)
• System crash: a power failure or other hardware or software failure causes the system to crash
◦ Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted as a result of a system crash
▷ Database systems have numerous integrity checks to prevent corruption of disk data
• Disk failure: a head crash or similar disk failure destroys all or part of disk storage
◦ Destruction is assumed to be detectable
▷ Disk drives use checksums to detect failures

Recovery Algorithms

• Consider transaction Ti that transfers $50 from account A to account B
◦ Two updates: subtract 50 from A and add 50 to B
• Transaction Ti requires updates to A and B to be output to the database
◦ A failure may occur after one of these modifications has been made but before both of them are made
◦ Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state
◦ Not modifying the database may result in lost updates if failure occurs just after the transaction commits
• Recovery algorithms have two parts
a) Actions taken during normal transaction processing to ensure enough information exists to recover from failures
b) Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
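The danger of a failure between the two updates can be shown with a toy simulation. The starting balances (A = 1000, B = 2000) and the `Crash` exception are assumptions for illustration; the dictionary stands in for the database and there is deliberately no recovery logic.

```python
class Crash(Exception):
    """Stands in for a system failure in the middle of a transaction."""


def transfer(db, amount, crash_after_first_update=False):
    """Move `amount` from account A to account B with two separate updates."""
    db["A"] -= amount
    if crash_after_first_update:
        raise Crash("failure after updating A but before updating B")
    db["B"] += amount


db = {"A": 1000, "B": 2000}
total_before = sum(db.values())

try:
    transfer(db, 50, crash_after_first_update=True)
except Crash:
    pass  # no recovery is attempted in this sketch

total_after_crash = sum(db.values())
print(total_before, total_after_crash, db)
```

The total drops from 3000 to 2950: A was debited but B was never credited, which is exactly the inconsistent state a recovery algorithm must repair.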
Storage Structure PPD


Data Access (2)

• Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept
◦ Ti's local copy of a data item X is denoted by xi
◦ BX denotes the block containing X
• Data items are transferred between system buffer blocks and the private work-area by:
◦ read(X) assigns the value of data item X to the local variable xi
◦ write(X) assigns the value of local variable xi to data item X in the buffer block
• Transactions
◦ Must perform read(X) before accessing X for the first time (subsequent reads can be from the local copy)
◦ The write(X) can be executed at any time before the transaction commits
• Note that output(BX) need not immediately follow write(X). The system can perform the output operation when it deems fit
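The three storage levels involved (disk, buffer, private work-area) can be modelled with a few dictionaries. The class and method names below are hypothetical, and the sketch assumes that read(X) and write(X) first bring block BX into the buffer if it is not already there.

```python
class BufferedStorage:
    """Disk blocks plus an in-memory buffer of block copies."""

    def __init__(self, disk_blocks):
        self.disk = {name: dict(items) for name, items in disk_blocks.items()}
        self.buffer = {}

    def input_block(self, block):
        """Copy a block from disk into the buffer."""
        self.buffer[block] = dict(self.disk[block])

    def output_block(self, block):
        """Copy a buffer block back to disk."""
        self.disk[block] = dict(self.buffer[block])


class Transaction:
    """A transaction with a private work-area of local copies."""

    def __init__(self, storage, block_of):
        self.storage = storage
        self.block_of = block_of   # maps data item -> name of its block
        self.local = {}            # private work-area (the x_i values)

    def _buffered_block(self, item):
        block = self.block_of[item]
        if block not in self.storage.buffer:
            self.storage.input_block(block)
        return block

    def read(self, item):
        """read(X): copy X from the buffer block into the local variable."""
        block = self._buffered_block(item)
        self.local[item] = self.storage.buffer[block][item]
        return self.local[item]

    def write(self, item):
        """write(X): copy the local variable into X in the buffer block."""
        block = self._buffered_block(item)
        self.storage.buffer[block][item] = self.local[item]


storage = BufferedStorage({"BX": {"X": 100}})
t1 = Transaction(storage, {"X": "BX"})
t1.read("X")
t1.local["X"] -= 30
t1.write("X")
disk_before_output = storage.disk["BX"]["X"]   # write(X) alone does not reach disk
storage.output_block("BX")
disk_after_output = storage.disk["BX"]["X"]
print(disk_before_output, disk_after_output)
```

The value on disk changes only when the block is output, which is why a crash between write(X) and output(BX) can lose an update.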

Data Access (3): Example


Recovery and Atomicity

• To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database itself
• We study Log-based Recovery Mechanisms
◦ We first present key concepts
◦ And then present the actual recovery algorithm
• Less used alternative: Shadow Paging
• In this Module we assume serial execution of transactions
• In the next Module, we consider the case of concurrent transaction execution

Log-Based Recovery PPD


Undo and Redo on Recovering from Failure

• When recovering after failure:
◦ Transaction Ti needs to be undone if the log
▷ contains the record <Ti start>,
▷ but does not contain either the record <Ti commit> or <Ti abort>
◦ Transaction Ti needs to be redone if the log
▷ contains the record <Ti start>
▷ and contains the record <Ti commit> or <Ti abort>
◦ It may seem strange to redo transaction Ti if the record <Ti abort> is in the log
▷ To see why this works, note that if <Ti abort> is in the log, so are the redo-only records written by the undo operation. Thus, the end result will be to undo Ti's modifications in this case. This slight redundancy simplifies the recovery algorithm and enables faster overall recovery time
▷ Such a redo redoes all the original actions, including the steps that restored old values. This is known as Repeating History
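The undo/redo decision depends only on which control records appear in the log, so it can be sketched as a single pass. The tuple encoding of log records used here is an assumption for illustration: `(txn, "start")`, `(txn, "commit")`, `(txn, "abort")` for control records and `(txn, item, old, new)` for updates.

```python
def classify(log):
    """Split the transactions in `log` into a redo list and an undo list."""
    started = []
    finished = set()
    for record in log:
        txn, tag = record[0], record[1]
        if tag == "start":
            started.append(txn)
        elif tag in ("commit", "abort"):
            finished.add(txn)
    # Redo: <Ti start> and <Ti commit> or <Ti abort> are both present.
    redo = [txn for txn in started if txn in finished]
    # Undo: <Ti start> is present but neither commit nor abort is.
    undo = [txn for txn in started if txn not in finished]
    return redo, undo


log = [
    ("T0", "start"),
    ("T0", "A", 1000, 950),
    ("T0", "B", 2000, 2050),
    ("T0", "commit"),
    ("T1", "start"),
    ("T1", "C", 700, 600),
]
print(classify(log))
```

For this log, T0 is redone and T1, which never reached commit or abort, is undone.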
Immediate Modification Recovery Example

Below we show the log as it appears at three instances of time.
[Figure: the log at three instants of time, (a), (b) and (c)]
Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000, and log records <T0, B, 2000>, <T0, A, 1000>, <T0 abort> are written out
(b) redo (T0) and undo (T1): A and B are set to 950 and 2050 and C is restored to 700. Log records <T1, C, 700>, <T1 abort> are written out
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050 respectively. Then C is set to 600.
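The three cases can be replayed with a small recovery routine. The three logs below are inferred from the recovery actions listed above (the original figure is not reproduced), and the tuple encoding is an assumption: `(txn, item, old, new)` for updates, `(txn, item, value)` for redo-only records, and `(txn, "start" | "commit" | "abort")` for control records.

```python
def recover(log, db):
    """Redo every logged update, then undo incomplete transactions.

    Mutates `db` and returns the log records written during the undo phase.
    """
    undo_list = []
    # Redo phase: repeat history by replaying the log forwards.
    for rec in log:
        if len(rec) == 2:
            txn, tag = rec
            if tag == "start":
                undo_list.append(txn)
            elif tag in ("commit", "abort"):
                undo_list.remove(txn)
        elif len(rec) == 4:
            _, item, _, new = rec
            db[item] = new
        else:  # redo-only record written by an earlier undo
            _, item, value = rec
            db[item] = value
    # Undo phase: scan backwards, restoring old values of incomplete transactions.
    written = []
    for rec in reversed(log):
        if not undo_list:
            break
        txn = rec[0]
        if txn not in undo_list:
            continue
        if len(rec) == 4:
            _, item, old, _ = rec
            db[item] = old
            written.append((txn, item, old))
        elif len(rec) == 2 and rec[1] == "start":
            written.append((txn, "abort"))
            undo_list.remove(txn)
    return written


LOG_A = [("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050)]
LOG_B = LOG_A + [("T0", "commit"), ("T1", "start"), ("T1", "C", 700, 600)]
LOG_C = LOG_B + [("T1", "commit")]

results = {}
for name, log in (("a", LOG_A), ("b", LOG_B), ("c", LOG_C)):
    db = {"A": 1000, "B": 2000, "C": 700}   # assumed on-disk state at the crash
    results[name] = (recover(log, db), db)
    print(name, results[name])
```

Because the redo phase replays every update, the outcome does not depend on which writes had reached disk before the crash.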
Checkpoints

• Redoing/undoing all transactions recorded in the log can be very slow
◦ Processing the entire log is time-consuming if the system has run for a long time
◦ We might unnecessarily redo transactions which have already output their updates to the database
• Streamline the recovery procedure by periodically performing checkpointing
• All updates are stopped while checkpointing is in progress
a) Output all log records currently residing in main memory onto stable storage
b) Output all modified buffer blocks to the disk
c) Write a log record <checkpoint L> onto stable storage, where L is a list of all transactions active at the time of checkpoint

Checkpoints (2)

• During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti
◦ Scan backwards from the end of the log to find the most recent <checkpoint L> record
◦ Only transactions that are in L or started after the checkpoint need to be redone or undone
◦ Transactions that committed or aborted before the checkpoint already have all their updates output to stable storage
• Some earlier part of the log may be needed for undo operations
◦ Continue scanning backwards till a record <Ti start> is found for every transaction Ti in L
◦ Parts of the log prior to the earliest <Ti start> record above are not needed for recovery, and can be erased whenever desired

Checkpoints (3): Example

[Figure: timeline of transactions T1 to T4 relative to the last checkpoint and the system failure]
• Any transactions that committed before the last checkpoint should be ignored
◦ T1 can be ignored (updates already output to disk due to checkpoint)
• Any transactions that committed since the last checkpoint need to be redone
◦ T2 and T3 redone
• Any transaction that was running at the time of failure needs to be undone and restarted
◦ T4 undone
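The selection of transactions after a checkpoint can be sketched as follows. The log is a hypothetical one chosen to match the outcome described above (T1 ignored, T2 and T3 redone, T4 undone); the tuple encoding and the `("checkpoint", L)` record are assumptions for illustration.

```python
def recovery_sets(log):
    """Return (redo, undo) lists, starting from the most recent checkpoint.

    Raises ValueError if the log contains no checkpoint record.
    """
    checkpoint_at = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
    active = list(log[checkpoint_at][1])   # the list L in <checkpoint L>
    finished = []
    for rec in log[checkpoint_at + 1:]:
        if len(rec) != 2:
            continue                        # update records do not change the sets
        txn, tag = rec
        if tag == "start":
            active.append(txn)
        elif tag in ("commit", "abort"):
            active.remove(txn)
            finished.append(txn)
    return finished, active


log = [
    ("T1", "start"),
    ("T1", "commit"),
    ("T2", "start"),
    ("checkpoint", ["T2"]),
    ("T3", "start"),
    ("T2", "commit"),
    ("T3", "commit"),
    ("T4", "start"),
]                                           # system failure happens here
redo, undo = recovery_sets(log)
print("redo:", redo, "undo:", undo)
```

T1 never enters either list because it committed before the checkpoint and is not in L.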
Module Summary

• Failures may arise from a variety of sources, and each needs a strategy for handling
• A proper mix and management of volatile, non-volatile and stable storage can guarantee recovery from failures and ensure Atomicity, Consistency and Durability
• Log-based recovery is efficient and effective

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems
Module 53: Backup & Recovery/3: Recovery/2

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Module Recap PPD

Transactional Logging

Database Management Systems Partha Pratim Das 53.5

Hot Backup: Recap PPD

• In systems where high availability is a requirement, Hot backup is preferable wherever possible
• Hot backup refers to keeping a database up and running while the backup is performed concurrently
 ◦ Such a system usually has a module or plug-in that allows the database to be backed up while staying available to end users
 ◦ Databases that store transactions of asset management companies, hedge funds, high frequency trading companies etc. try to implement Hot backups, as these data are highly dynamic and the operations run 24x7
 ◦ Real-time systems, such as those handling sensor and actuator data in embedded devices or satellite transmissions, also use Hot backup

Database Management Systems Partha Pratim Das 53.6

Transactional Logging as Hot Backup PPD

• In regular database systems, Hot Backup is mainly used for Transaction Log Backup
• Cold backup strategies like Differential and Incremental are preferred for Data backup. The reason is evident from the disadvantages of Hot backup
• Transactional Logging is used in circumstances where a possibly inconsistent backup is taken, but another file generated and backed up (after the database file has been fully backed up) can be used to restore consistency
• During recovery to a given point, the information regarding data backup versions can be inferred from the Transactional Log backup set
• Thus transaction logs play a vital role in database recovery

Database Management Systems Partha Pratim Das 53.7

Transactional Logging with Recovery: Example PPD

To understand how Transactional Logging works, consider Figure 1, which represents a chunk of a database just before a backup has been started

Figure 1: Database content

• While the backup is in progress, modifications may continue to occur to the database. For example, a request to modify the data at location “4325” to ‘0’ arrives.
• When a request comes through to modify a part of the DB, the modifications will be written in the given order compulsorily:
 1 Transaction Log
 2 Database (itself)
 This is depicted in Figure 2
• If a crash occurs before writing to the database then the inconsistent backed up file is recovered first, and then the pending modifications in the transaction log (backed up*) are applied to re-establish consistency
*Note: The Transactional Log itself is backed up using Hot Backup; the Data is backed up incrementally

Figure 2: Changes to a DB during a hot backup

Database Management Systems Partha Pratim Das 53.8
Transactional Logging with Recovery: Example (2) PPD

Consider, in the previous scenario, that before the crash occurs another request modifies the content of location “4321” to ‘0’. Incidentally, this change gets written in the database itself (recall: Immediate Modification). This is indicated in Figure 3

• Figure 3 is the state of the database after which the system crashes. Note that this part has already been backed up, and hence, the backup is inconsistent with the database.

Figure 3: Applying Tr. logs during recovery

• Recovery Phase:
 ◦ Data recovery is done from the last data backup set (Figure 1)
 ◦ Log recovery is done from the Transaction Log backup set. It will be the same as the current transaction log because of Hot backup
 ◦ Figure 4 shows the recovered database and log
• The recovered database is inconsistent. To re-establish consistency all transaction logs generated between the start of the backup and the end of the backup must be replayed

Figure 4: Recovered DB files and Tr. logs

Database Management Systems Partha Pratim Das 53.9
Transactional Logging with Recovery: Example (3) PPD

• When using transactional logging we distinguish between recover and restore:
 ◦ Recover: retrieve from the backup media the database files and transaction logs, and
 ◦ Restore: reapply database consistency based on the transaction logs
• For our restore process, we recover inconsistent database files and completed transaction logs. The recovered files will resemble the configuration shown in Figure 4
• The final database state after replaying the log on the recovered database is displayed in Figure 5
• The state of the database is consistent
• Note that an unnecessary log replay is shown occurring for block 4325. Whether such replays occur depends on the database being used. For instance, a database vendor might choose to replay all logs because it would be faster than first determining whether a particular logged activity needs to be replayed
• Once all transaction logs have been replayed, the database is said to have been restored, that is, it is at a point where it can now be opened for user access

Figure 5: Database restore process via log replay

Database Management Systems Partha Pratim Das 53.10
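The recover-then-restore sequence of this example can be sketched in Python. This is only an illustration under stated assumptions: the block addresses 4321 and 4325 come from the slides, but the old value 7, the dictionary representation of the database and the name `restore` are hypothetical.

```python
def restore(recovered_db, transaction_log):
    """Replay every logged write, in log order, over the recovered database files."""
    db = dict(recovered_db)
    for address, new_value in transaction_log:
        # Replaying a write that already reached the backup is harmless:
        # it simply stores the same value again.
        db[address] = new_value
    return db


# Recovered (inconsistent) backup: block 4321 was copied before its change,
# block 4325 after its change. The old value 7 is made up for illustration.
recovered_db = {4321: 7, 4325: 0}

# Transaction log backup set: both modifications made during the backup.
transaction_log = [(4325, 0), (4321, 0)]

restored_db = restore(recovered_db, transaction_log)
print(restored_db)
```

The replay of block 4325 changes nothing, which mirrors the "unnecessary log replay" noted on the slide; replaying everything avoids having to decide which records are still needed.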

Recovery Algorithm PPD


Database Management Systems Partha Pratim Das 53.11

Recovery Schemes

• So far:

• Logging (during normal operation):
 ◦ <Ti start> at transaction start
 ◦ <Ti, Xj, V1, V2> for each update, and
 ◦ <Ti commit> at transaction end

Database Management Systems Partha Pratim Das 53.16

Recovery Algorithm (2)

• Transaction rollback (during normal operation)
 ◦ Let Ti be the transaction to be rolled back
 ◦ Scan log backwards from the end, and for each log record of Ti of the form <Ti, Xj, V1, V2>
  ▷ perform the undo by writing V1 to Xj,
  ▷ write a log record <Ti, Xj, V1>
   . . . such log records are called Compensation Log Records (CLR)
 ◦ Once the record <Ti start> is found, stop the scan and write the log record <Ti abort>

Database Management Systems Partha Pratim Das 53.17
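A minimal Python sketch of the rollback procedure follows. The tuple encoding of log records, the in-memory `db` dictionary and the sample values are assumptions made for illustration; they are not part of the slides.

```python
def rollback(log, db, txn):
    """Roll back txn: undo its updates newest-first, logging a CLR for each."""
    # Iterate over a snapshot, because CLRs are appended during the scan.
    for record in reversed(list(log)):
        if record[0] != txn:
            continue
        if len(record) == 2 and record[1] == "start":
            break                                  # <Ti start> found: stop the scan
        if len(record) == 4:                       # update record <Ti, Xj, V1, V2>
            _, item, old_value, _ = record
            db[item] = old_value                   # undo: write V1 back to Xj
            log.append((txn, item, old_value))     # compensation log record <Ti, Xj, V1>
    log.append((txn, "abort"))


# Hypothetical log: T1 updated A and C, T2 updated B in between.
log = [
    ("T1", "start"),
    ("T1", "A", 100, 50),
    ("T2", "start"),
    ("T2", "B", 20, 30),
    ("T1", "C", 5, 9),
]
db = {"A": 50, "B": 30, "C": 9}

rollback(log, db, "T1")
print(db)
print(log[-3:])
```

Only T1's updates are reverted; T2's write to B is left alone, and the three appended records are the two CLRs followed by the abort record.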

Recovery Algorithm (3): Checkpoints Recap

• Let the time of checkpointing be tcheck and the time of system crash be tfail
• Let there be four transactions Ta, Tb, Tc and Td such that:
 ◦ Ta commits before checkpoint
 ◦ Tb starts before checkpoint and commits after it, before system crash
 ◦ Tc starts after checkpoint and commits before system crash
 ◦ Td starts after checkpoint and was active at the time of system crash
• The actions that are taken by the recovery manager are:
 ◦ Nothing is done with Ta
 ◦ Transaction redo is performed for Tb and Tc
 ◦ Transaction undo is performed for Td
Source: Distributed DBMS - Database Recovery
Database Management Systems Partha Pratim Das 53.18
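The three cases can be expressed as a small decision function. This is a sketch only: the function name `recovery_action` and the numeric timestamps are hypothetical, chosen so that Ta, Tb, Tc and Td match the situations described on the slide.

```python
def recovery_action(commit_time, t_check, t_fail):
    """Decide what the recovery manager does with one transaction.

    commit_time is None when the transaction was still active at the crash.
    """
    if commit_time is None or commit_time >= t_fail:
        return "undo"        # active at the time of failure
    if commit_time < t_check:
        return "ignore"      # committed before the checkpoint
    return "redo"            # committed between checkpoint and crash


T_CHECK, T_FAIL = 5, 10      # made-up times for the checkpoint and the crash
commit_times = {"Ta": 3, "Tb": 7, "Tc": 8, "Td": None}

actions = {
    name: recovery_action(commit, T_CHECK, T_FAIL)
    for name, commit in commit_times.items()
}
print(actions)
```

Note that the start time of a transaction does not affect the decision; only its commit time relative to the checkpoint and the crash matters.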
Recovery Algorithm (4): Checkpoints Recap

• Any transactions that committed before the last checkpoint should be ignored
 ◦ T1 can be ignored (updates already output to disk due to checkpoint)
• Any transactions that committed since the last checkpoint need to be redone
 ◦ T2 and T3 redone
• Any transaction that was running at the time of failure needs to be undone and restarted
 ◦ T4 undone
Database Management Systems Partha Pratim Das 53.19
Recovery Algorithm (5): Redo-Undo Phases

• Recovery from failure: Two phases
 ◦ Redo phase: Replay updates of all transactions, whether they committed, aborted, or are incomplete
 ◦ Undo phase: Undo all incomplete transactions

Requirement:
• Transactions of type T1 need no recovery
• Transactions of type T2 or T4 need to be redone
• Transactions of type T3 or T5 need to be undone and restarted

Strategy:
• Ignore T1
• Redo T2, T3, T4 and T5
• Undo T3 and T5
Database Management Systems Partha Pratim Das 53.20
Recovery Algorithm (6): Redo Phase

• Find last <checkpoint L> record, and set undo-list to L
• Scan forward from above <checkpoint L> record
 ◦ Whenever a record <Ti, Xj, V1, V2> is found, redo it by writing V2 to Xj
 ◦ Whenever a log record <Ti start> is found, add Ti to undo-list
 ◦ Whenever a log record <Ti commit> or <Ti abort> is found, remove Ti from undo-list
• Steps for the REDO operation are:
 ◦ If the transaction has done INSERT, the recovery manager generates an insert from the log
 ◦ If the transaction has done DELETE, the recovery manager generates a delete from the log
 ◦ If the transaction has done UPDATE, the recovery manager generates an update from the log.

Source: Distributed DBMS - Database Recovery

Database Management Systems Partha Pratim Das 53.21
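The redo phase can be sketched as follows. The tuple encoding of log records and the sample log are assumptions for illustration, and the log is assumed to contain at least one checkpoint record. The handling of redo-only records `<Ti, Xj, V1>` follows the usual textbook treatment and is not stated on the slide.

```python
def redo_phase(log, db):
    """Repeat history from the last checkpoint; return the resulting undo-list."""
    last_cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
    undo_list = set(log[last_cp][1])          # undo-list starts as L
    for record in log[last_cp + 1:]:
        txn = record[0]
        if len(record) == 4:                  # <Ti, Xj, V1, V2>: write V2 to Xj
            db[record[1]] = record[3]
        elif len(record) == 3:                # redo-only record (assumption, see lead-in)
            db[record[1]] = record[2]
        elif record[1] == "start":
            undo_list.add(txn)
        elif record[1] in ("commit", "abort"):
            undo_list.discard(txn)
    return undo_list


# Hypothetical log: T0 was rolled back before the crash, T1 committed, T2 is incomplete.
redo_log = [
    ("T0", "start"),
    ("T0", "B", 2000, 2050),
    ("T1", "start"),
    ("checkpoint", ["T0", "T1"]),
    ("T1", "C", 700, 600),
    ("T1", "commit"),
    ("T2", "start"),
    ("T2", "A", 500, 400),
    ("T0", "B", 2000),
    ("T0", "abort"),
]
redo_db = {"A": 500, "B": 2050, "C": 700}     # made-up disk state at the crash

pending = redo_phase(redo_log, redo_db)
print(redo_db, pending)
```

After this pass the database reflects every logged write since the checkpoint, including T2's, and `pending` names the transactions the undo phase must still roll back.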

Recovery Algorithm (7): Undo Phase

• Scan log backwards from end
 ◦ Whenever a log record <Ti, Xj, V1, V2> is found where Ti is in undo-list, perform the same actions as for transaction rollback:
  ▷ Perform undo by writing V1 to Xj
  ▷ Write a log record <Ti, Xj, V1>
 ◦ Whenever a log record <Ti start> is found where Ti is in undo-list
  ▷ Write a log record <Ti abort>
  ▷ Remove Ti from undo-list
 ◦ Stop when undo-list is empty
  That is, <Ti start> has been found for every transaction in undo-list
• Steps for the UNDO operation are:
 ◦ If the faulty transaction has done INSERT, the recovery manager deletes the data item(s) inserted
 ◦ If the faulty transaction has done DELETE, the recovery manager inserts the deleted data item(s) from the log
 ◦ If the faulty transaction has done UPDATE, the recovery manager undoes the change by writing the before-update value from the log
• After undo phase completes, normal transaction processing can commence
Source: Distributed DBMS - Database Recovery
Database Management Systems Partha Pratim Das 53.22
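A matching sketch of the undo phase is given below. As before, the tuple encoding, the sample log and the starting undo-list are illustrative assumptions rather than material from the slides.

```python
def undo_phase(log, db, undo_list):
    """Roll back every transaction in undo_list by scanning the log backwards."""
    remaining = set(undo_list)
    # Iterate over a snapshot, because new records are appended during the scan.
    for record in reversed(list(log)):
        if not remaining:
            break                                   # <Ti start> found for every Ti
        txn = record[0]
        if txn not in remaining:
            continue
        if len(record) == 4:                        # <Ti, Xj, V1, V2>
            _, item, old_value, _ = record
            db[item] = old_value                    # write V1 to Xj
            log.append((txn, item, old_value))      # <Ti, Xj, V1>
        elif len(record) == 2 and record[1] == "start":
            log.append((txn, "abort"))
            remaining.discard(txn)


# Hypothetical state after a redo pass: T2 is the only incomplete transaction.
undo_log = [
    ("T1", "start"),
    ("checkpoint", ["T1"]),
    ("T1", "C", 700, 600),
    ("T1", "commit"),
    ("T2", "start"),
    ("T2", "A", 500, 400),
]
undo_db = {"A": 400, "C": 600}

undo_phase(undo_log, undo_db, {"T2"})
print(undo_db)
print(undo_log[-2:])
```

The scan stops as soon as the undo-list is empty, so records older than the earliest relevant start record are never visited.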
Recovery Algorithm (8): Example


Database Management Systems Partha Pratim Das 53.23

Module Summary

• Learnt how Hot backup of the transaction log helps in recovering a consistent database
• Studied the recovery algorithms for concurrent transactions

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 53.24

Database Management Systems
Module 54: Backup & Recovery/4: Recovery/3

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 54.1

• Support for high-concurrency locking techniques, such as those used for B+-tree concurrency control, which release locks early
 ◦ Supports “logical undo”
• Recovery based on “repeating history”, whereby recovery executes exactly the same actions as normal processing
 ◦ including redo of log records of incomplete transactions, followed by subsequent undo
 ◦ Key benefits
  ▷ supports logical undo
  ▷ easier to understand/show correctness
• Early lock release is important not only for indices, but also for operations on other system data structures that are accessed and updated very frequently like:
 ◦ data structures that track the blocks containing records of a relation
 ◦ the free space in a block
 ◦ the free blocks
Database Management Systems Partha Pratim Das 54.7
Logical Undo Logging

• Operations like B+-tree insertions and deletions release locks early
 ◦ They cannot be undone by restoring old values (physical undo), since once a lock is released, other transactions may have updated the B+-tree
 ◦ Instead, insertions (deletions) are undone by executing a deletion (insertion) operation (known as logical undo)
• For such operations, undo log records should contain the undo operation to be executed
 ◦ Such logging is called logical undo logging, in contrast to physical undo logging
  ▷ Operations are called logical operations
 ◦ Other examples:
  ▷ delete of tuple, to undo insert of tuple
   − allows early lock release on space allocation information
  ▷ subtract amount deposited, to undo deposit
   − allows early lock release on bank balance

Database Management Systems Partha Pratim Das 54.8
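The bank-balance example can be made concrete with a few lines of Python. The amounts and the transaction names are made up; the point is only to contrast the two kinds of undo once a lock has been released early.

```python
balance = 500

# T1 deposits 100 and releases its lock on the balance early.
t1_old_value = balance            # physical undo information: 500
t1_deposit = 100                  # logical undo information: "subtract 100"
balance += t1_deposit             # balance is now 600

# T2 deposits 50 after T1 released the lock, and commits.
balance += 50                     # balance is now 650

# T1 must now be rolled back.
after_physical_undo = t1_old_value           # restores 500: T2's deposit is lost
after_logical_undo = balance - t1_deposit    # subtracts 100: T2's deposit survives

print(after_physical_undo, after_logical_undo)
```

Restoring the old value wipes out T2's committed deposit, whereas executing the inverse operation removes only T1's effect.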

Physical Redo

• Redo information is logged physically (that is, new value for each write) even for operations with logical undo
 ◦ Logical redo is very complicated since database state on disk may not be “operation consistent” when recovery starts
 ◦ Physical redo logging does not conflict with early lock release

Database Management Systems Partha Pratim Das 54.9

Operation Logging: Process

• When operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique identifier of the operation instance
• While the system is executing the operation, it creates update log records in the normal fashion for all updates performed by the operation
 ◦ the usual old-value (physical undo information) and new-value (physical redo information) is written out as usual for each update performed by the operation;
 ◦ the old-value information is required in case the transaction needs to be rolled back before the operation completes
• When operation completes, <Ti, Oj, operation-end, U> is logged, where U contains information needed to perform a logical undo
 ◦ For example, if the operation inserted an entry in a B+-tree, the undo information U would indicate that a deletion operation is to be performed, and would identify the B+-tree and what entry to delete from the tree. This is called logical logging
 ◦ In contrast, logging of old-value and new-value information is called physical logging, and the corresponding log records are called physical log records
Database Management Systems Partha Pratim Das 54.10
Operation Logging (2): Example

• Insert of (key, record-id) pair (K5, RID7) into index I9

Database Management Systems Partha Pratim Das 54.11

Operation Logging (3)

• If crash/rollback occurs before operation completes:
 ◦ the operation-end log record is not found, and
 ◦ the physical undo information is used to undo operation
• If crash/rollback occurs after the operation completes:
 ◦ the operation-end log record is found, and in this case
 ◦ logical undo is performed using U; the physical undo information for the operation is ignored
• Redo of operation (after crash) still uses physical redo information

Module Summary

Database Management Systems Partha Pratim Das 54.12

Transaction Rollback with Logical Undo

To roll back transaction Ti, scan the log backwards:
a) If a log record <Ti, X, V1, V2> is found, perform the undo and log <Ti, X, V1>
b) If a <Ti, Oj, operation-end, U> record is found
 • Rollback the operation logically using the undo information U
  ◦ Updates performed during roll back are logged just like during normal operation execution
  ◦ At the end of the operation rollback, instead of logging an operation-end record, generate a record <Ti, Oj, operation-abort>
 • Skip all preceding log records for Ti until the record <Ti, Oj, operation-begin> is found
c) If a redo-only record is found, ignore it
d) If a <Ti, Oj, operation-abort> record is found: skip all preceding log records for Ti until the record <Ti, Oj, operation-begin> is found
e) Stop the scan when the record <Ti start> is found
f) Add a <Ti abort> record to the log
Note:
• Cases c) and d) above can occur only if the database crashes while a transaction is being rolled back
• Skipping of log records as in case d) is important to prevent multiple rollback of the same operation
Database Management Systems Partha Pratim Das 54.13
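Cases a) to f) can be sketched in Python as below. Everything about the encoding is an assumption for illustration: records are tuples with a leading kind tag, the undo information U is a triple such as `("subtract", item, amount)`, and `apply_logical_undo` is a hypothetical helper that understands only that one form.

```python
def apply_logical_undo(db, log, txn, undo_info):
    """Execute the logical undo U; its update is logged like a normal update."""
    action, item, amount = undo_info
    if action != "subtract":
        raise ValueError("only 'subtract' is supported in this sketch")
    old_value = db[item]
    db[item] = old_value - amount
    log.append(("update", txn, item, old_value, db[item]))


def rollback(log, db, txn):
    skipping = None                                   # operation whose records are skipped
    for rec in reversed(list(log)):                   # snapshot: the log grows during the scan
        kind = rec[0]
        if rec[1] != txn:
            continue
        if skipping is not None:                      # second half of cases b) and d)
            if kind == "op-begin" and rec[2] == skipping:
                skipping = None
            continue
        if kind == "update":                          # case a)
            _, _, item, old_value, _ = rec
            db[item] = old_value
            log.append(("redo-only", txn, item, old_value))
        elif kind == "op-end":                        # case b)
            _, _, op, undo_info = rec
            apply_logical_undo(db, log, txn, undo_info)
            log.append(("op-abort", txn, op))
            skipping = op
        elif kind == "op-abort":                      # case d)
            skipping = rec[2]
        elif kind == "start":                         # case e)
            break
        # case c): redo-only records are ignored
    log.append(("abort", txn))                        # case f)


op_log = [
    ("start", "T1"),
    ("op-begin", "T1", "O1"),
    ("update", "T1", "A", 500, 600),
    ("op-end", "T1", "O1", ("subtract", "A", 100)),
    ("start", "T2"),
    ("update", "T2", "A", 600, 650),
    ("commit", "T2"),
]
op_db = {"A": 650}

rollback(op_log, op_db, "T1")
print(op_db)
print(op_log[-3:])
```

Because the operation-end record for O1 is found first, the physical update record of T1 is skipped and T2's later deposit is preserved.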
Transaction Rollback with Logical Undo


Plan for Backup and Recovery (2) PPD

• Equipment
 ◦ Do you have the necessary equipment to make backups? To perform timely backups and recoveries, you need to have proper software and hardware resources.
• Employees
 ◦ Who will be responsible for implementing your database backup and recovery plan? Ideally, one person should be appointed for controlling and supervising the plan, and several IT specialists (e.g. system administrators) should be responsible for performing the actual backup and recovery of data.
• Storing
 ◦ Where do you plan to store database duplicates? With online/offsite storage you can recover your systems after a natural disaster. Storing backups on-site is essential for quick restores, but on-site storage has capacity bottlenecks and high maintenance costs.
Source: [Link]

Database Management Systems Partha Pratim Das 54.21

Module Summary PPD

• Recovery based on operation logging supplements log-based recovery
• Planning for Backup

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 54.22

Database Management Systems
Module 55: Backup & Recovery/5: Backup/2: RAID

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 55.1

Module Recap PPD

• Recovery based on operation logging supplements log-based recovery
• Planning for Backup

Database Management Systems Partha Pratim Das 55.2

Module Objectives PPD

• Understanding RAID: Array of redundant disks in parallel to enhance speed and reliability

Database Management Systems Partha Pratim Das 55.3

Module Outline PPD

• Redundant Array of Independent Disks: RAID

Database Management Systems Partha Pratim Das 55.4

RAID: Redundant Array of Independent Disks PPD


Database Management Systems Partha Pratim Das 55.5

RAID: Redundant Array of Independent Disks

• Disk organization techniques that manage a large number of disks, providing a view of a single disk of
 ◦ high capacity and high speed by using multiple disks in parallel,
 ◦ high reliability by storing data redundantly, so that data can be recovered even if a disk fails
• The chance that some disk out of a set of n disks will fail is much higher than the chance that a specific single disk will fail
 ◦ For example, a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of 1000 hours (approx. 41 days)
 ◦ Techniques for using redundancy to avoid data loss are critical with large numbers of disks
• Originally a cost-effective alternative to large, expensive disks
 ◦ “I” in RAID originally stood for inexpensive
 ◦ Today RAIDs are used for their higher reliability and bandwidth
  ▷ The “I” is interpreted as independent
Database Management Systems Partha Pratim Das 55.6
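The MTTF figures above can be checked with a short calculation. The sketch assumes that disks fail independently at a constant rate, so the expected time until the first failure among n disks is the single-disk MTTF divided by n; the function name is illustrative.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def system_mttf(disk_mttf_hours, n_disks):
    """Expected time until some disk in the array fails (independence assumed)."""
    return disk_mttf_hours / n_disks


disk_mttf = 100_000
array_mttf = system_mttf(disk_mttf, 100)

disk_years = disk_mttf / HOURS_PER_YEAR
array_days = array_mttf / HOURS_PER_DAY
print(f"single disk: {disk_years:.1f} years, array of 100: {array_days:.1f} days")
```

The result reproduces the slide's numbers: roughly 11 years for one disk and roughly 41 days for the array.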
Improvement of Reliability via Redundancy: Mirroring

• Redundancy: Store extra information that can be used to rebuild information lost in a disk failure
• Mean time to data loss depends on mean time to failure, and mean time to repair
 ◦ For example, MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes)
• Mirroring (or shadowing)
 ◦ Duplicate every disk. Logical disk consists of two physical disks.
 ◦ Every write is carried out on both disks
  ▷ Reads can take place from either disk
 ◦ If one disk in a pair fails, data still available in the other
  ▷ Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired
   − Probability of combined event is very small
   − Except for dependent failure modes such as fire or building collapse or electrical power surges
Database Management Systems Partha Pratim Das 55.7
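The mean time to data loss quoted for a mirrored pair is consistent with the commonly used approximation MTTF² / (2 × MTTR), which assumes independent failures. The slide does not state the formula, so treat the function below as an assumption that happens to reproduce its numbers.

```python
HOURS_PER_YEAR = 24 * 365


def mirrored_pair_mttdl(mttf_hours, mttr_hours):
    """Approximate mean time to data loss for two mirrored disks."""
    return mttf_hours ** 2 / (2 * mttr_hours)


mttdl_hours = mirrored_pair_mttdl(100_000, 10)
mttdl_years = mttdl_hours / HOURS_PER_YEAR
print(f"{mttdl_hours:.0f} hours, about {mttdl_years:.0f} years")
```

A shorter repair time raises the result proportionally, which is why prompt replacement of a failed mirror matters.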
Improvement of Reliability via Redundancy (2): Striping

• Bit-level Striping: Split the bits of each byte across multiple disks
 ◦ In an array of eight disks, write bit i of each byte to disk i
 ◦ Each access can read data at eight times the rate of a single disk
 ◦ But seek/access time worse than for a single disk
  ▷ Bit level striping is not used much any more
• Byte-level Striping: Each file is split up into parts one byte in size. Using an n = 4 disk array as an example
 ◦ the 1st byte would be written to the 1st drive
 ◦ the 2nd byte to the 2nd drive and so on, until
 ◦ the 5th byte is written to the 1st drive again and the whole process starts over
 ◦ in general, the i-th byte is written to the (((i − 1) mod n) + 1)-th drive
• Block-level Striping: With n disks, block i of a file goes to disk (i mod n) + 1
 ◦ Requests for different blocks can run in parallel if the blocks reside on different disks
 ◦ A request for a long sequence of blocks can utilize all disks in parallel
Database Management Systems Partha Pratim Das 55.8
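The two placement formulas can be tried out directly. The function names are illustrative; bytes are numbered from 1 as in the byte-level example, and block numbering is assumed to start at 0 so that the first block lands on disk 1.

```python
def byte_drive(i, n):
    """Drive (1-based) holding the i-th byte (1-based) under byte-level striping."""
    return ((i - 1) % n) + 1


def block_disk(i, n):
    """Disk (1-based) holding block i under block-level striping."""
    return (i % n) + 1


byte_layout = [byte_drive(i, 4) for i in range(1, 7)]
block_layout = [block_disk(i, 4) for i in range(0, 5)]
print(byte_layout)
print(block_layout)
```

Both layouts cycle through the four drives in order, so a long sequential request touches every disk and can proceed in parallel.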
Improvement of Reliability via Redundancy (3): Parity

• Bit-Interleaved Parity: A single parity bit is enough for error correction, not just detection, since we know which disk has failed
 ◦ When writing data, corresponding parity bits must also be computed and written to a parity bit disk
 ◦ To recover data in a damaged disk, compute XOR of bits from other disks (including parity bit disk)
• Block-Interleaved Parity: Uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from n other disks
 ◦ When writing data block, corresponding block of parity bits must also be computed and written to parity disk
 ◦ To find value of a damaged block, compute XOR of bits from corresponding blocks (including parity block) from other disks

Database Management Systems Partha Pratim Das 55.9
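Block-interleaved parity and the XOR-based rebuild can be sketched in a few lines. The block contents and function names are made up for illustration; each `bytes` object stands for the corresponding block on one data disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover the block of the failed disk from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# Corresponding blocks on three data disks (hypothetical contents).
data_blocks = [b"\x0f\xa0", b"\x33\x01", b"\x55\xff"]
parity = xor_blocks(data_blocks)

# Suppose the second disk fails: rebuild its block from the other two plus parity.
recovered = rebuild([data_blocks[0], data_blocks[2]], parity)
print(recovered == data_blocks[1])
```

The rebuild works because XOR-ing a value twice cancels it out, leaving only the missing block.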

Standard RAID Levels

• A basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose HDDs
• The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5 (distributed parity), and RAID 6 (dual parity)
• Multiple RAID levels can also be combined or nested, for instance RAID 10 (striping of mirrors) or RAID 01 (mirroring stripe sets)
• RAID levels are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard
• The numerical values only serve as identifiers and do not signify any metric
• While most RAID levels can provide good protection against and recovery from hardware defects or defective sectors/read errors (hard errors), they do not provide any protection against data loss due to catastrophic failures (fire, water) or soft errors such as user error, software malfunction, or malware infection
• For valuable data, RAID is only one building block of a larger data loss prevention and recovery scheme – it cannot replace a backup plan
Source: Standard RAID levels (Accessed 24-Aug-2021)
Database Management Systems Partha Pratim Das 55.10
RAID 0: Striping

• RAID level-0 only uses data striping, no redundant information is maintained
• If one disk fails, then all data in the disk array is lost
• Independent of the number of data disks, the effective space utilization for a RAID Level-0 system is always 100 percent
• RAID Level-0 has the best write performance of all RAID levels because the absence of redundant information implies that no redundant information needs to be updated.
• This solution is the least costly
• Reliability is very poor
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Database Management Systems Partha Pratim Das 55.11

RAID 1: Mirroring

• RAID 1 employs mirroring, maintaining two identical copies of the data on two different disks
• It is the most expensive solution
• It provides excellent fault tolerance
• Every write of a disk block involves a write on both disks
• Since two copies of each block exist on different disks, we can distribute reads between the two disks and allow parallel reads
• RAID Level-1 does not stripe the data over different disks. Thus the transfer rate for a single request is comparable to the transfer rate of a single disk
• The effective space utilization is 50 percent, independent of the number of data disks
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Database Management Systems Partha Pratim Das 55.12

RAID 2: Parity

• RAID 2 uses designated drives for parity
• In RAID 2, the striping unit is a single bit
• Hamming Code is used for parity
 ◦ Hamming codes can detect up to two-bit errors or correct one-bit errors
 ◦ For 4 bits of data, 3 check bits are added
 ◦ Simple parity code cannot correct errors, and can detect only an odd number of bits in error
• In a disk array with D data disks, the smallest unit of transfer for a read is a set of D blocks. This is because consecutive bits of the data are stored on different disks in turn (bit-level striping)
• Writing a block involves reading D blocks into main memory, modifying D + C blocks, and writing D + C blocks to disk, where C is the number of check disks. This sequence of steps is called a read-modify-write cycle
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Database Management Systems Partha Pratim Das 55.13
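The "4 data bits plus 3 check bits" case above is the classic Hamming(7,4) code. The sketch below is only an illustration of how a single flipped bit can be located and corrected; the function names and the bit layout (parity bits at positions 1, 2 and 4) are assumptions made for this example, not something specified in the slides.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Assumed layout (1-based positions): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based error position or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the faulty position
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return c, position


codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1  # simulate a one-bit error at position 5
repaired, where = hamming74_correct(damaged)
```

In RAID 2 terms, each of the seven bit positions would live on a different disk (four data disks and three check disks).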

RAID 3: Byte Striping + Parity

• RAID 3 has a single check disk with parity information. Thus, the reliability overhead for RAID 3 is a single disk, the lowest overhead possible
• RAID 3 consists of byte-level striping with dedicated parity. Therefore the data transfer rate of this level is high because data can be accessed in parallel
• RAID 3 cannot service multiple requests simultaneously: any single block of data is spread across all members of the set and resides in the same physical location on each disk, so every single I/O request has to be addressed by working on every disk in the array
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Database Management Systems Partha Pratim Das 55.14

RAID 4: Block Striping + Parity

• RAID 4 has a striping unit of a disk block, instead of a single bit (RAID 2) or byte (RAID 3)
• Read requests of the size of a disk block can be served entirely by the disk where the requested block resides; therefore RAID 4 provides good performance for data reads
• Provides recovery of corrupted or lost data using XOR recovery mechanism
• If a disk experiences a failure, recovery can be made by simply XORing all the remaining data bits and the parity bit
• Facilitates recovery of at most 1 disk failure. At this level, if more than one disk fails, then there is no way to recover the data
• Write performance is low due to the need to write all parity data to a single disk
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
Database Management Systems Partha Pratim Das 55.15
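The XOR recovery mechanism described above can be sketched in a few lines. This is a minimal illustration only: blocks are modelled as equal-length byte strings, and the helper name `xor_blocks` and the sample data are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data disks plus one parity disk
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose disk 1 fails: XOR the surviving data blocks with the parity block
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
```

Because XOR is its own inverse, the same routine both computes the parity and rebuilds a lost block. With two blocks missing there are two unknowns and only one equation, which is why a single parity block tolerates only one disk failure.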
RAID 5: Distributed Parity

• RAID 5 improves upon RAID 4 by distributing the parity blocks uniformly over all disks instead of storing them on a single check disk
• Several write requests can potentially be processed in parallel since the bottleneck of a unique check disk has been eliminated
• Read requests have a higher level of parallelism. Since the data is distributed over all disks, read requests can involve all disks, whereas, in systems with a dedicated check disk, the check disk never participates in reads
• Like level 4, this level allows recovery from only 1 disk failure
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke

Database Management Systems Partha Pratim Das 55.16

RAID 6: Dual Parity

• RAID 6 extends RAID 5 by adding another parity block; thus it uses block-level striping with two parity blocks distributed across all member disks
• Write performance of RAID 6 is poorer than RAID 5 because of the increased complexity of parity calculation
• RAID 6 uses Reed-Solomon Codes to recover from up to two simultaneous disk failures. Therefore it can handle a disk failure during recovery of a failed disk
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke, Standard RAID levels

Database Management Systems Partha Pratim Das 55.17

Hybrid RAID: Nested RAID levels

• Nested RAID levels (Hybrid RAID) combine two or more of the standard RAID levels to gain performance, additional redundancy or both, as a result of combining properties of different standard RAID layouts
• Nested RAID levels are usually numbered using a series of numbers
  ◦ The first number in the numeric designation denotes the lowest RAID level in the "stack", while
  ◦ the rightmost one denotes the highest layered RAID level
• For example, RAID 50 layers the data striping of RAID 0 on top of the distributed parity of RAID 5
• Nested RAID levels include RAID 01, RAID 10, RAID 100, RAID 50 and RAID 60, which all combine data striping with other RAID techniques
• As a result of the layering scheme, RAID 01 and RAID 10 represent significantly different nested RAID levels
Source: Nested RAID levels (Accessed 23-Aug-2021)

Database Management Systems Partha Pratim Das 55.18

RAID 01 (RAID 0+1): Mirror of Stripes

• RAID 01 is a mirror of stripes
• It achieves both replication and sharing of data between disks
• The usable capacity of a RAID 01 array is the same as in a RAID 1 array made of the same drives, in which one half of the drives is used to mirror the other half: (N/2) · S_min, where N is the total number of drives and S_min is the capacity of the smallest drive in the array
• At least four disks are required in a standard RAID 01 configuration, but larger arrays are also used
Image source: Nested RAID levels (Accessed 23-Aug-2021)
Source: Nested RAID levels (Accessed 23-Aug-2021)

Database Management Systems Partha Pratim Das 55.19

RAID 10 (RAID 1+0): Stripe of Mirrors

• RAID 10 is a stripe of mirrors
• RAID 10 is a RAID 0 array of mirrors, which may be two- or three-way mirrors, and requires a minimum of four drives
• RAID 10 provides better throughput and latency than all other RAID levels except RAID 0 (which wins in throughput)
• Thus, it is the preferable RAID level for I/O-intensive applications such as database, email, and web servers, as well as for any other use requiring high disk performance
Image source: Nested RAID levels (Accessed 23-Aug-2021)
Source: Nested RAID levels (Accessed 23-Aug-2021)

Database Management Systems Partha Pratim Das 55.20

Choice of RAID Levels

• Different RAID Levels have different speed and fault tolerance properties
• RAID level 0 is not fault tolerant
• Levels 1, 1E, 5, 50, 6, 60, and 1+0 are fault tolerant to different degrees: should one of the hard drives in the array fail, the data is still reconstructed on the fly and no access interruption occurs
• RAID levels 2, 3, and 4 are theoretically defined but not used in practice
• There are some more complex layouts like RAID 5E/5EE (integrating some spare space) and RAID DP
  ◦ "E" often stands for "Enhanced" or "Extended"
  ◦ Some of them use hot spare drives
Image source: RAID Calculator (Accessed 23-Aug-2021)

Database Management Systems Partha Pratim Das 55.21

Choice of RAID Levels (2)

• Factors in choosing RAID level
  ◦ Monetary cost
  ◦ Performance: Number of I/O operations per second, and bandwidth during normal operation
  ◦ Performance during failure
  ◦ Performance during rebuild of failed disk
    ▷ Including time taken to rebuild failed disk
• RAID 0 is used only when data safety is not important
  ◦ For example, data can be recovered quickly from other sources
• Levels 2 and 4 are never used since they are subsumed by 3 and 5
• Level 3 is not used anymore since bit-striping forces single block reads to access all disks, wasting disk arm movement, which block striping (Level 5) avoids
• Level 6 is rarely used since levels 1 and 5 offer adequate safety for most applications

Database Management Systems Partha Pratim Das 55.22

Choice of RAID Levels (3)

• Level 1 provides much better write performance than level 5
  ◦ Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes
  ◦ Level 1 preferred for high update environments such as log disks
• Level 1 has higher storage cost than level 5
  ◦ Disk drive capacities are increasing rapidly (50%/year), whereas disk access times have decreased much less (×3 in 10 years)
  ◦ I/O requirements have increased greatly, e.g. for Web servers
  ◦ When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity
    ▷ so there is often no extra monetary cost for Level 1!
• Level 5 is preferred for applications with low update rate, and large amounts of data
• Level 1 is preferred for all other applications

Database Management Systems Partha Pratim Das 55.23
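The "2 block reads and 2 block writes" figure for a Level 5 single-block write comes from the parity update: the old data block and the old parity block are read, and the new data block and the new parity block are written. The sketch below shows the usual XOR identity behind this; the function names and sample bytes are assumptions made for illustration.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data, old_parity, new_data):
    """New parity for a single-block update.

    I/O implied: read old_data and old_parity (2 reads),
    then write new_data and the returned parity (2 writes).
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Hypothetical stripe with three data blocks
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Overwrite block 1 without touching the other data disks
new_block = b"\xaa\x55"
new_parity = small_write_parity(stripe[1], parity, new_block)
```

The other data blocks in the stripe are never read, which is why the cost stays at four I/Os regardless of the number of disks. Under Level 1 the same update is simply two writes, one per mirror.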

Comparison of RAID: Theoretical PPD

Read and write performance are given as a factor of single-disk performance; n is the number of drives.

Level  | Description                                               | Min. # of drives[b] | Space efficiency      | Fault tolerance (drives) | Read    | Write
RAID 0 | Block-level striping without parity or mirroring          | 2                   | 1                     | None                     | n       | n
RAID 1 | Mirroring without parity or striping                      | 2                   | 1/n                   | n − 1                    | n[a]    | 1[c]
RAID 2 | Bit-level striping with Hamming code for error correction | 3                   | 1 − (1/n) lg(n + 1)   | One[d]                   | Depends | Depends
RAID 3 | Byte-level striping with dedicated parity                 | 3                   | 1 − 1/n               | One                      | n − 1   | n − 1[e]
RAID 4 | Block-level striping with dedicated parity                | 3                   | 1 − 1/n               | One                      | n − 1   | n − 1[e]
RAID 5 | Block-level striping with distributed parity              | 3                   | 1 − 1/n               | One                      | n[e]    | single sector: 1/4; full stripe: n − 1[e]
RAID 6 | Block-level striping with double distributed parity       | 4                   | 1 − 2/n               | Two                      | n[e]    | single sector: 1/6; full stripe: n − 2[e]

[a] Theoretical maximum, as low as single-disk performance in practice
[b] Assumes a non-degenerate minimum number of drives
[c] If disks with different speeds are used in a RAID 1 array, overall write performance is equal to the speed of the slowest disk
[d] RAID 2 can recover from one drive failure or repair corrupt data or parity when a corrupted bit's corresponding data and parity are good
[e] Assumes hardware capable of performing associated calculations fast enough
Source: Standard RAID levels (Accessed 23-Aug-2021)
Database Management Systems Partha Pratim Das 55.24
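The space-efficiency column of the theoretical comparison, together with the (N/2) · S_min capacity rule quoted earlier for RAID 01, can be evaluated directly. The helper below is a sketch for checking the numbers, with function names chosen for this example; it is not a sizing tool.

```python
import math


def space_efficiency(level, n):
    """Fraction of raw capacity usable for data with n drives."""
    if level == 0:
        return 1.0
    if level == 1:
        return 1.0 / n
    if level == 2:
        return 1.0 - math.log2(n + 1) / n
    if level in (3, 4, 5):
        return 1.0 - 1.0 / n
    if level == 6:
        return 1.0 - 2.0 / n
    raise ValueError(f"unsupported RAID level: {level}")


def raid01_capacity(n, s_min):
    """Usable capacity of RAID 01: (N/2) * S_min."""
    return (n // 2) * s_min


raid5_small = space_efficiency(5, 3)    # three-drive RAID 5
raid5_large = space_efficiency(5, 16)   # sixteen-drive RAID 5 (assumed size)
raid2_seven = space_efficiency(2, 7)    # 4 data disks + 3 check disks
```

For RAID 5 the two results are about 0.67 and 0.94, and RAID 6 gives 0.50 and about 0.88 for 4 and 16 drives. The 16-drive upper bound is an assumption made here to illustrate how such percentage ranges can arise.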
Comparison of RAID: Practical PPD

Features             | RAID 0 | RAID 1               | RAID 5               | RAID 6            | RAID 10
Minimum # of drives  | 2      | 2                    | 3                    | 4                 | 4
Fault tolerance      | None   | Single-drive failure | Single-drive failure | Two-drive failure | Up to 1 disk failure in each sub-array
Read performance     | High   | Medium               | Low                  | Low               | High
Write performance    | High   | Medium               | Low                  | Low               | Medium
Capacity utilization | 100%   | 50%                  | 67% – 94%            | 50% – 88%         | 50%

Typical applications
• RAID 0: High end workstations, data logging, real-time rendering, very transitory data
• RAID 1: Operating systems, transaction databases
• RAID 5: Data warehouse, web servers, archiving
• RAID 6: Data archive, backup to disk, high availability solutions, servers with large capacity requirements
• RAID 10: Fast databases, file servers, application servers

Source: RAID Level Comparison: RAID 0, RAID 1, RAID 5, RAID 6 and RAID 10 (Accessed 23-Aug-2021)

Database Management Systems Partha Pratim Das 55.25

What Does RAID Not Do? PPD

• RAID does not equate to 100% uptime: Nothing can. RAID is another tool in the toolbox meant to help minimize downtime and availability issues. There is still a risk of a RAID card failure, though that is significantly lower than the risk of an HDD failure
• RAID does not replace backups: Nothing can replace a well planned and frequently tested backup implementation!
• RAID does not protect against data corruption, human error, or security issues: While it can protect you against a drive failure, there are innumerable reasons for keeping backups. So RAID is not a replacement for backups
• RAID does not necessarily allow you to dynamically increase the size of the array: If you need more disk space, you cannot simply add another drive to the array. You are likely going to have to start from scratch, rebuilding/reformatting the array
• RAID isn't always the best option for virtualization and high-availability failover: You will want to look at SAN solutions
Source: (Almost) Everything You Need to Know About RAID
Database Management Systems Partha Pratim Das 55.26
Module Summary

• Understood RAID - array of redundant disks in parallel to enhance speed and reliability

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with "PPD".

Database Management Systems Partha Pratim Das 55.27

Database Management Systems
Module 56: Query Processing and Optimization/1: Processing

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 56.1

Week Recap PPD

• Learnt the importance of backup and analysed different backup strategies
• Failures may be due to a variety of sources; each needs a strategy for handling
• A proper mix and management of volatile, non-volatile and stable storage can guarantee recovery from failures and ensure Atomicity, Consistency and Durability
• Log-based recovery is efficient and effective
• Learnt how hot backup of the transaction log helps in recovering a consistent database
• Studied the recovery algorithms for concurrent transactions
• Recovery based on operation logging supplements log-based recovery
• Planning for Backup
• Understood RAID - array of redundant disks in parallel to enhance speed and reliability

Database Management Systems Partha Pratim Das 56.2

Module Objectives PPD

• To understand the overall flow for Query Processing
• To define the Measures of Query Cost
• To understand the algorithms for processing Selection Operations, Sorting, Join Operations, and a few Other Operations

Database Management Systems Partha Pratim Das 56.3

Module Outline PPD

• Overview of Query Processing
• Measures of Query Cost
• Selection Operation
• Sorting
• Join Operation
• Other Operations

Database Management Systems Partha Pratim Das 56.4

Overview of Query Processing PPD


Database Management Systems Partha Pratim Das 56.5

Basic Steps in Query Processing

a) Parsing and translation
b) Optimization
c) Evaluation

Database Management Systems Partha Pratim Das 56.6

Basic Steps in Query Processing (2)

• Parsing and translation
  ◦ Translate the query into its internal form
    ▷ This is then translated into relational algebra
  ◦ Parser checks syntax, verifies relations
• Evaluation
  ◦ The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query

Database Management Systems Partha Pratim Das 56.7

Basic Steps in Query Processing (3): Optimization

• Consider the query

      select salary
      from instructor
      where salary < 75000;

  which can be translated into either of the following relational-algebra expressions:
  ◦ σ_{salary < 75000}(Π_{salary}(instructor))
  ◦ Π_{salary}(σ_{salary < 75000}(instructor))
• Each relational algebra operation can be evaluated using one of several different algorithms
  ◦ Correspondingly, a relational-algebra expression can be evaluated in many ways
• An annotated expression specifying a detailed evaluation strategy is called an evaluation plan
  ◦ For example, can use an index on salary to find instructors with salary < 75000,
  ◦ or can perform complete relation scan and discard instructors with salary ≥ 75000
Database Management Systems Partha Pratim Das 56.8
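To see that the two relational-algebra expressions above return the same answer, here is a toy evaluation over an in-memory list. The `instructor` rows and the helper names `select` and `project` are made up for this sketch. As in SQL, projection here keeps duplicates.

```python
# Hypothetical instructor relation (values invented for illustration)
instructor = [
    {"ID": 1, "name": "Asha", "salary": 60000},
    {"ID": 2, "name": "Ravi", "salary": 90000},
    {"ID": 3, "name": "Meera", "salary": 72000},
    {"ID": 4, "name": "John", "salary": 60000},
]


def select(rows, predicate):
    """sigma: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """pi: keep only the listed attributes (duplicates retained)."""
    return [{a: row[a] for a in attributes} for row in rows]


def low_salary(row):
    return row["salary"] < 75000


# Plan 1: project first, then select
plan_1 = select(project(instructor, ["salary"]), low_salary)
# Plan 2: select first, then project
plan_2 = project(select(instructor, low_salary), ["salary"])
```

Both plans produce the same list of salaries. They differ only in how much intermediate data each step has to handle, which is what the optimizer reasons about.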
Basic Steps in Query Processing (4): Optimization

• Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost
  ◦ Cost is estimated using statistical information from the database catalog
    ▷ For example, number of tuples in each relation, size of tuples, etc.
• In this module we study
  ◦ How to measure query costs
  ◦ Algorithms for evaluating relational algebra operations
  ◦ How to combine algorithms for individual operations in order to evaluate a complete expression
• In the next module
  ◦ We study how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost

Database Management Systems Partha Pratim Das 56.9

Measures of Query Cost PPD

Selection Operation

Database Management Systems Partha Pratim Das 56.14

Selection Operation: File / Index Scan

A# | Algorithm                 | Cost                                   | Reason
A1 | Linear Search             | t_S + b_r × t_T                        | One initial seek plus b_r block transfers
A1 | Linear Search, Eq. on Key | Average case: t_S + (b_r / 2) × t_T    | Since at most one record satisfies condition, scan can be terminated as soon as the required record is found. b_r block transfers in worst case
A2 | Prm. Index, Eq. on Key    | (h_i + 1) × (t_T + t_S)                | Index lookup traverses the height of the tree plus one I/O to fetch the record; each of these I/O operations requires a seek and a block transfer
A3 | Prm. Index, Eq. on Nonkey | h_i × (t_T + t_S) + b × t_T            | One seek for each level of the tree, one seek for the first block. Here all of b are read. These blocks are leaf blocks assumed to be stored sequentially (for a primary index) and don't require additional seeks
A4 | Snd. Index, Eq. on Key    | (h_i + 1) × (t_T + t_S)                | This case is similar to primary index
A4 | Snd. Index, Eq. on Nonkey | (h_i + n) × (t_T + t_S)                | Here, cost of index traversal is the same as for A3, but each record may be on a different block, requiring a seek per record. Cost is potentially very high if n is large
A5 | Prm. Index, Comparison    | h_i × (t_T + t_S) + b × t_T            | Identical to the case of A3, equality on nonkey
A6 | Snd. Index, Comparison    | (h_i + n) × (t_T + t_S)                | Identical to the case of A4, equality on nonkey

t_T is time to transfer one block. t_S is time for one seek
b_r denotes the number of blocks in the file
b denotes the number of blocks containing records with the specified search key
h_i denotes the height of the index. n is the number of records fetched
Database Management Systems Partha Pratim Das 56.15
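The cost formulas in the table can be compared numerically. In the sketch below the function names are invented for this example, and the timing values t_S = 4 ms and t_T = 0.1 ms are illustrative assumptions rather than figures taken from the slide.

```python
def linear_search(b_r, t_S, t_T):
    """A1: one seek plus b_r block transfers."""
    return t_S + b_r * t_T


def linear_search_key_avg(b_r, t_S, t_T):
    """A1, equality on key: on average half the file is scanned."""
    return t_S + (b_r / 2) * t_T


def index_eq_key(h_i, t_S, t_T):
    """A2 / A4 (key): h_i index levels plus one I/O for the record."""
    return (h_i + 1) * (t_T + t_S)


def primary_index_eq_nonkey(h_i, b, t_S, t_T):
    """A3 / A5: index traversal, then b sequential block transfers."""
    return h_i * (t_T + t_S) + b * t_T


def secondary_index_eq_nonkey(h_i, n, t_S, t_T):
    """A4 (nonkey) / A6: each of the n records may need its own seek."""
    return (h_i + n) * (t_T + t_S)


T_S, T_T = 4.0, 0.1  # milliseconds, assumed for illustration
scan_ms = linear_search(400, T_S, T_T)
index_ms = index_eq_key(4, T_S, T_T)
scattered_ms = secondary_index_eq_nonkey(4, 100, T_S, T_T)
```

Under these assumed timings a secondary-index lookup that fetches 100 scattered records costs far more than scanning a 400-block file, which is the "potentially very high if n is large" warning in the table.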
Complex Selections: Conjunction

• Conjunction: σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r)
• A7 (conjunctive selection using one index)
  ◦ Select a combination of θi and algorithms A1 through A6 that results in the least cost for σ_{θi}(r)
  ◦ Test other conditions on tuple after fetching it into memory buffer
• A8 (conjunctive selection using composite index)
  ◦ Use appropriate composite (multiple-key) index if available
• A9 (conjunctive selection by intersection of identifiers)
  ◦ Requires indices with record pointers
  ◦ Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers
  ◦ Then fetch records from file
  ◦ If some conditions do not have appropriate indices, apply test in memory

Database Management Systems Partha Pratim Das 56.16

Complex Selections: Disjunction

• Disjunction: σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r)
• A10 (disjunctive selection by union of identifiers)
  ◦ Applicable if all conditions have available indices
    ▷ Otherwise use linear scan
  ◦ Use corresponding index for each condition, and take union of all the obtained sets of record pointers
  ◦ Then fetch records from file
• Negation: σ_{¬θ}(r)
  ◦ Use linear scan on file
  ◦ If very few records satisfy ¬θ, and an index is applicable to θ
    ▷ Find satisfying records using index and fetch from file

Database Management Systems Partha Pratim Das 56.17
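A9 and A10 both work on sets of record pointers and only touch the file at the end. The sketch below models an index as a dictionary from attribute value to a set of record ids. The relation, the attribute names and the helper functions are all assumptions made for this illustration.

```python
# Hypothetical file: record id -> record
records = {
    1: {"dept": "CS", "year": 2020},
    2: {"dept": "CS", "year": 2021},
    3: {"dept": "EE", "year": 2020},
    4: {"dept": "ME", "year": 2019},
}


def build_index(file, attribute):
    """Map each attribute value to the set of record ids holding it."""
    index = {}
    for rid, record in file.items():
        index.setdefault(record[attribute], set()).add(rid)
    return index


indices = {attr: build_index(records, attr) for attr in ("dept", "year")}


def pointer_sets(conditions):
    """One set of record ids per (attribute, value) equality condition."""
    return [indices[attr].get(value, set()) for attr, value in conditions]


def a9_conjunction(conditions):
    """Intersect the pointer sets, then fetch the records."""
    rids = set.intersection(*pointer_sets(conditions))
    return [records[rid] for rid in sorted(rids)]


def a10_disjunction(conditions):
    """Union the pointer sets, then fetch the records."""
    rids = set().union(*pointer_sets(conditions))
    return [records[rid] for rid in sorted(rids)]


both = a9_conjunction([("dept", "CS"), ("year", 2020)])
either = a10_disjunction([("dept", "CS"), ("year", 2020)])
```

Sorting the record ids before fetching mirrors a common practical choice, since it lets the file be read in physical order. The slides do not require it.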

Sorting PPD


Join Operation

• Several different algorithms to implement joins
  ◦ Nested-loop join
  ◦ Block nested-loop join
  ◦ Indexed nested-loop join
  ◦ Merge-join
  ◦ Hash-join
• Choice based on cost estimate
• Examples use the following information
  ◦ Number of records of student: n_students = 5,000
  ◦ Number of records of takes: n_takes = 10,000
  ◦ Number of blocks of student: b_students = 100
  ◦ Number of blocks of takes: b_takes = 400

Database Management Systems Partha Pratim Das 56.23

Nested-Loop Join

Module 56
• To compute the theta join r ./θ s
Partha Pratim
Das for each tuple tr in r do begin
Week Recap
for each tuple ts in s do begin
Objectives &
test pair (tr , ts ) to see if they satisfy the join condition θ
Outline if they do, add tr • ts to the result.
Query Processing
end
Query Cost
end
Selection
Operation
• r is called the outer relation and s the inner relation of the join
Complex Selections

Sorting • Requires no indices and can be used with any kind of join condition
External Sort-Merge

Join Operation
• Expensive since it examines every pair of tuples in the two relations
Other Operations

Module Summary

Database Management Systems Partha Pratim Das 56.24
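The nested-loop pseudocode translates almost line for line into Python. In this sketch tuples are dictionaries and the concatenation of two tuples is modelled by merging them. The sample `student` and `takes` rows are invented for illustration.

```python
def nested_loop_join(r, s, theta):
    """Theta join of r (outer) and s (inner) by testing every pair."""
    result = []
    for t_r in r:                              # outer relation
        for t_s in s:                          # inner relation
            if theta(t_r, t_s):                # join condition
                result.append({**t_r, **t_s})  # concatenate the two tuples
    return result


# Hypothetical sample data
student = [{"ID": 1, "name": "Asha"}, {"ID": 2, "name": "Ravi"}]
takes = [
    {"ID": 1, "course_id": "CS101"},
    {"ID": 1, "course_id": "CS102"},
    {"ID": 3, "course_id": "EE201"},
]

joined = nested_loop_join(student, takes, lambda a, b: a["ID"] == b["ID"])
```

Merging dictionaries collapses the shared `ID` attribute, which is harmless for this equi-join. A general theta join would need to keep both copies, for example by prefixing attribute names.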

Nested-Loop Join (2)

• In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is n_r ∗ b_s + b_r block transfers, plus n_r + b_r seeks, where n_r (n_s) denotes the number of tuples in r (s) and b_r (b_s) denotes the number of blocks containing tuples of r (s)
• If the smaller relation fits entirely in memory, use that as the inner relation
  ◦ Reduces cost to b_r + b_s block transfers and 2 seeks
• Example of join of student and takes: n_students = 5,000, n_takes = 10,000, b_students = 100, b_takes = 400
• Assuming worst case memory availability, the cost estimate is
  ◦ with student as the outer relation:
    ▷ 5000 ∗ 400 + 100 = 2,000,100 block transfers,
    ▷ 5000 + 100 = 5100 seeks
  ◦ with takes as the outer relation:
    ▷ 10000 ∗ 100 + 400 = 1,000,400 block transfers and 10,400 seeks
• If the smaller relation (student) fits entirely in memory, the cost estimate will be 500 block transfers
• Block nested-loops algorithm is preferable

Database Management Systems Partha Pratim Das 56.25
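The arithmetic in the example can be reproduced with two small helpers. The function names are chosen for this sketch. The formulas are the worst-case and "inner relation fits in memory" estimates quoted above.

```python
def nested_loop_worst_case(n_r, b_r, b_s):
    """Return (block transfers, seeks) with one buffer block per relation."""
    transfers = n_r * b_s + b_r
    seeks = n_r + b_r
    return transfers, seeks


def nested_loop_inner_in_memory(b_r, b_s):
    """Return (block transfers, seeks) when the inner relation fits in memory."""
    return b_r + b_s, 2


# Statistics from the running example
n_students, n_takes = 5000, 10000
b_students, b_takes = 100, 400

student_outer = nested_loop_worst_case(n_students, b_students, b_takes)
takes_outer = nested_loop_worst_case(n_takes, b_takes, b_students)
student_in_memory = nested_loop_inner_in_memory(b_takes, b_students)
```

Choosing `takes` as the outer relation halves the block transfers but doubles the seeks, so which order is cheaper in elapsed time depends on the relative cost of a seek and a transfer.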

Block Nested-Loop Join

Module 56
• Variant of nested-loop join in which every block of inner relation is paired with every
Partha Pratim
Das block of outer relation
Week Recap
for each block Br of r do begin
Objectives &
for each block Bs of s do begin
Outline for each tuple tr in Br do begin
Query Processing
for each tuple ts in Bs do begin
Query Cost
Check if (tr , ts ) satisfy the join condition
if they do, add tr • ts to the result.
Selection
Operation
Complex Selections
end
Sorting
External Sort-Merge
end
Join Operation end
Other Operations end
Module Summary

Database Management Systems Partha Pratim Das 56.26

Block Nested-Loop Join (2)

• Worst case estimate: b_r ∗ b_s + b_r block transfers + 2 ∗ b_r seeks
  ◦ Each block in the inner relation s is read once for each block in the outer relation
• Best case: b_r + b_s block transfers + 2 seeks
• Improvements to nested loop and block nested loop algorithms:
  ◦ In block nested-loop, use M − 2 disk blocks as blocking unit for the outer relation, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output
    ▷ Cost = ⌈b_r / (M − 2)⌉ ∗ b_s + b_r block transfers + 2 ∗ ⌈b_r / (M − 2)⌉ seeks
  ◦ If equi-join attribute forms a key on the inner relation, stop inner loop on first match
  ◦ Scan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement)
  ◦ Use index on inner relation, if available

Database Management Systems Partha Pratim Das 56.27
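A sketch of the block nested-loop join and of its cost as a function of the memory size M follows. Relations are modelled as lists of blocks, each block a list of tuples. The function names and the default M = 3 (one block each for outer, inner and output) are assumptions made for this example.

```python
import math


def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Pair every block of r with every block of s, then join tuple by tuple."""
    result = []
    for B_r in r_blocks:
        for B_s in s_blocks:
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):
                        result.append((t_r, t_s))
    return result


def block_nested_loop_cost(b_r, b_s, M=3):
    """Return (block transfers, seeks) using M - 2 blocks for the outer relation."""
    if M < 3:
        raise ValueError("need at least 3 buffer blocks")
    chunks = math.ceil(b_r / (M - 2))
    return chunks * b_s + b_r, 2 * chunks


# student (100 blocks) as outer relation, takes (400 blocks) as inner
worst = block_nested_loop_cost(100, 400)          # M = 3
roomier = block_nested_loop_cost(100, 400, M=52)  # 50 outer blocks per pass

pairs = block_nested_loop_join(
    [[1, 2], [3]], [[2, 3], [4]], lambda a, b: a == b
)
```

With M = 3 the formula reduces to the worst-case estimate b_r ∗ b_s + b_r. Giving the outer relation 50 buffer blocks cuts the inner relation scans from 100 to 2.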

Indexed Nested-Loop Join

• Index lookups can replace file scans if
  ◦ join is an equi-join or natural join and
  ◦ an index is available on the inner relation's join attribute
    ▷ Can construct an index just to compute a join
• For each tuple t_r in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple t_r
• Worst case: buffer has space for only one page of r, and, for each tuple in r, we perform an index lookup on s
• Cost of the join: b_r × (t_T + t_S) + n_r ∗ c
  ◦ Where c is the cost of traversing the index and fetching all matching s tuples for one tuple of r
  ◦ c can be estimated as the cost of a single selection on s using the join condition
• If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation

Database Management Systems Partha Pratim Das 56.28

Example of Nested-Loop Join Costs

• Compute student ⋈ takes, with student as the outer relation
• Let takes have a primary B+-tree index on the attribute ID, which contains 20 entries in each index node
• Since takes has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data
• student has 5000 tuples
• Cost of block nested loops join
  ◦ 400 ∗ 100 + 100 = 40,100 block transfers + 2 ∗ 100 = 200 seeks
    ▷ assuming worst case memory
    ▷ may be significantly less with more memory
• Cost of indexed nested loops join
  ◦ 100 + 5000 ∗ 5 = 25,100 block transfers and seeks
  ◦ CPU cost likely to be less than that for block nested loops join

Database Management Systems Partha Pratim Das 56.29
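The two estimates above count I/O operations. To compare them as elapsed time, one has to plug in a seek time and a transfer time. The sketch below does that with t_S = 4 ms and t_T = 0.1 ms, which are illustrative assumptions and not values given on this slide. The function names are also invented for the example.

```python
def indexed_nested_loop_io(b_r, n_r, index_height):
    """I/O count: read r once, plus (height + 1) accesses per outer tuple."""
    return b_r + n_r * (index_height + 1)


def elapsed_ms(transfers, seeks, t_S, t_T):
    """Total time for a given number of block transfers and seeks."""
    return transfers * t_T + seeks * t_S


T_S, T_T = 4.0, 0.1  # milliseconds, assumed for illustration

# Indexed nested loops: each I/O is one seek plus one block transfer
indexed_io = indexed_nested_loop_io(b_r=100, n_r=5000, index_height=4)
indexed_time = elapsed_ms(indexed_io, indexed_io, T_S, T_T)

# Block nested loops, worst-case memory: 40,100 transfers and 200 seeks
block_time = elapsed_ms(40100, 200, T_S, T_T)
```

Under these assumed timings the indexed join performs fewer block transfers but many more seeks, so its estimated elapsed time comes out higher than that of the block nested-loop join. With different device characteristics, or with index nodes cached in memory, the comparison can go the other way.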

Other Operations PPD


Database Management Systems Partha Pratim Das 56.30

Other Operations

• Duplicate Elimination
• Projection
• Aggregation
• Set Operations
• Outer Join

Database Management Systems Partha Pratim Das 56.31

Other Operations: Duplicate Elimination & Projection

• Duplicate Elimination can be implemented via hashing or sorting
  ◦ After sorting, duplicates are adjacent to each other, and all but one copy of each set of duplicates can be deleted
  ◦ Optimization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge
  ◦ Hashing is similar: duplicates fall into the same bucket
• Projection:
  ◦ perform projection on each tuple
  ◦ followed by duplicate elimination
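Both strategies can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the external sort-merge version; the sample relation and the function names are hypothetical.

```python
def dedup_by_sorting(tuples):
    """Sort, then keep a tuple only if it differs from the previous one kept."""
    result = []
    for t in sorted(tuples):
        if not result or result[-1] != t:
            result.append(t)
    return result


def dedup_by_hashing(tuples):
    """Equal tuples hash to the same bucket; keep only the first arrival."""
    seen = set()
    result = []
    for t in tuples:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def project(relation, attrs):
    """Projection: pick the attributes of each tuple, then eliminate duplicates."""
    return dedup_by_sorting(tuple(row[a] for a in attrs) for row in relation)


# Hypothetical sample data for illustration.
instructor = [
    {"name": "Mozart", "dept_name": "Music"},
    {"name": "Einstein", "dept_name": "Physics"},
    {"name": "Gold", "dept_name": "Physics"},
]
depts = project(instructor, ["dept_name"])
print(depts)  # [('Music',), ('Physics',)]
```

The sorting variant returns tuples in sorted order, while the hashing variant preserves arrival order; either satisfies the set semantics of projection.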

Database Management Systems Partha Pratim Das 56.32

Other Operations: Aggregation

• Aggregation can be implemented in a manner similar to duplicate elimination
  ◦ Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group
  ◦ Optimization: combine tuples in the same group during run generation and intermediate merges, by computing partial aggregate values
    . For count, min, max, sum: keep aggregate values on tuples found so far in the group
      − When combining partial aggregates for count, add up the aggregates
    . For avg, keep sum and count, and divide sum by count at the end
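The partial-aggregate idea for avg can be sketched as follows. The two "runs" stand in for sorted runs of an external sort, and the salary figures are hypothetical.

```python
from collections import defaultdict


def partial_aggregate(run):
    """Collapse one run of (group, value) pairs into per-group (sum, count)."""
    acc = defaultdict(lambda: (0, 0))
    for group, value in run:
        s, c = acc[group]
        acc[group] = (s + value, c + 1)
    return dict(acc)


def merge_partials(p1, p2):
    """Combine partial aggregates: sums add up, and counts add up too."""
    merged = dict(p1)
    for group, (s, c) in p2.items():
        s0, c0 = merged.get(group, (0, 0))
        merged[group] = (s0 + s, c0 + c)
    return merged


# Hypothetical salary data split across two runs.
run1 = [("Music", 40000), ("Physics", 95000)]
run2 = [("Physics", 87000), ("Music", 50000), ("Physics", 88000)]

totals = merge_partials(partial_aggregate(run1), partial_aggregate(run2))
averages = {g: s / c for g, (s, c) in totals.items()}  # divide only at the end
print(averages)  # {'Music': 45000.0, 'Physics': 90000.0}
```

Averaging the per-run averages instead would give a wrong answer whenever the runs hold different numbers of tuples for a group, which is why sum and count are carried separately.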

Database Management Systems Partha Pratim Das 56.33

Module Summary

• Understood the overall flow for Query Processing and defined the Measures of Query Cost
• Studied the algorithms for processing Selection Operations, Sorting, Join Operations and a few Other Operations

Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 56.34

Module 57

Database Management Systems
Module 57: Query Processing and Optimization/2: Optimization

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 57.1

Module Recap PPD

• Understood the overall flow for Query Processing and defined the Measures of Query Cost
• Studied the algorithms for processing Selection Operations, Sorting, Join Operations and a few Other Operations

Database Management Systems Partha Pratim Das 57.2

Module Objectives PPD

• To understand the basic issues for optimizing queries
• To understand how transformation of Relational Expressions can create alternatives for optimization

Database Management Systems Partha Pratim Das 57.3

Module Outline PPD

• Introduction to Query Optimization
• Transformation of Relational Expressions

Database Management Systems Partha Pratim Das 57.4

Introduction to Query Optimization PPD


Database Management Systems Partha Pratim Das 57.5

Query Optimization

5 Theta-join operations (and natural joins) are commutative

    E1 ⋈_θ E2 = E2 ⋈_θ E1

6 a. Natural join operations are associative:

    (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

  b. Theta joins are associative in the following manner:

    (E1 ⋈_θ1 E2) ⋈_{θ2∧θ3} E3 = E1 ⋈_{θ1∧θ3} (E2 ⋈_θ2 E3)

    where θ2 involves attributes from E2 and E3 only

Database Management Systems Partha Pratim Das 57.12

Equivalence Rules (3): Pictorial Depiction


Database Management Systems Partha Pratim Das 57.13

Equivalence Rules (4)

7 The selection operation distributes over the theta join operation under the following two conditions:
  a. When all the attributes in θ0 involve only the attributes of one of the expressions (E1) being joined:

    σ_θ0(E1 ⋈_θ E2) = (σ_θ0(E1)) ⋈_θ E2

  b. When θ1 involves only the attributes of E1 and θ2 involves only the attributes of E2:

    σ_{θ1∧θ2}(E1 ⋈_θ E2) = (σ_θ1(E1)) ⋈_θ (σ_θ2(E2))
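Rule 7a can be checked on a toy instance. The sketch below assumes relations held as lists of dicts with disjoint attribute names; the data and the helper names theta_join and select are hypothetical.

```python
def theta_join(r, s, theta):
    """Theta join of two lists of dicts (attribute names assumed disjoint)."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]


def select(r, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in r if pred(t)]


# Hypothetical relations E1(id, dept) and E2(tid, course).
E1 = [{"id": 1, "dept": "Music"}, {"id": 2, "dept": "Physics"}, {"id": 3, "dept": "Music"}]
E2 = [{"tid": 1, "course": "MU-199"}, {"tid": 2, "course": "PHY-101"}, {"tid": 3, "course": "MU-201"}]


def theta(a, b):
    """Join condition θ."""
    return a["id"] == b["tid"]


def theta0(t):
    """Selection condition θ0: uses attributes of E1 only."""
    return t["dept"] == "Music"


late = select(theta_join(E1, E2, theta), theta0)    # selection applied after the join
early = theta_join(select(E1, theta0), E2, theta)   # selection pushed below the join
print(late == early, len(late))  # True 2
```

Both sides yield the same tuples, but the pushed-down form joins only two E1 tuples instead of three; on realistic data that reduction is the point of the rule.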

Database Management Systems Partha Pratim Das 57.14

Equivalence Rules (5)

8 The projection operation distributes over the theta join operation as follows:
  a. If θ involves only attributes from L1 ∪ L2:

    π_{L1∪L2}(E1 ⋈_θ E2) = π_L1(E1) ⋈_θ π_L2(E2)

  b. Consider a join E1 ⋈_θ E2
    • Let L1 and L2 be sets of attributes from E1 and E2, respectively
    • Let L3 be attributes of E1 that are involved in join condition θ, but are not in L1 ∪ L2, and
    • Let L4 be attributes of E2 that are involved in join condition θ, but are not in L1 ∪ L2.

    π_{L1∪L2}(E1 ⋈_θ E2) = π_{L1∪L2}((π_{L1∪L3}(E1)) ⋈_θ (π_{L2∪L4}(E2)))

Database Management Systems Partha Pratim Das 57.15

Equivalence Rules (6)

9 The set operations union and intersection are commutative.

    E1 ∪ E2 = E2 ∪ E1
    E1 ∩ E2 = E2 ∩ E1

  • (set difference is not commutative).
10 Set union and intersection are associative.
  • (E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
  • (E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
11 The selection operation distributes over ∪, ∩, −

    σ_θ(E1 − E2) = σ_θ(E1) − σ_θ(E2)

  and similarly for ∪ and ∩ in place of −

  Also: σ_θ(E1 − E2) = σ_θ(E1) − E2

  and similarly for ∩ in place of −, but not for ∪
12 The projection operation distributes over union

    π_L(E1 ∪ E2) = (π_L(E1)) ∪ (π_L(E2))
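Rule 11, including the case where it fails for union, can be verified on small Python sets. The relations and the predicate below are hypothetical; single integers stand in for tuples.

```python
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}


def select(rel):
    """Selection with a fixed predicate θ: keep even values."""
    return {x for x in rel if x % 2 == 0}


# Rule 11: selection distributes over difference, union and intersection.
assert select(E1 - E2) == select(E1) - select(E2)
assert select(E1 | E2) == select(E1) | select(E2)
assert select(E1 & E2) == select(E1) & select(E2)

# Variant: for difference and intersection, selecting on E1 alone is enough ...
assert select(E1 - E2) == select(E1) - E2
assert select(E1 & E2) == select(E1) & E2

# ... but not for union: E2 contributes tuples that fail the predicate.
union_counterexample = select(E1 | E2) != select(E1) | E2
print(union_counterexample)  # True
```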
Database Management Systems Partha Pratim Das 57.16
Exercise

Module 57

Multiple Transformations (2)


Database Management Systems Partha Pratim Das 57.20

Transformation Example: Pushing Projections

• Consider:

    π_{name,title}((σ_{dept_name="Music"}(instructor)) ⋈ (teaches ⋈ π_{course_id,title}(course)))

• When we compute

    σ_{dept_name="Music"}(instructor ⋈ teaches)

  we obtain a relation whose schema is:

    (ID, name, dept_name, salary, course_id, sec_id, semester, year)

• Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get:

    π_{name,title}((π_{name,course_id}(σ_{dept_name="Music"}(instructor) ⋈ teaches)) ⋈ π_{course_id,title}(course))

• Performing the projection as early as possible reduces the size of the relation to be joined
• Rules used:

    8a: π_{L1∪L2}(E1 ⋈_θ E2) = π_L1(E1) ⋈_θ π_L2(E2)
    8b: π_{L1∪L2}(E1 ⋈_θ E2) = π_{L1∪L2}((π_{L1∪L3}(E1)) ⋈_θ (π_{L2∪L4}(E2)))

Database Management Systems Partha Pratim Das 57.21

Join Ordering Example

• For all relations r1, r2, and r3,

    (r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)

  (Join Associativity)
• If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose

    (r1 ⋈ r2) ⋈ r3

  so that we compute and store a smaller temporary relation

Database Management Systems Partha Pratim Das 57.22

Join Ordering Example (2)

• Consider the expression

    π_{name,title}((σ_{dept_name="Music"}(instructor)) ⋈ teaches ⋈ π_{course_id,title}(course))

• Could compute teaches ⋈ π_{course_id,title}(course) first, and join the result with σ_{dept_name="Music"}(instructor), but the result of the first join is likely to be a large relation
• Only a small fraction of the university's instructors are likely to be from the Music department
  ◦ it is better to compute

      σ_{dept_name="Music"}(instructor) ⋈ teaches

    first
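The effect of join order on intermediate result size can be seen on a toy instance. The sketch below uses a naive natural join over lists of dicts; the relation contents are invented for illustration and are not the university database itself.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


# Hypothetical toy instance: 1 Music instructor out of 4, 8 teaches rows, 4 courses.
instructor = [{"ID": i, "name": f"n{i}", "dept_name": d}
              for i, d in enumerate(["Music", "Physics", "History", "Biology"])]
teaches = [{"ID": i % 4, "course_id": f"C{i % 4}"} for i in range(8)]
course = [{"course_id": f"C{i}", "title": f"t{i}"} for i in range(4)]

music = [t for t in instructor if t["dept_name"] == "Music"]

plan_a_intermediate = natural_join(teaches, course)   # teaches joined with course first
plan_b_intermediate = natural_join(music, teaches)    # selected instructors joined with teaches first

result_a = natural_join(music, plan_a_intermediate)
result_b = natural_join(plan_b_intermediate, course)
print(len(plan_a_intermediate), len(plan_b_intermediate), len(result_a) == len(result_b))
# 8 2 True
```

Both plans produce the same final result, but the second stores an intermediate relation a quarter of the size, matching the slide's recommendation.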

Database Management Systems Partha Pratim Das 57.23

Enumeration of Equivalent Expressions

• Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression
• Can generate all equivalent expressions as follows:
  ◦ Repeat
    . apply all applicable equivalence rules on every subexpression of every equivalent expression found so far
    . add newly generated expressions to the set of equivalent expressions
    Until no new equivalent expressions are generated above
• The above approach is very expensive in space and time
  ◦ Two approaches
    . Optimized plan generation based on transformation rules
    . Special case approach for queries with only selections, projections and joins
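The repeat-until-no-change loop above can be sketched for join expressions. This is a minimal sketch using only rules 5 (commutativity) and 6a (associativity); the tuple encoding ("join", left, right) and the function names are assumptions made for illustration.

```python
def rewrites(expr):
    """Yield every expression reachable from expr by applying one rule once,
    either at the root or inside a subexpression."""
    if isinstance(expr, str):           # a base relation: nothing to rewrite
        return
    _, left, right = expr
    yield ("join", right, left)         # rule 5: commutativity
    if not isinstance(left, str):       # rule 6a: associativity
        _, a, b = left
        yield ("join", a, ("join", b, right))
    for new_left in rewrites(left):     # rewrite inside the left subtree
        yield ("join", new_left, right)
    for new_right in rewrites(right):   # rewrite inside the right subtree
        yield ("join", left, new_right)


def enumerate_equivalent(expr):
    """Repeat rule application until no new expression is generated."""
    found = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found


start = ("join", ("join", "r1", "r2"), "r3")
plans = enumerate_equivalent(start)
print(len(plans))  # 12
```

Three relations already yield 12 join trees, and the count grows very quickly with more relations, which illustrates why exhaustive enumeration is expensive in both space and time.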

Database Management Systems Partha Pratim Das 57.24

Implementing Transformation Based Optimization

• Space requirements reduced by sharing common sub-expressions:
  ◦ when E1 is generated from E2 by an equivalence rule, usually only the top level of the two are different, subtrees below are the same and can be shared using pointers
    . E.g. when applying join commutativity
  ◦ Same sub-expression may get generated multiple times
    . Detect duplicate sub-expressions and share one copy
• Time requirements are reduced by not generating all expressions
  ◦ Dynamic programming
Database Management Systems Partha Pratim Das 57.25
Module Summary

• Understood the basic issues for optimizing queries
• For every relational expression, usually there are a number of equivalent expressions that can be created by simple transformations
• Final execution plan can be created by choosing the estimated least cost expression from the alternatives

Database Management Systems
Module 58: RDBMS Performance & Architecture

• RDBMS Architecture
• Scaling Databases

Database Management Systems Partha Pratim Das 58.4

RDBMS Performance and Scalability PPD


Database Management Systems Partha Pratim Das 58.5

What do DBMS Applications Need?

• Throughput, Response Time, & Availability
  ◦ Throughput is transactions / second (tps)
  ◦ Response Time is the delay from submission of transaction to return of result
  ◦ Availability is the mean time to failure
  ◦ At Transaction Level
    . Concurrency Control
    . Query Optimization
  ◦ At System Level
    . System Architecture
    . Database Architecture
    . Performance Tuning
      − Hardware: disks to speed up I/O, memory to increase buffer hits, move to a faster processor
      − Database System Parameters: set buffer size to avoid paging, set checkpointing to limit log size
      − Higher level database design: schema, indices and transactions
• Correctness
  ◦ Any given database transaction must change affected data only in allowed ways
  ◦ ACID Properties
• Scalability
  ◦ Ability to scale up a database to allow it to hold increasing amounts of data without sacrificing performance
  ◦ Should be able to scale with volume of data, number of users, diversity of services, geographic expanse, etc.
  ◦ Scalability can be achieved by
    . System Architecture
    . Database Architecture
    . Scale expectations with scale of the system
    . Alternate Data Models
    . Accommodate Hybrid Systems
  ◦ ...
Database Management Systems Partha Pratim Das 58.6
RDBMS Architecture PPD


Database Management Systems Partha Pratim Das 58.10

RDBMS Architecture: Centralized & Client-Server

• Server systems satisfy requests generated at m client systems

Database Management Systems Partha Pratim Das 58.11

RDBMS Architecture: Centralized & Client-Server

• Database functionality can be divided into:
  ◦ Back-end: manages access structures, query evaluation and optimization, concurrency control and recovery
  ◦ Front-end: consists of tools such as forms, report-writers, and graphical user interface facilities
• The interface between the front-end and the back-end is through SQL or through an application program interface

Database Management Systems Partha Pratim Das 58.12

RDBMS Architecture: Server Systems

• Transaction or Query servers, which are widely used in relational database systems
  ◦ A typical transaction cycle is:
    . Clients send requests to the server
    . Transactions are executed at the server
    . Results are shipped back to the client
  ◦ Requests are specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism
  ◦ Transactional RPC allows many RPC calls to form a transaction.
  ◦ ODBC / JDBC used to connect
• Data servers, used in object-oriented database systems
  ◦ Used in high-speed LANs, in cases where
    . The clients are comparable in processing power to the server
    . The tasks to be executed are compute intensive
  ◦ Issues:
    . Page-Shipping versus Item-Shipping
    . Locking
    . Data Caching
    . Lock Caching
Database Management Systems Partha Pratim Das 58.13
RDBMS Architecture: Server Systems


Database Management Systems Partha Pratim Das 58.14

RDBMS Architecture: Parallel Systems

• Parallel database systems consist of multiple processors and multiple disks connected by a fast interconnection network
• A coarse-grain parallel machine consists of a small number of powerful processors
• A massively parallel or fine grain parallel machine utilizes thousands of smaller processors
• Two main performance measures:
  ◦ throughput: the number of tasks that can be completed in a given time interval
  ◦ response time: the amount of time it takes to complete a single task from the time it is submitted

Database Management Systems Partha Pratim Das 58.15

RDBMS Architecture: Parallel Systems

• Speedup: a fixed-sized problem executing on a small system is given to a system which is N-times larger
  ◦ Measured by:
    . Speedup = (small system elapsed time) / (large system elapsed time)
  ◦ Speedup is linear if equation equals N
  ◦ Speedup Percentage = (Speedup / N) * 100%
• Scaleup: increase the size of both the problem and the system; an N-times larger system is used to perform an N-times larger job
  ◦ Measured by:
    . Scaleup = (small system small problem elapsed time) / (big system big problem elapsed time)
  ◦ Scaleup is linear if equation equals 1
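The two measures can be computed directly from elapsed times. The timings below are hypothetical, and the percentage formula follows the reading Speedup / N * 100%.

```python
def speedup(small_system_time, large_system_time):
    """Same problem on both systems; linear speedup equals N."""
    return small_system_time / large_system_time


def speedup_percentage(speedup_value, n):
    """Fraction of the ideal N-fold speedup actually achieved, in percent."""
    return speedup_value / n * 100


def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """N-times larger problem on an N-times larger system; linear scaleup equals 1."""
    return small_system_small_problem_time / big_system_big_problem_time


# Hypothetical measurements for a system that is N = 4 times larger.
N = 4
s = speedup(120.0, 40.0)       # same job: 120 s on the small system, 40 s on the large one
pct = speedup_percentage(s, N)
su = scaleup(120.0, 150.0)     # 4x job on the 4x system takes 150 s
print(s, pct, su)  # 3.0 75.0 0.8
```

In this example both measures are sub-linear: the larger system delivers 75% of the ideal speedup, and the scaled-up job runs somewhat slower than the original.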

Database Management Systems Partha Pratim Das 58.16

RDBMS Architecture: Parallel Systems: Speedup and Scaleup


Horizontal Vs. Vertical Scaling (2) PPD

Horizontal Scaling

Advantages
• Scaling is easier from a hardware perspective
• Fewer periods of downtime
• Increased resilience and fault tolerance
• Increased performance

Disadvantages
• Increased complexity of maintenance and operation
• Increased initial costs

Vertical Scaling

Advantages
• Cost-effective
• Less complex process communication
• Less complicated maintenance
• Less need for software changes

Disadvantages
• Higher possibility for downtime
• Single point of failure
• Upgrade limitations

Source: Horizontal Vs. Vertical Scaling: How Do They Compare?

Database Management Systems Partha Pratim Das 58.26

Scaling out RDBMS PPD

• Master/Slave
  ◦ All writes are written to the master
  ◦ All reads performed against the replicated slave databases
  ◦ Critical reads may be incorrect as writes may not have been propagated down
  ◦ Large datasets can pose problems as master needs to duplicate data to slaves
• Sharding (Partitioning)
  ◦ Scales well for both reads and writes
  ◦ Not transparent, application needs to be partition-aware
  ◦ Can no longer have relationships/joins across partitions
  ◦ Loss of referential integrity across shards
• Other Options
  ◦ Multi-Master replication
  ◦ INSERT only, not UPDATES/DELETES
  ◦ No JOINs, thereby reducing query time → This involves de-normalizing data
  ◦ In-memory databases
Source: Introduction to NOSQL Databases, SlidePlayer
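What "partition-aware" means for the application can be sketched with hash-based sharding. This is a minimal in-memory sketch: the shard count, the dictionaries standing in for database servers, and the function names are all assumptions.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one database server


def shard_for(key):
    """Stable hash of the partition key, so every client picks the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key, row):
    """Write the row to the one shard that owns the key."""
    shards[shard_for(key)][key] = row


def get(key):
    """Read from the one shard that owns the key; None if absent."""
    return shards[shard_for(key)].get(key)


for student_id in ["S001", "S002", "S003", "S004", "S005"]:
    put(student_id, {"ID": student_id})

print(get("S003"))                   # {'ID': 'S003'}
print(sum(len(s) for s in shards))   # 5
```

Single-key reads and writes touch exactly one shard, which is why sharding scales both. A join or a foreign-key check spanning keys on different shards has no such single owner, which is the limitation the slide lists.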
Database Management Systems Partha Pratim Das 58.27
Module Summary

• Evaluated RDBMS, especially with reference to performance and scalability, as a backbone for data-intensive application development
• Understood the role of system and database architecture in performance
• Understood the options for scaling databases

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with "PPD".
Database Management Systems Partha Pratim Das 58.28
Module 59

Database Management Systems
Module 59: Non-Relational DBMS: NOSQL

Partha Pratim Das

Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]

Database Management Systems Partha Pratim Das 59.1

Module Recap PPD

• Evaluated RDBMS, especially with reference to performance and scalability, as a backbone for data-intensive application development
• Understood the role of system and database architecture in performance
• Understood the options for scaling databases

Database Management Systems Partha Pratim Das 59.2

Module Objectives PPD

• To understand issues in Big Data
• To understand the approach of NOSQL and the CAP theorem vis-à-vis ACID
• To take a tour of common types of NOSQL databases

What is Big Data? PPD


Database Management Systems Partha Pratim Das 59.6

What is Big Data? PPD

• Big data is data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them
• Big data challenges include
  ◦ capturing data,
  ◦ data storage,
  ◦ data analysis,
  ◦ search,
  ◦ sharing,
  ◦ transfer,
  ◦ visualization,
  ◦ querying,
  ◦ updating,
  ◦ information privacy and
  ◦ data source
• The term often refers to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set
Database Management Systems Partha Pratim Das 59.7
What is Big Data? PPD

• 5V's (characteristics) of big data:
  ◦ Volume: The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
  ◦ Variety: The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio, video; plus it completes missing pieces through data fusion.
  ◦ Velocity: In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time.
  ◦ Variability: Inconsistency of the data set can hamper processes to handle and manage it.
  ◦ Veracity: The data quality of captured data can vary greatly, affecting the accurate analysis

Database Management Systems Partha Pratim Das 59.8

What is NOSQL? PPD


Database Management Systems Partha Pratim Das 59.9

What is NOSQL? PPD

• A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases
• NoSQL databases are increasingly used in big data and real-time web applications
• Such databases have existed since the late 1960s
  ◦ Network Database Model (NDBMS) is a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
    It was introduced in 1969 and widely replaced by relational databases in the 1980s
  ◦ Hierarchical Database Model (HDBMS) organizes data into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value. The type of a record defines which fields the record contains.
    It was recognized as the first database model in the 1960s and widely replaced by relational databases in the 1980s
Source: Introduction to NOSQL Databases, SlidePlayer
Database Management Systems Partha Pratim Das 59.10
What is NOSQL? PPD

• Stands for Not Only SQL. Also referred to as Non-Relational DBMS (NDBMS) and as Multi-Model Databases
• "NoSQL" was coined in the early 21st century, triggered by Web 2.0 companies
• The term NOSQL was introduced by Carlo Strozzi in 1998 for his lightweight Strozzi NoSQL open-source relational database and re-introduced by Eric Evans when an event was organized to discuss open source distributed databases
• Eric states that "... but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for ..."

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.11
What is NOSQL? PPD

Advantages
• non-relational
• don't require schema
• data are replicated to multiple nodes (so, identical & fault-tolerant) and can be partitioned:
  ◦ down nodes easily replaced
  ◦ no single point of failure
• horizontal scalable
• cheap, easy to implement (open-source)
• massive write performance
• fast key-value access

Disadvantages
• Don't fully support relational features
  ◦ no join, group by, order by operations (except within partitions)
  ◦ no referential integrity constraints across partitions
• No declarative query language (like SQL) → more programming
• Relaxed ACID (CAP theorem) → fewer guarantees
• No easy integration with other applications that support SQL

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.12

What are NOSQL DBs? PPD


Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.13
Who is using NOSQL? PPD


Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.14

The Perfect Storm PPD

• The Perfect Storm
  ◦ Large datasets,
  ◦ acceptance of alternatives, and
  ◦ dynamically-typed data
  have come together in a "perfect storm"
• Not a backlash against RDBMS
• SQL is a rich query language that cannot be rivaled by the current list of NOSQL offerings
Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.15

NOSQL: 3 Major Papers PPD

• BigTable (Google): Bigtable: A Distributed Storage System for Structured Data, 2006
• Dynamo (Amazon): Amazon's Dynamo, 2007
  ◦ Ring partition and replication
  ◦ Gossip protocol (discovery and error detection)
  ◦ Distributed key-value data stores
  ◦ Eventual consistency: Eventually Consistent - Revisited, 2008. Choosing Consistency, 2010
• CAP Theorem: Brewer's CAP Theorem, 2009
Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.16

CAP Theorem PPD


CAP Theorem (3): Consistency PPD

• All clients always have the same view of the data
• Consistency: Two types:
  ◦ Strong Consistency: ACID (Atomicity, Consistency, Isolation, Durability)
  ◦ Weak Consistency: BASE (Basically Available, Soft-state, Eventual consistency)
• ACID: A DBMS is expected to support “ACID transactions,” processes that provide:
  ◦ Atomicity: either the whole process is done or none of it is
  ◦ Consistency: only valid data are written
  ◦ Isolation: concurrent transactions behave as if they were executed one at a time
  ◦ Durability: once committed, it stays that way
• CAP
  ◦ Consistency: all nodes in the cluster hold the same copy of the data
  ◦ Availability: the cluster always accepts reads and writes
  ◦ Partition tolerance: guaranteed properties are maintained even when network failures prevent some machines from communicating with others

Source: Introduction to NOSQL Databases, SlidePlayer
Database Management Systems Partha Pratim Das 59.20
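
The atomicity property listed above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for an ACID DBMS; the account table, the transfer helper, and the balances are hypothetical and not part of the course material. The second transfer violates a CHECK constraint halfway through, so the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a balance may never become negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # debit would go negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to bob is applied first and then undone, which is the "whole process or none" behaviour the slide describes.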
CAP Theorem (4): Consistency PPD

• A consistency model determines rules for visibility and apparent order of updates
• Example:
  ◦ Row X is replicated on nodes M and N
  ◦ Client A writes row X to node N
  ◦ Some period of time t elapses
  ◦ Client B reads row X from node M
  ◦ Does client B see the write from client A?
  ◦ Consistency is a continuum with tradeoffs
  ◦ For NOSQL, the answer would be: “maybe”
  ◦ CAP theorem states: “strong consistency can’t be achieved at the same time as availability and partition-tolerance”
• Eventual consistency
  ◦ When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
• Cloud computing
  ◦ ACID is hard to achieve; moreover, it is not always required, for example, for blogs, status updates, product listings, etc.

Source: Introduction to NOSQL Databases, SlidePlayer
Database Management Systems Partha Pratim Das 59.21
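
The row X example above can be played out in a few lines. This is a toy sketch under stated assumptions, not how any particular NOSQL system is implemented: the Replica class, the integer version numbers, and the anti_entropy function (a last-writer-wins synchronisation step) are all hypothetical names introduced for illustration.

```python
class Replica:
    """A node holding its own copy of the rows: key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, version):
        self.rows[key] = (version, value)

    def read(self, key):
        entry = self.rows.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Propagate updates both ways, keeping the higher version of each row."""
    for key in set(a.rows) | set(b.rows):
        va = a.rows.get(key, (0, None))
        vb = b.rows.get(key, (0, None))
        newest = va if va[0] >= vb[0] else vb
        a.rows[key] = newest
        b.rows[key] = newest


M, N = Replica("M"), Replica("N")
N.write("X", "new value", version=1)  # client A writes row X to node N
before = M.read("X")                  # client B reads from M too early
anti_entropy(M, N)                    # updates propagate through the system
after = M.read("X")                   # client B reads from M again
```

Whether client B sees the write depends on whether propagation has happened yet, which is the "maybe" on the slide.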
CAP Theorem (5) PPD


Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.22
Types of NOSQL Databases PPD


Database Management Systems Partha Pratim Das 59.23

Types of NoSQL Databases PPD

• Key-value Stores: DynamoDB, Voldemort, Scalaris, Redis, MemcacheDB
  ◦ Work by matching keys with values, similar to a dictionary. There is neither structure nor relation
• Document Stores: MongoDB, Couchbase/CouchDB
  ◦ Work similarly to column-based ones; however, they allow much deeper nesting and complex structures to be achieved (for example, a document, within a document, within a document)
    . Documents overcome the constraints of 1 / 2 levels of key / value nesting of columnar databases
• Column Stores: BigTable, Cassandra, HBase
  ◦ Column-based NoSQL databases are two-dimensional arrays whereby each key (that is, row / record) has one or more key / value pairs attached to it. These management systems allow very large and unstructured data to be kept and used (for example, a record with tons of information)
• Graph Stores: OrientDB, Neo4j, InfoGrid
  ◦ These use graph structures: nodes connected to each other by edges that represent relations
• Time Series (Discussed in Module 30): InfluxDB, Kdb+, Prometheus, Graphite
  ◦ A time series database (TSDB) is a database optimized for time-stamped or time series data
  ◦ Time series data are measurements or events that are tracked, monitored, downsampled, and aggregated over time
• No-schema and support for flexible data types are common characteristics of most NOSQL systems

Source: Introduction to NOSQL Databases, SlidePlayer
Database Management Systems Partha Pratim Das 59.24
Multi-Model Databases PPD


Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019

Database Management Systems Partha Pratim Das 59.25

NoSQL Databases: Key-value Stores PPD

• Focus on scaling to huge amounts of data
• Designed to handle massive load
• Based on Amazon’s Dynamo paper
• Data model: (global) collection of key-value pairs
• Dynamo ring partitioning and replication
• Example: (DynamoDB)
  ◦ Items having one or more attributes (name, value)
  ◦ An attribute can be single-valued or multi-valued like a set
  ◦ Items are combined into a table
• Basic API access:
  ◦ get(key): extract the value given a key
  ◦ put(key, value): create or update the value given its key
  ◦ delete(key): remove the key and its associated value
  ◦ execute(key, operation, parameters): invoke an operation on the value (given its key), which is a special data structure (e.g. List, Set, Map, etc.)

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.26
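
The four API calls listed above can be mimicked by a minimal in-memory sketch. The KeyValueStore class below is hypothetical and is not the client API of DynamoDB or any other product; it only shows the shape of get, put, delete, and execute over a plain Python dictionary.

```python
class KeyValueStore:
    """Toy single-node key-value store (no partitioning or replication)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Extract the value given a key (None if the key is absent).
        return self._data.get(key)

    def put(self, key, value):
        # Create or update the value given its key.
        self._data[key] = value

    def delete(self, key):
        # Remove the key and its associated value, if present.
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        # Invoke a named operation on a structured value (list, set, dict).
        value = self._data[key]
        return getattr(value, operation)(*parameters)


store = KeyValueStore()
store.put("cart:42", ["book"])
store.execute("cart:42", "append", "pen")  # operate on the stored list
cart = store.get("cart:42")
store.delete("cart:42")
missing = store.get("cart:42")
```

The store never inspects the value beyond forwarding the requested operation, which is why the data model stays simple and queries by anything other than the key are not available.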

NoSQL Databases: Key-value Stores PPD

• Pros:
  ◦ Very fast
  ◦ Very scalable (horizontally distributed to nodes based on key)
  ◦ Simple data model
  ◦ Eventual consistency
  ◦ Fault-tolerance
• Cons:
  ◦ Can’t model more complex data structures such as objects

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.27
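
Distributing keys to nodes "based on key" is commonly done with a hash ring, in the spirit of the Dynamo ring partitioning mentioned earlier. The sketch below is a simplified illustration, assuming MD5 as the ring hash and one position per node (real systems typically use many virtual nodes); HashRing, ring_position, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a string to a position on the ring using MD5 (an assumption)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        # Nodes sorted by their position on the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def _first_index(self, key):
        # First node clockwise from the key's position, wrapping around.
        i = bisect.bisect_right(self._positions, ring_position(key))
        return i % len(self._ring)

    def node_for(self, key):
        return self._ring[self._first_index(key)][1]

    def replicas_for(self, key, n):
        # The owner plus the next n-1 distinct nodes clockwise.
        start = self._first_index(key)
        count = min(n, len(self._ring))
        return [self._ring[(start + k) % len(self._ring)][1]
                for k in range(count)]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:1001")
replicas = ring.replicas_for("user:1001", 2)
```

Because a key's owner depends only on hash positions, adding or removing one node moves only the keys in the neighbouring arc of the ring rather than reshuffling everything.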

NoSQL Databases: Key-value Stores PPD


Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.28

NoSQL Databases: Document Stores PPD

• Inspired by Lotus Notes; can model more complex objects
• Data model: Collection of documents
  ◦ JSON (JavaScript Object Notation) is a data model of key-value pairs that supports objects, records, structs, lists, arrays, maps, dates, and Booleans, with nesting
  ◦ XML and other semi-structured formats
• Example: (MongoDB) document
  {
    Name: "Jaroslav",
    Address: "Malostranske nám. 25, 118 00 Praha 1",
    Grandchildren: {
      Claire: "7", Barbara: "6", Magda: "3",
      Kirsten: "1", Otis: "3", Richard: "1"
    },
    Phones: [ "123-456-7890", "234-567-8963" ]
  }

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.29
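
The nesting in the document above can be explored with Python's standard json module. This is only a sketch of the data model: it does not use MongoDB or its driver, and the variable names are illustrative.

```python
import json

# The slide's example document as a nested Python structure.
person = {
    "Name": "Jaroslav",
    "Address": "Malostranske nám. 25, 118 00 Praha 1",
    "Grandchildren": {
        "Claire": "7", "Barbara": "6", "Magda": "3",
        "Kirsten": "1", "Otis": "3", "Richard": "1",
    },
    "Phones": ["123-456-7890", "234-567-8963"],
}

# Serialize to JSON text and parse it back, as a document store would
# when storing and returning the document.
encoded = json.dumps(person, ensure_ascii=False)
decoded = json.loads(encoded)

# Nested values are reached by walking the structure.
magda_age = decoded["Grandchildren"]["Magda"]
second_phone = decoded["Phones"][1]
```

An object inside an object and a list inside an object are both kept within one document, whereas a relational design would typically split them into separate tables.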

NoSQL Databases: Document Stores PPD

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.30

NoSQL Databases: Document Stores PPD

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.31

NoSQL Databases: Column Stores PPD

• Based on the BigTable paper
• Like column-oriented RDBMS (store data in column order) but with a twist
• Tables similar to RDBMS, but handle semi-structured data
• Data model:
  ◦ Collection of Column Families
  ◦ Column family = (key, value) where value = set of related columns (standard, super)
  ◦ Indexed by row key, column key and timestamp

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.32
NoSQL Databases: Column Stores PPD

• One column family can have variable numbers of columns
• Cells within a column family are sorted “physically”
• Very sparse, most cells have null values
• Comparison: RDBMS vs column-based NOSQL
  ◦ Query on multiple tables
    . RDBMS: must fetch data from several places on disk and glue together
    . Column-based NOSQL: only fetch column families of those columns that are required by a query (all columns in a column family are stored together on the disk, so multiple rows can be retrieved in one read operation → data locality)
• Example: (Cassandra column family; timestamps removed for simplicity)
  UserProfile = {
    Cassandra = { emailAddress:"casandra@[Link]", age:"20" }
    TerryCho = { emailAddress:"[Link]@[Link]", gender:"male" }
    Cath = { emailAddress:"cath@[Link]", age:"20", gender:"female", address:"Seoul" }
  }

Source: Introduction to NOSQL Databases, SlidePlayer
Database Management Systems Partha Pratim Das 59.33
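
The UserProfile example can be modelled as a sparse map indexed by row key, column key, and timestamp. The ColumnFamily class below is a hypothetical in-memory sketch, not Cassandra's storage engine or API; the timestamps are made-up integers.

```python
from collections import defaultdict


class ColumnFamily:
    """Rows may carry different columns; each cell keeps timestamped versions."""

    def __init__(self, name):
        self.name = name
        # row key -> {column key -> [(timestamp, value), ...]}
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value, timestamp):
        versions = self._rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])  # keep oldest first

    def get(self, row_key, column):
        # Return the most recent value, or None for an absent (sparse) cell.
        versions = self._rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def columns(self, row_key):
        # Column keys present for this row, in sorted order.
        return sorted(self._rows.get(row_key, {}))


profile = ColumnFamily("UserProfile")
profile.put("Cath", "age", "20", timestamp=1)
profile.put("Cath", "address", "Seoul", timestamp=1)
profile.put("Cath", "age", "21", timestamp=2)  # newer version of the cell
profile.put("TerryCho", "gender", "male", timestamp=1)
```

Rows Cath and TerryCho hold different sets of columns and absent cells simply take no space, which matches the "very sparse" remark on the slide.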
NoSQL Databases: Column Stores PPD


Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.34
NoSQL Databases: Graph Stores PPD

• Focus on modeling the structure of data (interconnectivity)
• Scales to the complexity of data
• Inspired by mathematical Graph Theory (G = (V, E))
• Data model:
  ◦ (Property Graph) nodes and edges
    . Nodes may have properties (including ID)
    . Edges may have labels or roles
  ◦ Key-value pairs on both
• Interfaces and query languages vary
• Single-step vs path expressions vs full recursion
• Example:
  ◦ Neo4j, FlockDB, Pregel, InfoGrid

Source: Introduction to NOSQL Databases, SlidePlayer

Database Management Systems Partha Pratim Das 59.35
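
A property graph and the difference between a single-step query and full recursion can be sketched as follows. PropertyGraph, the FOLLOWS label, and the sample people are hypothetical; real graph stores expose their own query languages rather than this interface.

```python
from collections import deque


class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node id -> properties (key-value pairs)
        self.edges = []   # (source id, label, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, src, label, dst, **properties):
        self.edges.append((src, label, dst, properties))

    def neighbors(self, node_id, label=None):
        # Single-step traversal along outgoing edges, optionally by label.
        return [dst for src, lbl, dst, _ in self.edges
                if src == node_id and (label is None or lbl == label)]

    def reachable(self, start, label=None):
        # Full recursion: breadth-first search over repeated single steps.
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbors(queue.popleft(), label):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen


g = PropertyGraph()
g.add_node("alice", city="Kharagpur")
g.add_node("bob", city="Kolkata")
g.add_node("carol", city="Delhi")
g.add_edge("alice", "FOLLOWS", "bob", since=2020)
g.add_edge("bob", "FOLLOWS", "carol", since=2021)

direct = g.neighbors("alice", "FOLLOWS")
everyone = g.reachable("alice", "FOLLOWS")
```

The recursive query follows edges to any depth without the repeated self-joins a relational schema would need for the same question.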

NOSQL Database Vendors PPD


Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019

Database Management Systems Partha Pratim Das 59.36

Relational vs. Non-Relational PPD


Database Management Systems Partha Pratim Das 59.37

Relational vs. Non-Relational PPD


Database Management Systems Partha Pratim Das 59.38

Database Types and Usecases PPD


Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019
Database Management Systems Partha Pratim Das 59.39
Database Market Competitive Landscape PPD


Widely used
RDBMS
Market Share
Ranking
Commercial
Free
ORD
Comparative Study Widely used RDBMS
Course Recap
Week 01
Ref: [Link]
Week 02
Ref: [Link]
Week 03 Ref: [Link]
Week 04 Ref: [Link] us/azure/sql- database/sql- database- develop- cplusplus- simple(Accessed:26-08-2021)
Week 05
Week 06
Week 07
Week 08
Week 09
Week 10
Week 11
Week 12
Database Management Systems Partha Pratim Das 60.5
Relational Databases PPD

• The relational model of data organizes data into one or more tables (or relations) of rows and columns, with a unique key for each row
• Since each row in a table has its own unique key, rows in a table can be linked to rows in other tables by storing the unique key of the row to which it should be linked (where such a stored key is known as a foreign key)
• Mostly, the relational databases use SQL as the language for querying and maintaining the database
• The reasons for the dominance of relational databases are:
  ◦ simplicity,
  ◦ robustness,
  ◦ flexibility,
  ◦ performance,
  ◦ scalability, and
  ◦ compatibility in managing generic data
• The RDBMSs are mostly used in large enterprise scenarios
Database Management Systems Partha Pratim Das 60.6
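
Linking rows through a foreign key can be tried out with Python's built-in sqlite3 module. The department and instructor tables and their contents are hypothetical examples, and SQLite is used only because it needs no server; note that SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute(
    "CREATE TABLE department ("
    "dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE instructor ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)

conn.execute("INSERT INTO department VALUES (1, 'Comp. Sci.')")
conn.execute("INSERT INTO instructor VALUES (10, 'Das', 1)")

# Follow the foreign key with a join to combine the linked rows.
rows = conn.execute(
    "SELECT i.name, d.name FROM instructor AS i "
    "JOIN department AS d ON i.dept_id = d.dept_id"
).fetchall()

# A row pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO instructor VALUES (11, 'Ghost', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The instructor row stores only the department's key; the join reassembles the full record on demand, and the constraint keeps the link from dangling.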
Widely used RDBMS PPD

• Commercial / Proprietary with Market Share¹
  ◦ Oracle (Oracle): Market Share of Oracle: 48.8%
  ◦ Db2 (IBM): Market Share of IBM: 20.2%
  ◦ SQL Server (Microsoft): Market Share of Microsoft: 17.0%
  ◦ Sybase (Sybase Corporation / SAP AG): Market Share of SAP: 4.7%
  ◦ Teradata (Caltech and Citibank): Market Share of Teradata: 3.7%
  ◦ Others: Microsoft Access, Microsoft Azure SQL Database
• Free / GPL² / Open Source
  ◦ PostgreSQL (PostgreSQL Global Development Group)
  ◦ MySQL (MySQL AB / Oracle Corporation)
  ◦ SQLite (SQLite Developers)
  ◦ Others: MariaDB, Hive
• Object–Relational Database (ORD) or Object–RDBMS (ORDBMS)
  ◦ Illustra (Informix / IBM)
  ◦ Objectivity/DB (Objectivity, Inc.)

¹ Gartner, in 2011, listed the five leading proprietary software relational database vendors by revenue
² GNU General Public License (GPLv3)
Database Management Systems Partha Pratim Das 60.7
Global DBMS Software Market Share (%): 2021 PPD

Company Name    DBMS Market Share
Oracle          45.60 %
Microsoft       19.10 %
IBM             15.70 %
SAP              9.60 %
Teradata         3.20 %
Others           6.80 %

Source: DBMS Customers List (Accessed 28-Aug-21)
Database Management Systems Partha Pratim Das 60.8
DB-Engines Ranking (August 2021): Relational DBMS PPD

Source: DB-Engines Ranking of Relational DBMS (Accessed 28-Aug-21)
Database Management Systems Partha Pratim Das 60.9
DB-Engines Ranking (August 2021):
Trend of Relational DBMS Popularity PPD

Source: DB-Engines Ranking - Trend of Relational DBMS Popularity (Accessed 28-Aug-21)
Database Management Systems Partha Pratim Das 60.10
DB-Engines Ranking (August 2021): Complete PPD

Source: DB-Engines Ranking (Accessed 28-Aug-21)
Database Management Systems Partha Pratim Das 60.11
DB-Engines Ranking (August 2021): Trend Popularity PPD

Source: DB-Engines Ranking - Trend Popularity (Accessed 28-Aug-21)
Database Management Systems Partha Pratim Das 60.12
Oracle PPD

• Multi-model commercial DBMS produced and marketed by Oracle Corporation.
• Larry Ellison, Bob Miner and Ed Oates started a consultancy called Software Development Laboratories (SDL) in 1977, and developed the original version of Oracle.
• Latest Version: Oracle Database 19c is the current long term release. Oracle Database 21c is available for production use as an innovation release (August 2021)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads including Oracle Human Capital Management (HCM), Oracle Enterprise Resource Planning (ERP), Oracle Customer Experience (CX), Oracle Supply Chain Management (SCM), Oracle Enterprise Performance Management (EPM), Oracle Construction and Engineering
• Languages: Structured Query Language (SQL), Procedural Language/SQL (PL/SQL)
• Tools / Editions: Oracle SQL Developer, Oracle Forms, Oracle JDeveloper, Oracle Reports for development of applications, Oracle Live SQL for test environment
• Connectivity: Java (JDBC), [Link] ([Link]), C/C++ (OCI, ODBC, ODPI-C), Python (cx_Oracle)
Database Management Systems Partha Pratim Das 60.13
Db2 PPD

• Db2 is a family of database-server products developed by IBM. It mostly supports the relational model, but now also includes object-relational models
• In 1970, Edgar [Link], a researcher at IBM, published the model for data manipulation.
• Latest Version: Db2 11.5 (June 2019)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads
• Languages: Structured Query Language (SQL), XML Query
• Tools / Editions: Advanced Enterprise Server Edition, Enterprise Server Edition, Advanced Workgroup Server Edition, Workgroup Server Edition, Direct and Developer Editions and Express-C.
• Connectivity: C/C++, Java, Ruby, Perl through a package of Db2 APIs
Database Management Systems Partha Pratim Das 60.14
SQL Server PPD

• Relational database management system developed by Microsoft.
• First release: SQL Server 1.0, a 16-bit server for the OS/2 operating system, in 1989
• Latest Version: Microsoft SQL Server 2019 (November 2019)
• Application Domains: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
• Languages: Transact-SQL
• Tools / Editions: Enterprise, Standard, Web, Business Intelligence, WorkGroup, Express
• Connectivity: Java (JDBC), C/C++ (ODBC)
Database Management Systems Partha Pratim Das 60.15
Sybase PPD

• Relational model database server product for businesses developed by Sybase Corporation, which became part of SAP AG.
• Originally meant for Unix platforms in 1987, Sybase Corporation’s primary DBMS product was initially marketed under the name Sybase SQL Server.
• Latest Version: SAP ASE 16 (April 2014)
• Languages: Sybase IQ, Transact-SQL
• Tools / Editions: Sybase SQL Server for development of applications. Has a developer and express edition.
• Connectivity: C/C++ (SQLAPI++), Java (JDBC)
Database Management Systems Partha Pratim Das 60.16
Teradata PPD

• Relational database management system developed by Caltech and Citibank’s advanced technology group
• In 1984, the first version of Teradata was released
• Latest Version: Teradata [Link] (August 2021)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads
• Languages: BTEQ (Basic Teradata Query)
• Tools / Editions: Developer Edition, Express Edition
• Connectivity: Java (JDBC), C/C++ (ODBC)
Database Management Systems Partha Pratim Das 60.17
PostgreSQL PPD

Module 60

Week 01 ◦ Aggregation Operators ◦ Set Operations
Week 02
Week 03 • Module 08: Introduction to SQL/1 ◦ Null Values
Week 04 ◦ Aggregate Functions
Week 05 ◦ History of SQL . Group By
Week 06 ◦ Data Definition Language . Having
Week 07
Week 08
◦ Data Manipulation Language . Null Values
Week 09
Week 10
Week 11
Week 12
Database Management Systems Partha Pratim Das 60.35
Week 03: Intermediate and Advanced SQL PPD

• Module 11: SQL Examples
  ◦ Cartesian Product
  ◦ Rename AS
  ◦ Where AND/OR
  ◦ String Values
  ◦ Order By Clause
  ◦ in
  ◦ Set Operations
  ◦ Aggregation Operations
• Module 12: Intermediate SQL/1
  ◦ Nested Subqueries
  ◦ Modification of the Database
• Module 13: Intermediate SQL/2
  ◦ Join Expressions
  ◦ Views
• Module 14: Intermediate SQL/3
  ◦ Transactions
  ◦ Integrity Constraints
  ◦ SQL Data Types and Schemas
  ◦ Authorization
• Module 15: Advanced SQL
  ◦ Functions and Procedural Constructs
  ◦ Triggers
Database Management Systems Partha Pratim Das 60.36
Week 04: Relational Query and Modelling PPD

• Module 16: Formal Relational Query Languages/1
  ◦ Relational Algebra
• Module 17: Formal Relational Query Languages/2
  ◦ Predicate Logic
  ◦ Tuple Relational Calculus
  ◦ Domain Relational Calculus
  ◦ Equivalence of Algebra and Calculus
• Module 18: Entity-Relationship Model/1
  ◦ Design Process
  ◦ ER Model
• Module 19: Entity-Relationship Model/2
  ◦ ER Diagram
  ◦ ER Model to Relational Schema
• Module 20: Entity-Relationship Model/3
  ◦ ER Features
Database Management Systems Partha Pratim Das 60.37
Week 05: RDBMS Design: Dependency and Normal Forms PPD

• Module 21: Relational Database Design/1
  ◦ Features of Good Relational Design
  ◦ Atomic Domains and First Normal Form
• Module 22: Relational Database Design/2
  ◦ Functional Dependencies
• Module 23: Relational Database Design/3
  ◦ Functional Dependency Theory
  ◦ Decomposition Using Functional Dependencies
• Module 24: Relational Database Design/4
  ◦ Algorithms for Functional Dependencies
• Module 25: Relational Database Design/5
  ◦ Lossless Join Decomposition
  ◦ Dependency Preservation
Database Management Systems Partha Pratim Das 60.38
Week 06: RDBMS Design: Dependency and Normal Forms (2) PPD

• Module 26: Relational Database Design/6: Normal Forms
  ◦ Normal Forms
• Module 27: Relational Database Design/7: Normal Forms
  ◦ Decomposition to 3NF
  ◦ Decomposition to BCNF
• Module 28: Relational Database Design/8: Case Study
  ◦ Library Information System (LIS) (Specification of LIS shared separately)
• Module 29: Relational Database Design/9: MVD and 4NF
  ◦ Multivalued Dependencies
  ◦ Decomposition to 4NF
• Module 30: Relational Database Design/10: Design Summary and Temporal Data
  ◦ Database-Design Process
  ◦ Temporal Databases
Database Management Systems Partha Pratim Das 60.39
Week 07: Application Development PPD

• Module 31: Application Design and Development/1: Architecture
  ◦ Application Programs and Architecture
• Module 32: Application Design and Development/2: Web Applications
  ◦ WWW
  ◦ Scripting
• Module 33: Application Design and Development/3: SQL and Native Language
  ◦ SQL and Native Language
  ◦ ODBC
  ◦ JDBC
  ◦ Bridge
  ◦ Embedded SQL
• Module 34: Application Design and Development/4: Python and PostgreSQL
  ◦ PostgreSQL and Python
  ◦ Python Frameworks for PostgreSQL
  ◦ Flask
• Module 35: Application Design and Development/5: Application Development and Mobile
  ◦ Rapid Application Development
  ◦ Application Performance and Security
  ◦ Challenges in Web Application Development
  ◦ Mobile Apps
Database Management Systems Partha Pratim Das 60.40
Week 08: Storage Management PPD

• Module 36: Algorithms and Data Structures/1: Algorithms and Complexity Analysis
  ◦ Algorithms
  ◦ Analysis of Algorithms
  ◦ Complexity Chart
• Module 37: Algorithms and Data Structures/2: Data Structures/1
  ◦ Data Structures
  ◦ Linear Data Structures
  ◦ Linear and Binary Search
• Module 38: Algorithms and Data Structures/3: Data Structures/2
  ◦ Data Structures
  ◦ Non-linear Data Structures
  ◦ Binary Search Tree
  ◦ Comparison
• Module 39: Storage and File Structure/1: Physical Storage
  ◦ Overview of Physical Storage Media
  ◦ Magnetic Disk
  ◦ Magnetic Tapes
  ◦ Cloud Storage
  ◦ Other Storage
  ◦ Future of Storage
• Module 40: Storage and File Structure/2: File Structure
  ◦ File Organization
  ◦ Organization of Records in Files
  ◦ Data Dictionary Storage
  ◦ Storage Access
Database Management Systems Partha Pratim Das 60.41
Week 09: Indexing and Hashing PPD

• Module 41: Indexing and Hashing/1: Indexing/1
  ◦ Concepts of Indexing
  ◦ Ordered Indices
• Module 42: Indexing and Hashing/2: Indexing/2
  ◦ Balanced Binary Search Trees
  ◦ 2-3-4 Tree
• Module 43: Indexing and Hashing/3: Indexing/3
  ◦ B+-Tree Index Files
  ◦ B-Tree Index Files
• Module 44: Indexing and Hashing/4: Hashing
  ◦ Static Hashing
  ◦ Dynamic Hashing
  ◦ Comparison Schemes
  ◦ Bitmap Indices
• Module 45: Indexing and Hashing/5: Index Design
  ◦ Index Definition in SQL
  ◦ Guidelines for Indexing
Database Management Systems Partha Pratim Das 60.42
Week 10: Transactions Management PPD

• Module 46: Transactions/1
  ◦ Transaction Concept
  ◦ Transaction States
  ◦ Concurrent Executions
• Module 47: Transactions/2: Serializability
  ◦ Serializability
  ◦ Conflict Serializability
• Module 48: Transactions/3: Recoverability
  ◦ Recovery
  ◦ Transaction Definition in SQL
  ◦ View Serializability
  ◦ Complex Notions of Serializability
• Module 49: Concurrency Control/1
  ◦ Concurrency Control
  ◦ Lock-Based Protocols
  ◦ Implementation of Locking
• Module 50: Concurrency Control/2
  ◦ Deadlock Handling
  ◦ Timestamp-Based Protocols
Database Management Systems Partha Pratim Das 60.43
Week 11: Backup and Recovery PPD

• Module 51: Backup and Recovery/1: Backup/1
  ◦ What is Backup and Recovery?
  ◦ Why Backup?
  ◦ Backup Data: Types
  ◦ Backup Strategies
  ◦ Case: Monthly Schedule
  ◦ Hot Backup
• Module 52: Backup and Recovery/2: Recovery/1
  ◦ Failure Classification
  ◦ Storage Structure
  ◦ Log-Based Recovery
• Module 53: Backup and Recovery/3: Recovery/2
  ◦ Transactional Logging
  ◦ Recovery Algorithm
• Module 54: Backup and Recovery/4: Recovery/3
  ◦ Recovery with Early Lock Release
  ◦ Plan for Backup and Recovery
• Module 55: Backup and Recovery/5: Backup/2: RAID
  ◦ RAID: Redundant Array of Independent Disks
Database Management Systems Partha Pratim Das 60.44
Week 12: Query Optimization, Performance and Architecture,
NOSQL, Widely used RDBMSs, and Course Summarization PPD

• Module 56: Query Processing and Optimization/1: Processing
◦ Query Processing
◦ Query Cost
◦ Selection Operation
◦ Sorting
◦ Join Operation
◦ Other Operations
• Module 57: Query Processing and Optimization/2: Optimization
◦ Introduction to Query Optimization
◦ Transformation of Relational Expressions
• Module 58: RDBMS Performance and Architecture
◦ RDBMS Performance and Scalability
◦ RDBMS Architecture
◦ Scaling Databases
• Module 59: Non-Relational DBMS: NOSQL
◦ What is Big Data?
◦ What is NOSQL?
◦ CAP Theorem
◦ Types of NOSQL Databases
◦ Relational vs. Non-Relational
• Module 60: Widely used DBMSs and Summarization
◦ Widely used RDBMSs
◦ Course Recap

Final Words PPD

• Read the DBMS textbook thoroughly and solve the exercises
• Practice query coding
• Practice database design from specs
• Besides DBMS, develop good knowledge of programming, data structures, algorithms and discrete structures
• Seek help if you need it: mail us
• To learn more online, you may refer to the resources mentioned in: What is the best possible way to learn DBMS online?

Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
